chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

-s option and high heterozygosity #179

Open ptranvan opened 3 years ago

ptranvan commented 3 years ago

Hi,

My species is triploid and highly heterozygous. I used

hifiasm --primary --n-hap 3 -t 24 -o out.asm .*.fastq.gz

But the assembly size of my primary contigs is much larger (240 Mbp) than the GenomeScope estimate (140 Mbp).

http://qb.cshl.edu/genomescope/genomescope2.0/analysis.php?code=nnC4CPmgLE3605rbyM7y

I saw in the docs that the -s option can be adjusted. Do you have a recommendation for the value I should set?

And/or do you have recommendations about other options?

Thanks !

chhylp123 commented 3 years ago

Could you please also set --n-hap 3? The default purging step has a diploid assumption.

ptranvan commented 3 years ago

Yes, I did set --n-hap 3. Look at my command :)

chhylp123 commented 3 years ago

Sorry for that. In this case you should probably try purge_dups. I guess you would need to run multiple rounds of purge_dups for triploid samples. Hifiasm does only one round of purging, so it may not be able to produce the primary assembly properly.
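A rough sketch of what "multiple rounds of purge_dups" could look like, based on the standard purge_dups pipeline (minimap2, pbcstat, calcuts, split_fa, purge_dups, get_seqs). The function name `purge_round_plan`, the `/data/...` paths, and the thread count are hypothetical. It assumes the primary contigs have already been converted from GFA to FASTA and that minimap2 is recent enough to have the `map-hifi` preset. The function only prints the commands so they can be reviewed first; the cutoffs that calcuts picks automatically may need checking by hand for a triploid coverage profile.

```sh
# Print (do not run) the commands for one purge_dups round.
# asm and reads must be absolute paths: each round runs in its own directory
# so that PB.stat, cutoffs, purged.fa and hap.fa do not overwrite each other.
purge_round_plan() {
  asm=$1; reads=$2; dir=$3
  cat <<EOF
mkdir -p $dir && cd $dir
minimap2 -x map-hifi -t 24 $asm $reads | gzip -c > reads.paf.gz
pbcstat reads.paf.gz
calcuts PB.stat > cutoffs 2> calcuts.log
split_fa $asm > asm.split.fa
minimap2 -x asm5 -DP -t 24 asm.split.fa asm.split.fa | gzip -c > self.paf.gz
purge_dups -2 -T cutoffs -c PB.base.cov self.paf.gz > dups.bed 2> purge_dups.log
get_seqs -e dups.bed $asm
EOF
}

# Round 1 works on the hifiasm primary contigs (hypothetical path);
# round 2 takes the purged.fa that get_seqs wrote in round 1.
purge_round_plan /data/out.asm.p_ctg.fa /data/reads.fastq.gz /data/round1
purge_round_plan /data/round1/purged.fa /data/reads.fastq.gz /data/round2
```

Once the printed plan looks right, it can be piped to `sh`. Comparing the size of `purged.fa` after each round against the GenomeScope estimate gives a simple stopping criterion.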

ptranvan commented 3 years ago

Thanks, I will take a look. What about the -s option? Is it useless for triploids?

chhylp123 commented 3 years ago

It is the similarity threshold used to find overlaps between different haplotypes. The default, -s 0.55, is usually fine. If the heterozygosity rate is very high, you can set a smaller value.
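One way to try this is to run the same assembly at a few -s values and compare the primary assembly sizes. The sketch below only prints the commands. The function name `sweep_s`, the candidate values 0.45 and 0.35, the file name `reads.fastq.gz`, and the output prefixes are illustrative assumptions, not recommendations; the other options are copied from the command in the original question.

```sh
# Print (do not run) one hifiasm command per candidate -s value.
# $1 is the reads file; the remaining arguments are the -s values to try.
sweep_s() {
  reads=$1; shift
  for s in "$@"; do
    # Separate output prefix per value so the runs do not overwrite each other.
    echo "hifiasm --primary --n-hap 3 -s $s -t 24 -o out.s$s.asm $reads"
  done
}

# Default first, then two smaller (illustrative) thresholds.
sweep_s reads.fastq.gz 0.55 0.45 0.35
```

Each run takes the full assembly time, so it may be worth starting with just one smaller value.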

kevfengler227 commented 3 years ago

So will setting --n-hap 3 produce a three-haplotype assembly? I was just about to post a question about tetraploid assembly, so I want to try --n-hap 4 with Hi-C.

Thanks, KF

chhylp123 commented 3 years ago

Hifiasm is not able to work with polyploid samples right now. Setting --n-hap to 3 or 4 is only used to disable the diploid assumption during graph cleaning.

kevfengler227 commented 3 years ago

OK, thanks. Polyploids are definitely the next challenge to overcome. I'll look forward to this capability in hifiasm as I have a lot of polyploids to do!

KF

chhylp123 commented 3 years ago

Yeah, polyploids are interesting but we don't have polyploidy data for testing and debugging...

BjoernUsadel commented 3 years ago

Polyploids would definitely be "the feature": What would you need? Would ccs data be enough?

chhylp123 commented 3 years ago

> Polyploids would definitely be "the feature": What would you need? Would ccs data be enough?

Thanks for the help! For us, it would be good to get HiFi, Hi-C, and one type of ground truth. We need a ground truth to get a sense of how well things work for polyploid samples.

zhaotao1987 commented 2 years ago

Thanks for the information here. I'm working on an AAB-type triploid genome. I'd like to have the haplotypes phased. So currently, what would be the best practice using hifiasm for a triploid species? How about this:

  1. Use --n-hap 3, which helps graph cleaning and so should give high-quality p_utgs.
  2. Use -l0 (no purging at all) to keep all unitigs.
  3. Scaffold the extracted unitigs with Hi-C data using external Hi-C programs.
  4. Use the information in the assembly graph (and the reads) to fill as many gaps as possible in the assembly from the previous step.

What do you think of this, and what else might be helpful? Thanks a lot, I would like to hear your opinion.

Best, Tao

BTW, could you have a quick look at my run log and assembly graph (p_utg) to see if something is very wrong? My genome size is around 2 Gb (3 haplotypes in total). Thanks!

run_log.txt assemble_graph
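For the unitig extraction in the plan above, the usual approach is to pull the `S` (segment) lines out of the GFA with awk, as in the hifiasm README. The sketch below wraps that in two small helpers; the names `gfa_to_fasta` and `gfa_total_bp` are made up for illustration, and the GFA path is passed as an argument because the exact p_utg file name depends on the hifiasm version and mode.

```sh
# Convert the segment (S) lines of a GFA file to FASTA on stdout.
# Column 2 is the unitig name, column 3 is its sequence.
gfa_to_fasta() {
  awk '/^S/ { print ">" $2; print $3 }' "$1"
}

# Sum the sequence lengths of all segments, e.g. to compare the total
# unitig size against the expected size of all three haplotypes.
gfa_total_bp() {
  awk '/^S/ { n += length($3) } END { print n + 0 }' "$1"
}

# Usage (file names are placeholders):
#   gfa_to_fasta asm.p_utg.gfa > asm.p_utg.fa
#   gfa_total_bp asm.p_utg.gfa
```

With -l0 and a roughly 2 Gb genome across three haplotypes, a total far above that figure would hint at unresolved bubbles or artifacts in the graph rather than extra haplotypes.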