ptranvan opened 3 years ago
Could you please also set --n-hap 3? The default purging step has a diploid assumption.
Yes, I did set --n-hap 3. Look at my command :)
Sorry for that. In this case you should probably try purge_dups. I guess you would need to run multiple rounds of purge_dups for triploid samples. Hifiasm does only one round of purging, so it may not be able to produce a proper primary assembly.
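A rough sketch of what repeated purge_dups rounds could look like, following the usual purge_dups steps (pbcstat, calcuts, split_fa, purge_dups, get_seqs). This is a dry run that only prints the commands. The file names reads.fastq.gz and p_ctg.fa (primary contigs already converted from GFA to FASTA), the roundN directories, and the choice of two rounds are all assumptions, and the automatic cutoffs from calcuts may need manual adjustment for a triploid.

```shell
# Dry run: build and print the command plan, nothing is executed.
reads="reads.fastq.gz"   # hypothetical HiFi reads file
asm="p_ctg.fa"           # hypothetical FASTA of the hifiasm primary contigs
rounds=2                 # assumed number of rounds, not a recommendation

plan=$(
  i=1
  while [ "$i" -le "$rounds" ]; do
    d="round$i"
    echo "mkdir -p $d && cd $d"
    # Read depth per base of the current assembly
    echo "minimap2 -x map-hifi ../$asm ../$reads | gzip -c - > reads.paf.gz"
    echo "pbcstat reads.paf.gz"
    echo "calcuts PB.stat > cutoffs"
    # Self-alignment of the split assembly
    echo "split_fa ../$asm > asm.split.fa"
    echo "minimap2 -x asm5 -DP asm.split.fa asm.split.fa | gzip -c - > self.paf.gz"
    # Flag duplicated sequences and write purged.fa / hap.fa
    echo "purge_dups -2 -T cutoffs -c PB.base.cov self.paf.gz > dups.bed"
    echo "get_seqs -e dups.bed ../$asm"
    echo "cd .."
    # The next round starts from this round's purged assembly
    asm="$d/purged.fa"
    i=$((i + 1))
  done
)
echo "$plan"
```

Each round takes the purged.fa of the previous round as input, so comparing the total size of purged.fa after each round against the expected haploid size is one way to decide when to stop.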
Thanks I will take a look.
What about the option -s? Is it useless for triploids?
It is the similarity threshold used to find overlaps between different haplotypes. The default, -s 0.55, is usually fine. If the heterozygosity rate is very high, you can set a smaller value.
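As a sketch, lowering -s on the command line from this thread would look like the following. The value 0.45 and the input name reads.fastq.gz are placeholders chosen for illustration, not a recommended setting; the snippet only prints the command so the value can be changed before running it.

```shell
# Placeholder threshold below the default of 0.55 (assumption, not a recommendation)
S=0.45
# Hypothetical input file name
READS="reads.fastq.gz"

# Same options as the original command, with -s added
CMD="hifiasm --primary --n-hap 3 -s $S -t 24 -o out.asm $READS"
echo "$CMD"
```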
So will setting --n-hap 3 produce a three-haplotype assembly? I was just about to post a question about tetraploid assembly, so I want to try --n-hap 4 with Hi-C.
Thanks, KF
Hifiasm is not able to work for polyploid samples right now. Setting --n-hap to 3 or 4 is just used to disable the diploid assumption during graph cleaning.
OK, thanks. Polyploids are definitely the next challenge to overcome. I'll look forward to this capability in hifiasm as I have a lot of polyploids to do!
KF
Yeah, polyploids are interesting but we don't have polyploidy data for testing and debugging...
Polyploids would definitely be "the feature": What would you need? Would ccs data be enough?
Thanks for the help! For us, it would be good to get HiFi, Hi-C, and one type of ground truth. We need ground truth to evaluate how well an assembly works on polyploid samples.
Thanks for the information here. I'm working on an AAB-type triploid genome. I'd like to have the haplotypes phased. So currently, what would be the best practice using hifiasm for a triploid species? How about this:
Best, Tao
By the way, could you have a quick look at my run log and assembly graph (p_utg) to see if something is very wrong? My genome size is around 2G (3 haplotypes in total). Thanks!
run_log.txt
Hi,
My species is triploid and highly heterozygous. I used
hifiasm --primary --n-hap 3 -t 24 -o out.asm .*.fastq.gz
But the assembly size of my primary contigs is much larger (240 Mbp) than the GenomeScope estimate (140 Mbp).
http://qb.cshl.edu/genomescope/genomescope2.0/analysis.php?code=nnC4CPmgLE3605rbyM7y
I saw in the docs that the -s option can be adjusted. Do you have any recommendation for the value I should set? And/or do you have recommendations about other options?
Thanks !
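For reference, the size gap described in this post can be put as a ratio. The snippet uses only the two numbers quoted above (240 Mbp assembled, 140 Mbp estimated); the variable names are made up for the example.

```shell
ASM_MBP=240   # primary contig total reported in the post
EST_MBP=140   # GenomeScope estimate reported in the post

# Ratio of assembled size to estimated size, two decimals
RATIO=$(awk -v a="$ASM_MBP" -v e="$EST_MBP" 'BEGIN { printf "%.2f", a / e }')
echo "assembled/estimated = $RATIO"
```

A ratio above 1 is consistent with some haplotype copies being left in the primary contigs, which is what the purging step is meant to remove.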