chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Q: trio-binning with parental Hifi data #282

Open ptrebert opened 2 years ago

ptrebert commented 2 years ago

Hi, I have a question regarding using parental HiFi instead of Illumina data for a trio-phased assembly (all experiments done with recent versions of hifiasm and yak). When doing this, we observed a fairly low contig N50 (3-5 Mbp) for the child/trio-phased assembly. Assembling the trio members just as primary/alt resulted in the expected contig N50 of between 30 and 50 Mbp. I have seen the FAQ entries about differences in contiguity between primary and phased assemblies, and about tweaking the parameters -D and -N to potentially improve contiguity. But before we explore that option, I just wanted to get your input on whether using parental HiFi may cause problems that we have overlooked. Thanks for your help.

+Peter

chhylp123 commented 2 years ago

We haven't tried to run hifiasm with parental HiFi, so I have no idea about that. -D and -N would not help for the trio-binning assemblies. Is it possible that you can share the bin files with us?

ptrebert commented 2 years ago

Thanks a lot for the fast reply. It seems I misunderstood the FAQ entry about -D and -N, then.

Re data sharing: unfortunately, no, I do not have the permission for that. But we are trying the same approach with a public dataset, let's see what we find there...

ptrebert commented 2 years ago

Sorry that took a bit of time, but we have repeated the experiment with the public PUR trio (HG00733 + parents); what do you think of these N50s:

                       N50 (Mb)
child-trio             12.9 hap1 | 17.8 hap2
child-noTrio           68.3
mother                 57.7
father                 59.2 

I think HPRC reports a haplotype contig N50 of ~40 Mbp for (Illumina) trio-binned assemblies.
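
For readers comparing the numbers above, here is a minimal sketch of how a contig N50 is usually computed: sort contigs from longest to shortest and report the length at which the running sum first reaches half of the total assembly size. The function name `n50` and the toy contig lengths are illustrative assumptions; they are not taken from hifiasm or from the data in this thread.

```python
def n50(lengths):
    """Return the contig N50 of a list of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point division by 2.
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, arbitrary units).
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))
```

Because the metric depends only on the sorted length distribution, a phased assembly whose contigs break at switch points will typically show a lower N50 than a primary assembly of the same sample, even when the total assembled length is similar.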

chhylp123 commented 2 years ago

That is not so good. Could you please share the bin files with us?

ptrebert commented 2 years ago

@hugocarmaga can you please make all the bin files available via Globus (folder: see internal slack) that are part of the above experiment? Thanks

And please report here when all files are copied...

hugocarmaga commented 2 years ago

All the relevant files are copied there for the four assemblies mentioned above.

lh3 commented 2 years ago

I did trio binning assembly for the WashU trio. It worked fine. I wonder what the issue is with HG00733...

ptrebert commented 2 years ago

Are the data for this trio (which one is that exactly?) public? We could try repeating the experiment to check our setup.

lh3 commented 2 years ago

https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=submissions/9f0e43e9-a57d-42c1-992c-a2ce7c20940f--WUSTL_BLOOD_HIFI/

This is from a pedigree. I forgot how samples are related. You may ask Karen.

ptrebert commented 2 years ago

Thanks

yfukasawa commented 2 years ago

Hi @lh3,

Sorry for cutting in, the topic is interesting.

"I did trio binning assembly for the WashU trio."

Were the same k-mer and Bloom filter sizes (-k 31 -b 37) applied? I assume so, but if not, may I ask which parameters were used in this case (i.e., the trio-HiFi case)?

lh3 commented 2 years ago

Yes, same setting for short reads.
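
As a rough illustration of what "same setting" implies, the sketch below builds the command lines for counting parental k-mers with yak (`-k31 -b37`) and then running hifiasm in trio mode with `-1` (paternal) and `-2` (maternal) k-mer dumps, following the pattern in the hifiasm README. It only constructs and prints the commands and does not run them. The helper name `trio_commands`, the file names, the output prefix, and the thread count are all hypothetical; this is not the exact invocation used by anyone in this thread.

```python
import shlex


def trio_commands(pat_reads, mat_reads, child_reads, prefix, threads=16):
    """Build (but do not run) yak and hifiasm commands for trio binning.

    Assumes each parent's HiFi reads are in a single file; all file names
    and the output prefix are placeholders chosen by the caller.
    """
    # Same k-mer size and Bloom filter size as discussed above.
    yak_opts = ["-k31", "-b37", f"-t{threads}"]
    return [
        ["yak", "count", *yak_opts, "-o", "pat.yak", pat_reads],
        ["yak", "count", *yak_opts, "-o", "mat.yak", mat_reads],
        ["hifiasm", "-o", prefix, "-t", str(threads),
         "-1", "pat.yak", "-2", "mat.yak", child_reads],
    ]


# Hypothetical input names, printed as shell-quoted command lines.
for cmd in trio_commands("father.hifi.fq.gz", "mother.hifi.fq.gz",
                         "child.hifi.fq.gz", "child.trio.asm"):
    print(shlex.join(cmd))
```

Keeping the yak options in one list makes it harder to count the two parents with different k-mer or Bloom filter settings by accident.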

yfukasawa commented 2 years ago

Thanks

ptrebert commented 1 year ago

To keep this alive: we also did the trio binning using the WashU trio, and the results look OK (@hugocarmaga can you add the N50s here, please?). At least for our initial dataset, the preliminary conclusion is that this is probably a problem with the data. Of course, it would be nice to know if there are any insights about the sub-optimal results for the PUR trio.

chhylp123 commented 1 year ago

Thanks for letting us know. We could go back to these samples and try to figure out the problems in October. This month is quite busy. I will let you know once I get new results.

hugocarmaga commented 1 year ago

Here are the N50s for the WashU trio (sorry it took so long):

                N50 (Mb)
child-trio      33.4 hap1 | 27.6 hap2
child-noTrio    96.7
mother          78.9
father          68.7

MagdalenaZZ commented 5 months ago

Hi, I also want to run hifiasm with parental HiFi - hey @chhylp123 any update on if/when it is possible? Or maybe it is out-of-scope for hifiasm? Many thanks for great software!

chhylp123 commented 5 months ago

Do you have both parental and child reads?