Nextomics / NextPolish

Fast and accurately polish the genome generated by long reads.
GNU General Public License v3.0

Using Illumina Hi-C and Chicago data? #104

Open 000generic opened 1 year ago

000generic commented 1 year ago

I have PacBio SMRT Sequel data - and both Hi-C and Chicago Illumina data - but no standard PE or mate-pair Illumina data. Would you recommend using NextPolish with just the PacBio data - or would it be OK to also use the Illumina HiSeq X Ten Hi-C and/or Chicago data? I was thinking the upstream Hi-C / Chicago context of the Illumina PE reads won't matter as long as a given read aligns nicely - it could then be used for error correction. But I'm not sure if there are additional considerations that I'm overlooking that would cause Hi-C and/or Chicago Illumina data to negatively impact NextPolish.

Any guidance or advice on this would be greatly appreciated!

Thank you :)

moold commented 1 year ago

If the assembly was built from HiFi data, I don't recommend doing this. For CLR data, I have not tested polishing an assembly with Hi-C / Chicago data, but I think you could give it a try anyway; it should yield some benefit for the assembly.

000generic commented 1 year ago

Thank you for the quick reply and helpful guidance! It is CLR data - so I think I will give it a try.

As a follow-up question - what kind of metric would you recommend to assess the sequences after NextPolish to see if they have improved, remained the same, or possibly degraded?

Thanks again :)

moold commented 1 year ago

I think you can try Merqury, or map the short reads to the assembly and count homozygous SNPs and indels.
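
The second suggestion can be sketched roughly as follows. This is only an illustration, not part of NextPolish or Merqury: it assumes the short reads have already been mapped to the assembly and variants called into a single-sample VCF with a `GT` field by some external tool, and the function name `count_homozygous_variants` is hypothetical. Homozygous-alternate calls are places where the reads consistently disagree with the assembly, so they are likely assembly errors rather than real heterozygosity.

```python
import re


def count_homozygous_variants(vcf_lines):
    """Count homozygous-alternate SNPs and indels in a single-sample VCF.

    Hypothetical helper: assumes tab-separated VCF records with a FORMAT
    column containing GT and exactly one sample column.
    """
    counts = {"snp": 0, "indel": 0}
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, so no genotype to inspect
        ref = fields[3]
        alts = fields[4].split(",")
        fmt = fields[8].split(":")
        sample = fields[9].split(":")
        if "GT" not in fmt:
            continue
        genotype = sample[fmt.index("GT")]
        alleles = re.split(r"[/|]", genotype)
        # Keep only calls where every allele is the same non-reference allele
        # (e.g. 1/1 or 2|2; a haploid call such as "1" is also counted).
        if len(set(alleles)) != 1 or alleles[0] in ("0", "."):
            continue
        alt = alts[int(alleles[0]) - 1]
        # Ignore symbolic alleles and spanning deletions.
        if alt.startswith("<") or alt == "*":
            continue
        if len(ref) == 1 and len(alt) == 1:
            counts["snp"] += 1
        elif len(ref) != len(alt):
            counts["indel"] += 1
    return counts


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
        "ctg1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1/1:30",
        "ctg1\t200\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30",
        "ctg1\t300\t.\tAT\tA\t50\tPASS\t.\tGT\t1|1",
    ]
    print(count_homozygous_variants(example))
```

Running this on VCFs called against the assembly before and after polishing, with the same reads and the same caller settings, gives a rough comparison: fewer homozygous SNPs and indels after polishing would suggest the consensus improved, and more would suggest it degraded.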

000generic commented 1 year ago

Great - I'll give Merqury a try. Thank you!