chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Assembly of DNA and RNA amplicons from PacBio run #379

Open s-t-calus opened 1 year ago

s-t-calus commented 1 year ago

Dear Mr/Mrs,

I am trying to combine data from two PacBio runs: DNA (10 kb) and cDNA (10 kb). These regions partially overlap, and I would like to phase them by matching short tandem repeats (STRs) and exonic SNP fingerprints from the cDNA against exon + intron SNP fingerprints from the genomic DNA. The goal is to identify intronic SNPs and the number of tandem repeats that are separated by >150 kb from each other. The cDNA was used to bring the STRs and exonic SNPs closer to each other.

Do you think hifiasm could assemble DNA reads (intronic and exonic) and RNA/cDNA reads (exonic) into a graph where we could phase the number of STRs + exonic SNPs (cDNA) with intronic and exonic fingerprints (DNA) based on 99-100% similarity?

If yes, which of the packages should I use? If not, do you have any idea what software I should consider using?

Thank you very much and please let me know if you have any questions. S-T-C

chhylp123 commented 1 year ago

I guess phasing/assembly should be fine, as hifiasm can assemble through some centromeres. But I have no idea whether it would work on your sample, since the data is quite different from a single-sample assembly. You can probably give it a try.

s-t-calus commented 1 year ago

Thank you very much for your quick response. I also considered using VSEARCH or UCLUST for OTU-like read binning based on 99.5-100% similarity of centromeres. Is the approach you mentioned similar to OTU binning?
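
For readers unfamiliar with OTU-style binning, the general idea can be sketched in a few lines of Python. This is only an illustration of greedy centroid clustering at an identity threshold, not how VSEARCH, UCLUST, or hifiasm work internally. The function names `identity` and `greedy_bins` are hypothetical, and `difflib.SequenceMatcher` is used as a rough stand-in for a real alignment-based identity, so the scores will not match those of a dedicated tool.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; an assumed stand-in for alignment identity."""
    # autojunk=False disables the heuristic that ignores frequent characters,
    # which would otherwise distort scores for 4-letter DNA alphabets.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_bins(reads, threshold=0.995):
    """Assign each read to the first centroid it matches at >= threshold.

    Reads are processed longest first; a read that matches no existing
    centroid becomes the centroid of a new bin.
    """
    bins = []  # list of (centroid, members) pairs
    for read in sorted(reads, key=len, reverse=True):
        for centroid, members in bins:
            if identity(centroid, read) >= threshold:
                members.append(read)
                break
        else:
            # No centroid was similar enough, so start a new bin.
            bins.append((read, [read]))
    return bins


if __name__ == "__main__":
    toy_reads = ["ACGT" * 50, "ACGT" * 50, "TTGCA" * 40]
    for centroid, members in greedy_bins(toy_reads):
        print(len(members), centroid[:20])
```

Binning of this kind groups reads by overall similarity to a representative sequence. It does not build an overlap graph or link variants along a read, which is what an assembler does when it phases.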