chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

How do you assemble chromosomes X and Y? #625

Open zuodabin opened 6 months ago

zuodabin commented 6 months ago

Dear author, how can the X and Y chromosomes be assembled? What parameters and data need to be added?

DustinSokolowski commented 6 months ago

Hey,

Not an author, but pretty familiar with genome assembly and annotation. There shouldn't be any extra parameters needed to assemble X. X chromosome contigs/scaffolds will be represented by some scaffold number in the maternal haplotype, just like any other chromosome. Sex chromosomes undergo far fewer chromosomal rearrangements, so X should be relatively easy to identify. The easiest options are:

1) If there is a relatively closely related species with a well-assembled X (for example, if you're assembling a rodent you can use mouse), you can align your scaffolds to their assembly and pull out what matches their X. Since the X chromosome signal is typically robust, I haven't ever needed to tune parameters or test specific tools for this. Honestly, minimap2 + DGenies to visualize the genome-wide dot plots typically does the trick for me.

2) Annotate the genes on your scaffolds. The X chromosome scaffolds should have a huge spike of X chromosome genes (e.g., 85%+ of genes originate from the X chromosome in other species). Again, this is a super robust signal, so most popular assembly annotation tools work very well. Personally, I found TOGA to do a good job of assigning gene symbols and identifying transcripts, so you'll get a nice two-for-one for genome annotation, but again, I've identified X chromosomes with TOGA, GeMoMa, EMBL annotations, Biser2, etc.
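To make option 1 concrete, here is a minimal sketch (not part of hifiasm or minimap2) of how one might pull out candidate X scaffolds from a PAF alignment of the scaffolds against a related reference. The function names, the reference sequence name `chrX`, the MAPQ cutoff and the 0.8 threshold are all assumptions chosen for illustration, and the demo records are made up.

```python
from collections import defaultdict


def x_fraction_per_scaffold(paf_lines, x_name="chrX", min_mapq=20):
    """Return {scaffold: fraction of its aligned bases that hit x_name}.

    PAF columns used (1-based): 1 query name, 3 query start, 4 query end,
    6 target name, 12 mapping quality.
    Overlapping alignments are counted twice; good enough for a rough call.
    """
    on_x = defaultdict(int)
    total = defaultdict(int)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated records
        query, target = fields[0], fields[5]
        q_start, q_end = int(fields[2]), int(fields[3])
        if int(fields[11]) < min_mapq:
            continue  # ignore ambiguous (often repeat-derived) hits
        span = q_end - q_start
        total[query] += span
        if target == x_name:
            on_x[query] += span
    return {q: on_x[q] / total[q] for q in total if total[q] > 0}


def candidate_x_scaffolds(fractions, threshold=0.8):
    """Scaffolds whose aligned bases mostly land on the reference X."""
    return sorted(q for q, f in fractions.items() if f >= threshold)


# Made-up PAF records, for illustration only.
demo_paf = [
    "scaffold_7\t1000\t0\t900\t+\tchrX\t5000\t100\t1000\t850\t900\t60",
    "scaffold_7\t1000\t900\t1000\t+\tchr2\t8000\t0\t100\t90\t100\t60",
    "scaffold_3\t2000\t0\t1500\t-\tchr2\t8000\t200\t1700\t1400\t1500\t60",
    "scaffold_3\t2000\t1500\t1600\t+\tchrX\t5000\t0\t100\t80\t100\t5",
]

fractions = x_fraction_per_scaffold(demo_paf)
print(candidate_x_scaffolds(fractions))
```

On real data you would pass an open file handle instead of `demo_paf`, for example a PAF produced by something like `minimap2 -x asm20 reference.fa scaffolds.fa > aln.paf` (file names hypothetical; the right preset depends on how diverged the two species are). The dot plot is still worth looking at, since a simple fraction like this says nothing about order or orientation.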

The Y chromosome, on the other hand, is notoriously tricky to assemble due to its repetitive and heterochromatic nature, not to mention the regions that look like the X chromosome. Again, no special parameters are needed, but folks who want a complete Y chromosome assembly typically pull down the Y chromosome prior to sequencing. In theory you should be able to find some Y chromosome fragments in the paternal haplotype, but working with those fragments can be tricky.

zuodabin commented 6 months ago

Thank you very much for your answer, but I am curious how X and Y chromosomes can be assembled as has been done for human. I currently have HiFi and Hi-C data.

chhylp123 commented 6 months ago

I agree with @DustinSokolowski. Generally there is no specific parameter for chrX and chrY, since hifiasm does not make any assumptions when doing de novo genome assembly. ChrX and chrY might be fragmented at the contig level, but it should be easy to get scaffold-level chromosomes for chrX and chrY.