VGP / vgp-assembly

VGP repository for the genome assembly working group

Possible missed documentation -- haplotype merging step #73

Closed DustinSokolowski closed 2 years ago

DustinSokolowski commented 2 years ago

Hello,

Thank you for your fantastic pipeline and resources. I have been following many of the VGP (and mitoVGP) workflows to generate my own trio-binned assembly, and our assembly wouldn't be nearly as high quality without you.

I am hoping to submit my genome in a similar manner to the Zebra Finch (https://www.ncbi.nlm.nih.gov/assembly/GCA_008822105.2), where there is a consensus genome from the haplotypes and a separate submission of each parental genome. That said, I can't seem to find how you completed the haplotype merging step in any of your papers or documentation. I've seen GATK and VCF-consensus as possible options, but I would love to know what you did specifically. Similarly, in your final merged reference, did you ever use the two genomes to fill in leftover gaps in one another?

Thanks again, Dustin

gf777 commented 2 years ago

Hi @DustinSokolowski

Glad to hear our resources were useful.

Yes. For this, both haplotypes are manually curated, the "best" haplotype is selected, and the sex chromosome from the other haplotype is added to the mix. Does that make sense?

No, we never patch one assembly with the other. Especially in the case of a trio, this would introduce haplotype switches.
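For readers who want to see the shape of the selection described above, here is a minimal sketch, not the VGP's actual tooling (the curation itself is manual). It assumes both haplotypes are already curated FASTA files with distinct chromosome names; the helper names `read_fasta` and `build_representative` and the sequence names such as `chrY` are hypothetical. It keeps every sequence from the chosen haplotype unchanged and appends only the named sex chromosome from the other haplotype, so no sequence is patched or merged.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, str]  # (sequence name, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the name.
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def build_representative(
    best: List[Record],
    other: List[Record],
    sex_chroms_from_other: Set[str],
) -> List[Record]:
    """Return all records of the chosen haplotype plus the requested
    sex chromosome(s) from the other haplotype, copied whole."""
    best_names = {name for name, _ in best}
    other_by_name = dict(other)
    result = list(best)
    for name in sorted(sex_chroms_from_other):
        if name not in other_by_name:
            raise KeyError(f"{name} not found in the other haplotype")
        if name in best_names:
            raise ValueError(f"{name} already present in the chosen haplotype")
        result.append((name, other_by_name[name]))
    return result
```

In this sketch a maternal assembly chosen as "best" would gain only `chrY` from the paternal assembly; every other paternal sequence stays in its own separate submission.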

DustinSokolowski commented 2 years ago

Hey!

I think so. Is this the best haplotype for each chromosome or the best haplotype overall? Also, is manual curation based on a combination of length and Merqury scores? For example, I think my "maternal" haplotype would be better overall and I could add my Y chromosome to it, but it's possible that the paternal chr 1 is better.

I imagine we have to pick the best haplotype overall, because if (for example) 500 kb of paternal chromosome 2 happened to sit on the end of maternal chromosome 1, there could be substantial duplications at the ends of genes.

Thanks again for helping with the nitty-gritty details, it makes a world of difference.

Best, Dustin

arangrhie commented 2 years ago

Hello Dustin,

We rely on the curators to choose the representative haplotype. This step relies on structural integrity rather than base-level accuracy, and structural integrity can be checked with Hi-C and other supporting information.

As long as all structural errors are corrected, I don't think it would matter if a haploid representative reference contained chromosomes from different haplotypes (e.g., chr1 from maternal and chr2 from paternal). However, it's usually more error-prone and difficult to track in the future once the haplotypes are mixed.
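If haplotypes are mixed per chromosome, the tracking problem mentioned above can be reduced by writing a provenance record alongside the assembly. The following is a rough sketch under stated assumptions, not something described in the VGP documentation: the function name `mix_with_manifest`, the in-memory dictionaries, and the tab-separated manifest layout are all hypothetical.

```python
from typing import Dict, List, Tuple


def mix_with_manifest(
    haplotypes: Dict[str, Dict[str, str]],
    choice: Dict[str, str],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pick one whole chromosome per entry in `choice` and record its origin.

    haplotypes: haplotype label -> {chromosome name -> sequence}
    choice:     chromosome name -> haplotype label to take it from
    Returns the selected (name, sequence) records and manifest lines.
    """
    records: List[Tuple[str, str]] = []
    manifest = ["chromosome\thaplotype\tlength"]
    for chrom, hap in choice.items():
        if hap not in haplotypes:
            raise KeyError(f"unknown haplotype label: {hap}")
        if chrom not in haplotypes[hap]:
            raise KeyError(f"{chrom} not found in haplotype {hap}")
        seq = haplotypes[hap][chrom]
        # Chromosomes are copied whole; nothing is patched across haplotypes.
        records.append((chrom, seq))
        manifest.append(f"{chrom}\t{hap}\t{len(seq)}")
    return records, manifest
```

Keeping the manifest with the submission makes it possible to say later which parent each chromosome of the representative came from.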

Best, Arang

DustinSokolowski commented 2 years ago

Hey Arang,

Thank you very much for the clarification. I now feel like I'm able to get started with confidence.

Best, Dustin