cgroza / GraffiTE

GraffiTE is a pipeline that finds polymorphic transposable elements in genome assemblies and/or long reads, and genotypes the discovered polymorphisms in read sets using genome-graphs.

Could GraffiTE identify de novo TE insertions using short reads? #42

Closed by LeoHongboWANG 1 month ago

LeoHongboWANG commented 1 month ago

Dear GraffiTE team,

Thank you for developing this outstanding tool!

I wonder if GraffiTE can be extended or adapted to discover de novo TE insertions using short-read datasets, especially in family-based studies. As I understand it, PanGenie genotypes polymorphisms that are already present in the pangenome graph, so it may not be able to identify novel insertions outside the graph. Is that right?

Do you have any suggestions or recommendations on approaching this, or are there plans to incorporate such functionality in future releases?

Thank you, Hongbo

clemgoub commented 1 month ago

Hello @LeoHongboWANG,

Indeed, your description is correct. With short reads, you can only genotype TEs that are already present in the graph. Some short-read methods can attempt local assembly of the TE insertions; in my experience, this remains challenging for elements > 1 kb. We've done it once here in human, where Alu elements are ~300 bp.

We don't have direct plans to incorporate such functionality in GraffiTE (we can currently only work part-time on the pipeline and are focusing on maintenance and pangenome tooling). However, if you find a way to produce a VCF file from your short reads with both the reference and alternative sequences resolved, then you can use this VCF with GraffiTE to identify which SVs contain TEs and graph-genotype them.
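To illustrate what "sequence resolved" means in practice, here is a minimal Python sketch that keeps only VCF records whose REF and ALT fields are literal bases (no symbolic alleles such as `<INS>`, no breakend notation) and that look like insertions. It is not part of GraffiTE: the function names and the 50 bp minimum length are assumptions made for the example, and a real workflow would more likely use an established VCF tool.

```python
import re

# Literal nucleotide strings only; symbolic alleles such as <INS>,
# breakends, "*" and "." will not match.
_BASES = re.compile(r"^[ACGTNacgtn]+$")


def is_sequence_resolved(ref, alt):
    """Return True if REF and every comma-separated ALT allele are literal bases."""
    if not _BASES.match(ref):
        return False
    return all(_BASES.match(a) is not None for a in alt.split(","))


def filter_resolved_insertions(lines, min_len=50):
    """Yield header lines unchanged, plus records that are sequence resolved
    and carry at least one ALT allele min_len bases longer than REF.

    min_len=50 is an assumed threshold for this sketch, not a GraffiTE setting.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed records
        ref, alt = fields[3], fields[4]
        if not is_sequence_resolved(ref, alt):
            continue
        if any(len(a) - len(ref) >= min_len for a in alt.split(",")):
            yield line
```

Applied to the lines of an uncompressed VCF (for example `filter_resolved_insertions(open("calls.vcf"))`, where `calls.vcf` is a hypothetical file name), this drops records that only report an insertion site without its sequence.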

There are plenty of short-read methods that can give you informative results about allele frequencies, but most will not report the insertion sequence. You can take a look here (filter the keyword column with "polymorphism") for a non-exhaustive list of such methods.

Let me know if you have further questions!

Cheers,

Clément