TobyBaril / EarlGrey

Earl Grey: A fully automated TE curation and annotation pipeline

unclassified TE #133

Closed PengfeiInTuebingen closed 2 months ago

PengfeiInTuebingen commented 2 months ago

Hi, very cool tool! I successfully ran Earl Grey on my genome (a brown alga) and got the summary files. A large proportion of the TEs in my genome are unclassified, as shown in the attached pie chart from the summary file. I would like to know what these unclassified TEs are. Do you have any suggestions? Thanks in advance! [image]

Pf

TobyBaril commented 2 months ago

Hi, as with any non-model or poorly characterised genome, a large proportion of TEs will be unclassified. This means that the putative repetitive elements detected in the de novo stage do not share sufficient homology with known TEs in the curated section of Dfam. Classifying them will require some manual curation or further inspection. You can do this by following the protocols in this paper: https://link.springer.com/article/10.1186/s13100-021-00259-7 using the .strained TE library generated by Earl Grey. MCHelper (https://www.biorxiv.org/content/10.1101/2023.10.17.562682v2) can accelerate the process.

Essentially, we do not know enough about the diversity of TEs across eukaryotes. Beyond model species, curated elements are missing, which makes it difficult to classify new elements by homology (the curated elements currently come from a very small subset of eukaryotes). The only way this will improve is if the scientific community curates elements and submits them to Dfam, so classification should get better over time as more TE libraries are generated for diverse species.
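As a starting point for curation, it can help to pull the unclassified families out of the library so they can be inspected on their own. The following is a minimal sketch, not part of Earl Grey. It assumes the library is a FASTA file with RepeatModeler-style headers (`>name#Class/Family`, with `#Unknown` marking unclassified families); check the headers in your own .strained file first. The function names and the demo records are hypothetical.

```python
from collections import Counter


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def classification(header):
    """Return the text after '#' in the first header token.

    Assumption: headers look like 'name#Class/Family'. A header with
    no '#' (or nothing after it) is treated as 'Unknown'.
    """
    tokens = header.split()
    name = tokens[0] if tokens else ""
    _, sep, cls = name.partition("#")
    return cls if sep and cls else "Unknown"


def split_unknown(lines):
    """Split records into (classified, unclassified) lists."""
    known, unknown = [], []
    for header, seq in parse_fasta(lines):
        if classification(header).lower().startswith("unknown"):
            unknown.append((header, seq))
        else:
            known.append((header, seq))
    return known, unknown


# Hypothetical demo records standing in for a real library.
demo = """>rnd-1_family-1#LTR/Gypsy
ACGTACGT
>rnd-1_family-2#Unknown
TTGACA
GGA
>rnd-1_family-3#DNA/hAT
CCCC
""".splitlines()

known, unknown = split_unknown(demo)
tally = Counter(classification(h) for h, _ in parse_fasta(demo))
print(f"{len(known)} classified, {len(unknown)} unclassified")
for header, seq in unknown:
    print(header, len(seq))
```

To run this on a real library, pass an open file handle instead of `demo`, for example `split_unknown(open(path))`, where `path` points to your own .strained file. The unclassified records can then be written out and taken through the curation protocol or MCHelper.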