How to make one non-redundant TE library for multiple genomes?

Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

What do you want to know?

I currently have genomes of multiple closely related fish species. I want to create one non-redundant TE library for unified annotation of these genomes, like a "pangenome-lib". Should I merge the genome FASTA files of these species into a single file and run RepeatModeler on it, or is there another method I should use? Looking forward to your reply.

Helpful context

Is there a particular genome assembly or organism your question is about? If possible, please provide a link to a publicly available assembly and/or a species name.
Nope.

Have you installed RepBase RepeatMasker Edition for RepeatMasker? This question is especially relevant for questions about classification or the RepeatClassifier program.
Yes.

I recommend using a serial approach. For instance, run a de novo tool on the first species, mask the second species with the resulting library, run the de novo tool on the masked second species, combine the libraries, and iterate over the remaining species. This avoids the need to employ a clustering method, avoids biasing the seed alignments with identical copies, and provides insight into the species specificity of each family.
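
The serial approach could be sketched roughly as follows. This is only an illustration, not an official RepeatModeler workflow: the function name `plan_serial_library`, the `te_lib` working directory, and the `combined-families.fa` file name are made up for the example. It assumes `BuildDatabase`, `RepeatModeler`, and `RepeatMasker` are on the PATH, that RepeatModeler writes its library as `<database>-families.fa`, and that RepeatMasker writes the masked genome as `<input>.masked` in the `-dir` directory. The function only builds the list of commands, so the plan can be checked against the installed tool versions before anything runs.

```python
from pathlib import Path
import shlex


def plan_serial_library(genomes, workdir="te_lib"):
    """Build (but do not run) the commands for a serial TE library build.

    genomes: list of (species_name, fasta_path) tuples in processing order.
    Returns a list of argument lists, one per command.
    """
    work = Path(workdir)
    # Hypothetical name for the library that grows with each species.
    combined = work / "combined-families.fa"
    commands = [["mkdir", "-p", str(work)]]

    for i, (name, fasta) in enumerate(genomes):
        target = Path(fasta)

        if i > 0:
            # Mask this genome with all families found so far, so the next
            # de novo run only sees repeats the library does not yet cover.
            mask_dir = work / f"{name}_masked"
            commands.append(["RepeatMasker", "-lib", str(combined),
                             "-dir", str(mask_dir), str(target)])
            # Assumed output name: <input file name>.masked in mask_dir.
            target = mask_dir / f"{target.name}.masked"

        # De novo discovery on the (possibly masked) genome.
        db = work / name
        commands.append(["BuildDatabase", "-name", str(db), str(target)])
        commands.append(["RepeatModeler", "-database", str(db)])

        # Assumed output name: <database>-families.fa. Append the new
        # families to the combined library before the next species.
        families = shlex.quote(f"{db}-families.fa")
        commands.append(["sh", "-c",
                         f"cat {families} >> {shlex.quote(str(combined))}"])

    return commands
```

Each entry can be printed for review or passed to `subprocess.run(cmd, check=True)` in order. Because every species after the first is masked before discovery, each per-species `-families.fa` file holds only families not already found in earlier species, which is where the species-specificity insight comes from. The result depends on the order in which the genomes are processed.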

How to make one non-redundant TE library for multiple genomes? #211