Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

How to make one non-redundant TE library for multiple genomes? #211

Closed: GRGong closed this issue 1 year ago

GRGong commented 1 year ago

I currently have genomes of multiple closely related fish species. I want to create one non-redundant TE library for unified annotation of these genomes, like a "pangenome-lib". Should I merge the genome FASTA files of these species into a single file and run RepeatModeler on it, or is there another method I should use? Looking forward to your reply.


rmhubley commented 1 year ago

Hi, I answered this question recently on a Slack channel (perhaps you?). Here is what I wrote:

I recommend using a serial approach. For instance, run a de novo tool on the first species, mask the second species with the resulting library, run the de novo tool on the masked second species, combine the libraries, and iterate. This would avoid the need to employ a clustering method, certainly avoid biasing the seed alignments with identical copies, and provide insight into the species specificity of each family.
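The serial approach can be sketched as a small Python driver. This is a hypothetical helper, not part of RepeatModeler: the function names `serial_plan`, `tag_and_append` and `execute`, the `pan_library.fa` file name, the `<species>_masked` directories, and the convention of prefixing each family ID with its species name are all illustrative assumptions. It assumes RepeatModeler 2.x, which writes its final library as `<database>-families.fa`, and RepeatMasker 4.1, which writes `<input>.masked` into the `-dir` directory. Flag names differ between versions (older RepeatModeler releases use `-pa` rather than `-threads`), so check each tool's `-help` before running anything.

```python
import subprocess
from pathlib import Path


def tag_and_append(families_fa, species, combined_fa):
    """Append one species' families to the combined library.

    Each FASTA header gets a species prefix (a hypothetical naming
    convention) so every family's origin stays visible in the final library.
    """
    with open(families_fa) as src, open(combined_fa, "a") as out:
        for line in src:
            if line.startswith(">"):
                line = f">{species}_{line[1:]}"
            out.write(line)


def serial_plan(genomes, combined_fa="pan_library.fa", threads=8):
    """Return the ordered steps of the serial approach without running them.

    genomes: list of (species, genome_fasta) pairs, in processing order.
    Steps are ("run", argv) for an external command, or
    ("append", families_fa, species, combined_fa) for tag_and_append().
    """
    steps = []
    for i, (species, fasta) in enumerate(genomes):
        target = fasta
        if i > 0:
            # Mask this genome with every family found in earlier species,
            # so de novo discovery only sees repeats not yet in the library.
            mask_dir = f"{species}_masked"
            steps.append(("run", ["RepeatMasker", "-lib", combined_fa,
                                  "-pa", str(threads), "-dir", mask_dir,
                                  fasta]))
            target = str(Path(mask_dir) / (Path(fasta).name + ".masked"))
        # De novo discovery on the (possibly masked) genome.
        steps.append(("run", ["BuildDatabase", "-name", species, target]))
        steps.append(("run", ["RepeatModeler", "-database", species,
                              "-threads", str(threads)]))
        # Grow the combined library before the next species is masked.
        steps.append(("append", f"{species}-families.fa", species,
                      combined_fa))
    return steps


def execute(steps):
    """Run a plan; start with no existing combined library file."""
    for step in steps:
        if step[0] == "run":
            subprocess.run(step[1], check=True)
        else:
            tag_and_append(step[1], step[2], step[3])
```

Calling `serial_plan([("speciesA", "speciesA.fa"), ("speciesB", "speciesB.fa")])` and printing the steps lets you review the full pipeline before handing it to `execute`. Keeping the plan separate from execution also makes it easy to reorder species, for example starting with the best-assembled genome, and the species prefixes show which genome each family was first discovered in.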