Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

Consensi.fa.classified #103

Closed niWdooG closed 1 year ago

niWdooG commented 4 years ago

Hello,

I use RepeatModeler v.2.0.1 and RepeatMasker v.4.1.0 (both installed using Conda) for a non-model grass plant. Everything goes well until the last step with RepeatClassifier. I got an issue similar to #9988:

Missing ../share/RepeatMasker/Libraries/RepeatMasker.lib.nsq! Please rerun the configure program in the RepeatModeler directory before running this script.

Because I don't have access to RepBase, I used the TREP database (http://botserv2.uzh.ch/kelldata/trep-db/downloadFiles.html): makeblastdb -in trep-db_complete_Rel-19.fasta -dbtype nucl

In total, RepeatModeler found 2713 repeats. However, only 460 of those were classified by RepeatClassifier. Maybe it's a naive question, but is there a way to classify the rest of the repeats? In families-classified.stk I see that these repeats can be assigned to the Retrotransposon class (Interspersed_Repeat;Unknown); however, consensi.fa.classified keeps them as #Unknown.

Thanks in advance
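To see how many consensus sequences ended up in each class, the headers of consensi.fa.classified can be tallied. The following is only a minimal sketch: it assumes each FASTA header has the form `>name#Class`, optionally followed by free text, as the `#Unknown` suffix above suggests. The function name `count_classes` and the example headers are made up for illustration.

```python
import io
from collections import Counter


def count_classes(fasta_lines):
    """Tally the class label (text after '#') of every FASTA header line."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        tokens = line[1:].split()
        first = tokens[0] if tokens else ""
        # Assumed header shape: ">name#Class ...optional description..."
        _, sep, label = first.partition("#")
        counts[label if sep else "(no class)"] += 1
    return counts


# Illustrative input only; real headers come from consensi.fa.classified.
example = io.StringIO(
    ">rnd-1_family-1#LTR/Gypsy\nACGT\n"
    ">rnd-1_family-2#Unknown\nGGCC\n"
    ">rnd-1_family-3#Unknown\nTTAA\n"
)
counts = count_classes(example)
for label, n in counts.most_common():
    print(label, n)
```

On a real file, `count_classes(open("consensi.fa.classified"))` would take the place of the in-memory example.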

mnshgl0110 commented 3 years ago

I am facing a similar issue where most of the repeats are classified as Unknown. @niWdooG Were you able to resolve this issue? If so, it would be great if you could share the solution.

niWdooG commented 3 years ago

Hello. Unfortunately, I did not find a solution.

jebrosen commented 3 years ago

> Hello. Unfortunately, I did not find a solution.

Sorry to hear that, @niWdooG! I assumed you had solved it when you closed the issue.

> I use RepeatModeler v.2.0.1 and RepeatMasker v.4.1.0 (both installed using Conda) for a non-model grass plant. Everything goes well until the last step with RepeatClassifier. I got an issue similar to #9988:

If it is possible for you, I would use RepeatClassifier from a manual installation. There are various issues with the bioconda packages of RepeatModeler and I am not sure that they are all fixed yet.

> Because I don't have access to RepBase, I used the TREP database (http://botserv2.uzh.ch/kelldata/trep-db/downloadFiles.html): makeblastdb -in trep-db_complete_Rel-19.fasta -dbtype nucl

How did you configure RepeatMasker and/or RepeatModeler to use this library file?

> In families-classified.stk I see that these repeats can be assigned to the Retrotransposon class (Interspersed_Repeat;Unknown); however, consensi.fa.classified keeps them as #Unknown.

Interspersed_Repeat is not the same as Retrotransposon. If any of our websites or documentation imply this, please let us know so we can correct it! "Interspersed Repeats" includes many types of repetitive DNA, including retrotransposons, DNA transposons, tandem repeats and satellites, and more.

niWdooG commented 3 years ago

Hello Jeb,

It was a while ago. As far as I remember, I tried to modify the TREP database, e.g. from:

DHH_Mpol_A_RND-1 Metrosideros polymorpha; DNA-transposon, Helitron, Helitron; fragment; KEY=3609;

to:

DHH_Mpol_B_RND-1#DNA-transposon/Helitron/Helitron ( Metrosideros polymorpha, fragment, KEY=3610 )

Regarding interspersed repeats, that was definitely my mistake. In the future, I will try to use a manual installation of RepeatClassifier as you suggested.
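The header rewrite described in this comment could be sketched roughly as follows. This is an illustration under stated assumptions, not a confirmed fix: the thread does not establish that the rewritten library made RepeatClassifier classify more families. The function name `convert_trep_header` is hypothetical, and the input layout (`name species; class, order, superfamily; other fields;`) is inferred from the single example above. The sketch does not map TREP class names onto any other naming scheme.

```python
def convert_trep_header(header):
    """Rewrite one TREP-style header into 'name#A/B/C ( comment )' form.

    Assumed input layout (inferred from one example, not from a spec):
        name species; class, order, superfamily; other; fields;
    """
    header = header.lstrip(">").strip()
    name, _, rest = header.partition(" ")
    fields = [f.strip() for f in rest.split(";") if f.strip()]
    if len(fields) < 2:
        raise ValueError(f"unexpected header layout: {header!r}")
    species, classification, *others = fields
    # Join the comma-separated classification levels with '/'.
    cls = "/".join(part.strip() for part in classification.split(","))
    comment = ", ".join([species, *others])
    return f"{name}#{cls} ( {comment} )"


converted = convert_trep_header(
    "DHH_Mpol_A_RND-1 Metrosideros polymorpha; "
    "DNA-transposon, Helitron, Helitron; fragment; KEY=3609;"
)
print(converted)
```

Applied to a whole library, the function would be called on every line starting with `>` while sequence lines are copied through unchanged.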