Dfam-consortium / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

HMM file format error when using custom library in FASTA format #217

Closed: aminakur closed this issue 1 month ago

aminakur commented 1 year ago

I am trying to run RepeatMasker with a custom library downloaded from a database. It's in fasta format:

>ORSgTETNOOT01930 gi|14578149|nt85894-86033 unclassified transposon
GGCTGCGTTTAGATCCAAAGTTTGGATCCAAACTTCAGTCCTTTTCCATCACATCAACCT
GTCATACACATAAAACTTTTCAGTCACATCATCTTTAATTTCAACCAAAATCCAAACTTT
GCGCTGAACTAAACACAGAC
>ORSgCMCM00201320 gi|28460675|nt108249-108396 putative centromere sequence, CentO/CentC-like
ATATTAGCCCACACGGGTGCGATGTTTTTGACCAGAATGAAAATGTTCAAAAAACACCAA
AGCATGATTTTTGGACTTATTGGAGTGTATTGGGTGCGTTCGTGGCAAATACTCAATTCA
TGATTCGCGCGGCGAACTTTTGTCAATT

My code:

RepeatMasker -xsmall -pa 12 -lib TIGR_Oryza_Repeats_v_3_3.fna azucena.fna

I am getting an error:

RepeatMasker version 4.1.2-p1
Search Engine: HMMER [ 3.3.2 (Nov 2020) ]
RepeatMasker::createLib(): Error invoking /share/apps/hmmer/3.3.2/intel/bin/hmmpress on file /scratch/ak8725/genomes/RM_3160160.FriMay261649112023/TIGR_Oryza_Repeats_v_3_3.fna.

An additional hmmPress.log file is created:

Error: File format problem in trying to open HMM file /scratch/ak8725/genomes/RM_3145390.FriMay261638102023/TIGR_Oryza_Repeats_v_3_3.fna.
Format tag is '>ORSgTETNOOT01930': unrecognized.
Current H3 format is 'HMMER3/f'. Previous H2/H3 formats also supported.

I don't understand what I did wrong. The documentation says that the -lib option expects a FASTA-formatted library, which is what I provided.

rmhubley commented 1 year ago

The problem is that your RepeatMasker installation was configured to use nhmmer as the default search engine, but you are trying to search with a consensus library instead of a profile hidden Markov model (HMM) library. You can use the "-engine" option to switch to a search engine that works with consensus sequences (e.g., "-engine crossmatch" or "-engine ncbi", if you have installed phrap/crossmatch or rmblast, respectively).
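To illustrate the mismatch described above, here is a minimal Python sketch that inspects the first non-blank line of a library file and adds an "-engine" option only when the library is FASTA. The helper names `sniff_library_format` and `build_command` are hypothetical and not part of RepeatMasker. The sketch assumes, as in this issue, that the installation defaults to HMMER, and it takes the engine names "crossmatch" and "ncbi" from the reply above. It only builds the argument list; it does not run RepeatMasker.

```python
def sniff_library_format(path):
    """Guess the library format from its first non-blank line.

    Returns "fasta" for a line starting with ">", "hmm" for a line
    starting with "HMMER" (e.g. the 'HMMER3/f' format tag mentioned in
    hmmPress.log), and "unknown" otherwise.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"
            if line.startswith("HMMER"):
                return "hmm"
            return "unknown"
    return "unknown"  # empty file


def build_command(library, genome, consensus_engine="ncbi"):
    """Build a RepeatMasker argument list matching the library type.

    Assumption: the installation's default engine is HMMER, so an HMM
    library needs no -engine option, while a FASTA consensus library
    needs a consensus-capable engine ("crossmatch" or "ncbi").
    """
    fmt = sniff_library_format(library)
    if fmt == "unknown":
        raise ValueError(f"Unrecognized library format: {library}")

    cmd = ["RepeatMasker", "-xsmall", "-pa", "12"]
    if fmt == "fasta":
        cmd += ["-engine", consensus_engine]
    cmd += ["-lib", str(library), str(genome)]
    return cmd
```

For the library in this issue, `build_command("TIGR_Oryza_Repeats_v_3_3.fna", "azucena.fna")` would insert `-engine ncbi` before `-lib`, provided the file exists and begins with a ">" header line. The resulting list could then be passed to `subprocess.run` if the chosen engine is installed.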