JBerthelier / PiRATE

PiRATE (Pipeline to Retrieve and Annotate Transposable Elements)
http://doi.org/10.17882/51795

RepeatMasker encountered a line in an unrecognized format. #44

Open vtrinca opened 4 years ago

vtrinca commented 4 years ago

Hello,

Over the past couple of weeks I have tried to run several Detection steps of the PiRATE pipeline, and all of them reported the same error, which concerns invalid characters in the fasta file. Below is the error reported by RepeatMasker.

Fatal error: Exit code 1 () FastaDB::_cleanIndexAndCompact - WARNING: RepeatMasker encountered a line in an unrecognized format. The offending subsequence is "GTTGG)N;GPM". The offending line is "TGTACGGGTGTAATGATGTAATTGCTTTTGTATGTTGG)N;GPMXK6I,3". Seq

The long report contains many errors like this one. Each offending line shows different unrecognized characters; however, the fasta file does not contain them. I searched for the offending lines and for the specific characters in both copies of the file, in the VM and on my PC, and found nothing. I think the files may be getting corrupted after they are input into the pipeline.

Am I doing something wrong with file management, or is there a problem with my .ova file? PS: I transfer the fasta file with FileZilla.

Thank you!

JBerthelier commented 4 years ago

Dear vtrinca,

Sorry for the delay.

This is the first time I have seen this warning. There is probably nothing wrong with how you manage the file; transferring files with FileZilla is the best way.

I do not think these files are corrupted after input; it has never happened to me, and nobody has reported this error to me before.
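One way to rule out corruption during transfer is to compare a checksum of the fasta file on the PC with a checksum of the copy inside the VM. The following is only a minimal sketch using the Python standard library; the function name `sha256_of_file` is hypothetical and not part of PiRATE.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`.

    Hypothetical helper, not part of PiRATE. The file is read in binary
    mode and in chunks, so large genome files do not need to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running the function on both copies and comparing the two digests shows whether the transfer changed the file: identical digests mean the copies are byte-for-byte the same.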

I did a quick search on Google, and it seems this error can appear because of specific characters in your input fasta file. This was also my first impression.

http://seqanswers.com/forums/showthread.php?t=27239
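To check an input fasta file for such characters before running the pipeline, something like the sketch below could be used. It assumes that only IUPAC nucleotide codes are acceptable in sequence lines; the exact set RepeatMasker accepts may differ, and the names `find_invalid_lines` and `scan_fasta` are hypothetical.

```python
# Assumption: only IUPAC nucleotide codes are valid in sequence lines.
# The alphabet RepeatMasker actually accepts may be different.
IUPAC_NUCLEOTIDES = frozenset("ACGTURYKMSWBDHVN")


def find_invalid_lines(lines, allowed=IUPAC_NUCLEOTIDES):
    """Yield (line_number, unexpected_characters) for each bad sequence line.

    Header lines (starting with '>') and empty lines are skipped.
    The comparison is case-insensitive, so soft-masked lowercase passes.
    """
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text or text.startswith(">"):
            continue
        unexpected = sorted(set(text.upper()) - allowed)
        if unexpected:
            yield number, "".join(unexpected)


def scan_fasta(path):
    """Return a list of (line_number, unexpected_characters) for a file."""
    # errors="replace" keeps the scan going even if the file holds
    # bytes that are not valid text.
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        return list(find_invalid_lines(handle))
```

An empty list from `scan_fasta` would suggest that the input file itself is clean under this assumed alphabet, in which case the unexpected characters are being introduced somewhere later.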

Are you using a genome assembly fasta file?

Best

Jeremy