eggzilla / RNAlien

RNAlien - unsupervised RNA family model construction
http://rna.tbi.univie.ac.at/rnalien/
GNU General Public License v3.0

ERROR: Your input file is missing sequences! Either your file is empty, or not in Clustal format! #9

Closed: riasc closed this issue 7 years ago

riasc commented 7 years ago

Hello,

I just tried to run RNAlien on the first benchmark file (RF00001) and got the following output:

RNAlien -i input.fa -o ~/RNAlien/1/0/ -d testoutput
2 sequences; length of alignment 118.
2 sequences; length of alignment 118.
2 sequences; length of alignment 118.
(-38.4,-38.9,0.9830508474576272)
(-37.85,-37.55,0.8813559322033899)
(-38.0,-37.4,0.8728813559322035)
ERROR: Your input file is missing sequences! Either your file is empty, or not in Clustal format!
ERROR: Your input file is missing sequences! Either your file is empty, or not in Clustal format!
ERROR: Your input file is missing sequences! Either your file is empty, or not in Clustal format!
ERROR: Your input file is missing sequences! Either your file is empty, or not in Clustal format!
ERROR: Your input file is missing sequences! Either your file is empty, or not in Clustal format!

with the input file

cat input.fa
>AB001721.1/2735-2851
CCCGGTGACTATAGAGAGAGGGCCACACCCGTTCCCATCCCGAACACGGAAGTTAAGCCT
CTCATCGCTGATGGTACTATGTGGTTCGCTGCATGGGAGAGTAGGACGTTGCCGGGT

However, the output seems legit. Is this just a warning? I ask because conversion into Clustal format does not work at all.

Thanks.

eggzilla commented 7 years ago

Hey, thanks for using RNAlien and reporting the bug :-) The error message seems to come from one of the auxiliary tools RNAlien uses, since it is not printed to the Log file. I suspect it is from RNAalifold (see ViennaRNA/src/bin/RNAalifold.c). I will investigate where it originates and get back to you once I have the details.
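
Since the message bypasses the Log file, one way a wrapper could surface such errors is to capture each auxiliary tool's stdout and stderr, append both to the log, and scan for lines starting with "ERROR:". A minimal Python sketch; `run_tool_logged` is a hypothetical helper, not RNAlien's code. The thread doesn't say whether RNAalifold exits non-zero in this case, so the sketch checks the output text and leaves the exit code to the caller.

```python
import subprocess
import sys


def run_tool_logged(argv, log_path):
    """Run an auxiliary tool, append its stdout/stderr to log_path and
    return (CompletedProcess, list of output lines starting with "ERROR:")."""
    result = subprocess.run(argv, capture_output=True, text=True)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write("$ " + " ".join(argv) + "\n")
        log.write(result.stdout)
        log.write(result.stderr)
    # Scan both streams line by line; the exit code alone may not reveal the problem.
    lines = result.stdout.splitlines() + result.stderr.splitlines()
    errors = [line for line in lines if line.startswith("ERROR:")]
    return result, errors


if __name__ == "__main__":
    # Stand-in "tool" that prints an error like the one above and exits with 0.
    fake_tool = [sys.executable, "-c",
                 "import sys; print('ERROR: Your input file is missing sequences!', file=sys.stderr)"]
    result, errors = run_tool_logged(fake_tool, "tools.log")
    print(result.returncode, errors)
```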

eggzilla commented 7 years ago

Ok, I tried to reproduce the bug, but I don't get the messages with my setup. Could you please post the head of the Log file in the testoutput folder? It lists all your tool versions, so I can reproduce your setup.

riasc commented 7 years ago

Hi, thanks for the rapid reply. Here is the head of the Log file:

Timestamp: 2017-01-19 11:24:58.521297 UTC
Temporary Directory: ./testoutput/
Clustalo version: 1.2.4
mlocarna version: LocARNA 1.9.0
RNAfold version: RNAfold 2.3.1
infernalversion: # INFERNAL 1.1.2 (July 2016)
No tax id provided - Sending find taxonomy start blast query
Initial TaxId: 766041
Modelconstruction iteration: 0
Input fasta:
AB001721.1/2735-2851
CCCGGTGACTATAGAGAGAGGGCCACACCCGTTCCCATCCCGAACACGGAAGTTAAGCCTCTCATCGCTGATGGTACTATGTGGTTCGCTGCATGGGAGAGTAGGACGTTGCCGGGT
[]Upper taxonomy limit: 766041
Taxonomic Context: not set
Evalue cutoff: 1.0e-3
Selected queries:
...

riasc commented 7 years ago

Since RNAalifold seems to be the culprit, can you tell me which version you used in the benchmarks? I can't find it anywhere.

eggzilla commented 7 years ago

Sure, that was ViennaRNA Package 2.1.9, which is still available for download: http://www.tbi.univie.ac.at/RNA/#download
Benchmark results, including the Log files listing all tool versions used, can be found here: http://rna.tbi.univie.ac.at/rnalien/help#benchmark

eggzilla commented 7 years ago

Ok, I get the same error messages with your setup. Moreover, the alifold output is missing, which means the consensus secondary structure is not updated between the steps. I will try to release a fix this weekend that covers this. Thanks again for reporting the bug, and apologies for the inconvenience :-)
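
Because a missing alifold result silently leaves the previous consensus structure in place, a parsing step can make "no structure found" explicit instead. A sketch assuming the stdout format of the working RNAalifold call later in this thread (a sequence line, then the structure followed by `(total = a + b)`); `parse_alifold_consensus` is a hypothetical helper, and other RNAalifold versions or options may print differently.

```python
import re

# Structure line as printed by RNAalifold in this thread, e.g.
# ".((((...)))). (-53.86 =   -33.95 + -19.91)"
_STRUCT_LINE = re.compile(
    r"^([.()\[\]{}<>]+)\s+"
    r"\(\s*(-?\d+(?:\.\d+)?)\s*=\s*(-?\d+(?:\.\d+)?)\s*\+\s*(-?\d+(?:\.\d+)?)\s*\)\s*$"
)


def parse_alifold_consensus(stdout):
    """Return the consensus sequence, structure and energies, or None.

    None means no structure line was found (e.g. RNAalifold only printed
    an error), so the caller should not keep reusing a stale structure.
    """
    lines = stdout.splitlines()
    for i, line in enumerate(lines):
        m = _STRUCT_LINE.match(line.strip())
        if m:
            return {
                # Assumption: the consensus sequence is printed on the line before.
                "sequence": lines[i - 1].strip() if i > 0 else None,
                "structure": m.group(1),
                "energy": float(m.group(2)),
                # The two summands printed after "=".
                "terms": (float(m.group(3)), float(m.group(4))),
            }
    return None
```

With the error output from this thread the helper returns `None`, which a caller can turn into a logged failure rather than continuing with the old structure.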

eggzilla commented 7 years ago

I just released RNAlien 1.3.0, which should fix the issue. The problem originated in the RNAalifold system call that RNAlien used: it seems that RNAalifold no longer accepts Stockholm input via STDIN, only Clustal. However, passing the input file as a parameter works:

[ef@posbi 3]$ RNAalifold -r --cfactor 0.6 --nfactor 0.5 < model.stockholm
ERROR: Your input file is missing sequences! Either your file is empty, or not in Clustal format!
[ef@posbi 3]$ RNAalifold -r --cfactor 0.6 --nfactor 0.5 model.stockholm
6 sequences; length of alignment 118.
UCCUGGUGACUAUAGAGAGAGGGCCACACCCGUUCCCAUCCCGAACACGGAAGUUAAGCCUCUCAUCGCCGAUGGUACUAUGGGGUUCGCUCCAUGGGAGAGUAGGACGUUGCCGGGU
.(((((((((....((((((((...((..((((((.......))..))))..))....)))))).))(((.......((((((((....)))))))).......)).)))))))))). (-53.86 =   -33.95 + -19.91)
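
The same fix, expressed in a wrapper: build the argument list with the Stockholm file as a positional argument and run it without a shell, so no `<` redirection is involved. A Python sketch with hypothetical helper names; the flags are simply the ones from the transcript above, and this is not RNAlien's actual implementation.

```python
import subprocess


def rnaalifold_argv(stockholm_path, cfactor=0.6, nfactor=0.5):
    """Build the RNAalifold call from the transcript, passing the alignment
    file as a positional argument instead of redirecting it to STDIN."""
    return [
        "RNAalifold",
        "-r",
        "--cfactor", str(cfactor),
        "--nfactor", str(nfactor),
        stockholm_path,  # positional file argument, not "< model.stockholm"
    ]


def run_rnaalifold(stockholm_path, workdir="."):
    # A list argv with the default shell=False: no shell, hence no STDIN redirection.
    return subprocess.run(rnaalifold_argv(stockholm_path), cwd=workdir,
                          capture_output=True, text=True)
```

Keeping the argv construction separate from the call makes it easy to check which invocation a given version of the wrapper would produce without having RNAalifold installed.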

I will have to introduce upper version bounds on RNAlien's tool dependencies. It would be awesome if you could confirm that the issue has been resolved :-)
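
Upper version bounds could be checked against the tool versions RNAlien already writes to the head of its Log file (as in the excerpt posted earlier). A sketch with hypothetical helpers: the Log lists RNAfold rather than RNAalifold, but both ship with the ViennaRNA package, so RNAfold's version stands in for it here. The bound in the usage line (2.1.9, the benchmark version mentioned in this thread) is purely illustrative, not a bound RNAlien actually enforces.

```python
import re

_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_tool_versions(log_head):
    """Map each '<tool> version: ...' line of a Log head to a version tuple."""
    versions = {}
    for line in log_head.splitlines():
        key, sep, value = line.partition(":")
        if not sep or "version" not in key.lower():
            continue  # skip Timestamp, Temporary Directory, etc.
        m = _VERSION.search(value)
        if m:
            versions[key.strip()] = tuple(int(x) for x in m.groups())
    return versions


def exceeding_upper_bounds(versions, upper_bounds):
    """Return the (sorted) tools whose parsed version is above its upper bound."""
    return sorted(
        tool for tool, bound in upper_bounds.items()
        if tool in versions and versions[tool] > bound
    )


if __name__ == "__main__":
    head = "RNAfold version: RNAfold 2.3.1\ninfernalversion: # INFERNAL 1.1.2 (July 2016)"
    # Illustrative bound only: flags RNAfold 2.3.1 as newer than 2.1.9.
    print(exceeding_upper_bounds(parse_tool_versions(head), {"RNAfold version": (2, 1, 9)}))
```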

riasc commented 7 years ago

That was fast. Nice. Works for me. Thanks for the update.