Closed mcsimenc closed 7 years ago
Do you have EMBOSS 6+ installed on your system? The 'transeq' executable (translates nucleic acid sequences) is required while mgescan runs.
BTW, it seems you are running an older version of mgescan, 1.3.1 or earlier. The latest version, which is in the current git repository, handles errors and files better. We might improve code readability for debugging once we have user feedback, though.
Let me know if you have any further questions.
Thank you.
Yes, we have EMBOSS 6.5.7. You're right, MGEScan 1.3.1. I downloaded it from SourceForge. What is the version here on GitHub? I don't see it listed. I'll download the version in this git repository and try it. Thanks!
Code management has diverged from sourceforge.net, and 3.0.0 is currently provided at github.com. Note that HMMER 3+ and TRF are also required. Documentation can be found here: http://mgescan.readthedocs.io/en/latest/installation.html. You can skip the Galaxy installation if you use the command line only.
I had a chance to look at the 1.3.1 version, and the error message you saw is generated by hmmsearch because the bbbbb file is missing. In more detail, transeq -frame=f $seq_file -outseq=$pep_file is run by the hmm/get_phmm.pl Perl script under the MGEScan_nonLTR_v2 directory, and I'm guessing the bbbbb temporary file was not created for some reason when you ran mgescan with your input sequences. I probably need a sample input to replicate the errors with debugging.
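For anyone hitting the same message, the translation step can be re-run by hand so that transeq's own stderr stays visible. This is only a rough sketch: the function name check_translation, the TRANSEQ override variable, and the file paths in the usage line are made up for illustration; the transeq options are the ones quoted above from get_phmm.pl.

```shell
# Sketch: re-run the translation step by hand and confirm that the peptide
# file is non-empty. check_translation and TRANSEQ are hypothetical names,
# not part of MGEScan or EMBOSS.
check_translation() {
    local seq_file=$1 pep_file=$2
    local transeq_bin=${TRANSEQ:-transeq}

    # Same options as the call quoted from get_phmm.pl; stderr is not hidden.
    if ! "$transeq_bin" -frame=f "$seq_file" -outseq="$pep_file"; then
        echo "transeq failed for $seq_file" >&2
        return 1
    fi

    # -s is true only if the file exists and is larger than zero bytes.
    if [ ! -s "$pep_file" ]; then
        echo "transeq produced no output: $pep_file" >&2
        return 1
    fi
    echo "ok: $pep_file"
}
```

Something like check_translation scafs/one_scaffold.fasta /tmp/test.pep (hypothetical paths) should then either print the peptide file name or show whatever transeq itself complains about.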
Yep TRF and HMMER3 are available in my environment. I would like to use MGEScan on the command line only. I downloaded and installed MGEScan 3.0.0, but there is an issue.
The call
mgescan nonltr Sacu_asm_separated_scafs/ --output=output 1>mgescan.out 2>mgescan.err
results in this on stderr
Error: Sequence file /tmp/2KC2GEDSAz.bbbbb is empty or misformatted
Help, Lee!
Can you provide a sample of your input sequences? I just want to run it for debugging purposes. In the meantime, I'll try to see what else could cause the errors.
Without the Sacu_v1.1_asm_Sacu_v1.1_s0001.fasta file, mgescan ran successfully and produced this nonltr.gff3 result:
asm MGEScan_nonLTR mobile_genetic_element 2929787 2934653 . . . ID=Sacu_v1.1_asm_Sacu_v1.1_s0002.fa_2929787 ...
When I checked its content, the Sacu_v1.1_asm_Sacu_v1.1_s0001.fasta file looks corrupted or in binary format.
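A quick way to screen a directory of split FASTA files for this kind of problem might look like the sketch below. It assumes GNU grep (for the -I option), and check_fasta is a hypothetical helper, not something shipped with MGEScan.

```shell
# Sketch: flag input files that are empty, binary, or lack a FASTA header.
# check_fasta is a hypothetical helper, not part of MGEScan.
check_fasta() {
    local f=$1
    if [ ! -s "$f" ]; then
        echo "$f: empty or missing"
        return 1
    fi
    # With -I, grep treats binary files as non-matching, so a failure here
    # means grep classified the file as binary.
    if ! grep -Iq '' "$f"; then
        echo "$f: looks binary"
        return 1
    fi
    # A FASTA file should begin with a header line starting with '>'.
    if [ "$(head -c 1 "$f")" != ">" ]; then
        echo "$f: does not start with a FASTA header"
        return 1
    fi
    echo "$f: ok"
}
```

It could be looped over the input directory, e.g. for f in Sacu_asm_separated_scafs/*; do check_fasta "$f"; done, to see which files deserve a closer look.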
Oh that's weird! I used the splitMultiFasta.py script from MGEScan 1.3.1 to split a multi fasta to generate those files. I'll try split.py from MGEScan 3.0.0. I'm guessing it's for the same purpose.
I ran mgescan with Sacu_v1.1_asm_Sacu_v1.1_s0002.fasta only and I got the same errors. e.g.
Error: Sequence file /tmp/2KC2GEDSAz.bbbbb is empty or misformatted
Maybe this is a permissions problem on my system? What generates the *.bbbbb files? Thanks for all your help.
Btw I reran the splitMultiFasta.py and it didn't produce any corrupt/binary files. Not sure what happened there.
*.bbbbb is a protein sequence translation file generated by transeq while the mgescan nonltr command runs to identify elements among 12 clades. This file (*.bbbbb) is ephemeral and lives in the /tmp directory, so we assume you have write permission on that directory.
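To rule out the permissions question quickly, a check along these lines might help. It is a sketch only: check_tmp_writable and the probe file name are hypothetical, and it simply tries to create and remove one file in the directory.

```shell
# Sketch: confirm that the temporary directory accepts new files.
# check_tmp_writable is a hypothetical helper; it checks /tmp by default.
check_tmp_writable() {
    local dir=${1:-/tmp}
    local probe
    # mktemp fails if the directory is missing or not writable.
    if probe=$(mktemp "$dir/mgescan_probe.XXXXXX" 2>/dev/null); then
        rm -f "$probe"
        echo "$dir: writable"
    else
        echo "$dir: cannot create files"
        return 1
    fi
}
```

Running check_tmp_writable with no argument tests /tmp, which is where the *.bbbbb files are written according to the explanation above.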
I discovered that at least some of the *.bbbbb files are appearing in /tmp during the run, but MGEScan still reports them as empty or misformatted. I found the transeq call in mgescan/src/mgescan/nonltr/hmm/get_phmm.pl, got rid of the 2>/dev/null, and think I found the problem: I need libpq.so.5, which looks like a PostgreSQL library. This cluster is running PostgreSQL 8.4.18, maybe I need to upgrade or reinstall?
Thank you for the help Hyungro!
Here is the error thrown by transeq:
/share/apps/genomics/EMBOSS-6.5.7/emboss/.libs/lt-transeq: error while loading shared libraries: libpq.so.5: cannot open shared object file: No such file or directory
Good find! I'm afraid I don't have a solution for the library issue, but when I checked my system, libpq.so.5 is linked as shown below. I guess you can reinstall (or upgrade) as you mentioned. Or you can try creating a symbolic link as a workaround if you have the libpq.so.5 file on your system:
$ ldd `which transeq`
...
libpq.so.5 => /usr/lib/x86_64-linux-gnu/libpq.so.5 (0x00007fecf7ffd000)
...
Regarding the libpq.so.5 error, I found a suggestion on stackoverflow.com here: http://stackoverflow.com/questions/12781566/error-while-loading-shared-libraries-libpq-so-5-cannot-open-shared-object-file
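The ldd check above can be narrowed down to just the libraries the loader cannot resolve. The filter below is a small sketch; missing_libs is a hypothetical helper that reads ldd output on stdin and relies on unresolved entries being printed as "=> not found".

```shell
# Sketch: list shared libraries that the dynamic loader cannot resolve.
# missing_libs is a hypothetical helper that filters `ldd` output on stdin.
missing_libs() {
    # Unresolved entries look like "libpq.so.5 => not found";
    # print only the library name (first field).
    awk '/=> not found/ { print $1 }'
}
```

Used as ldd "$(command -v transeq)" | missing_libs, it prints nothing when every dependency resolves, and one library name per line otherwise.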
Yay, everything seems to be working! transeq was expecting to find libpq.so.5 in EMBOSS-6.6.0/lib/. I also updated EMBOSS.
When I run MGE_Scan_nonLTR_v2 on the Linux command line using this call:
run_MGEScan.pl -genome=scafs/ -data=output/ -hmmerv=3 -program=N 2>mgescan.stderr 1>mgescan.stdout
it gives many lines of this error:
Error: Failed to open sequence file output/b/out1/bbbbb for reading
It finishes running and produces an empty nonltr.gff3 file.
Other files named "aaaaa" and "ppppp" are in the output/b/out1/bbbbb directory while the program is running. Any ideas what's wrong? There's no indication of where in the code this error is coming from. A grep for "Failed to open sequence" on all the MGEScan scripts produced no matching lines.
Thanks for your help, Matt
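On the grep that found nothing: as noted elsewhere in this thread, the message is generated by hmmsearch rather than by the MGEScan scripts, so the search has to include external binaries too. A sketch of such a search follows; find_message_source is a hypothetical helper, and whether the string is literally present in a given binary is an assumption.

```shell
# Sketch: list files under the given paths that contain a fixed string,
# including compiled binaries. find_message_source is a hypothetical helper.
find_message_source() {
    local msg=$1
    shift
    # -r: recurse into directories, -l: print matching file names only,
    # -F: treat the message as a fixed string rather than a regex.
    grep -rlF -- "$msg" "$@"
}
```

For example, find_message_source "Failed to open sequence" "$(command -v hmmsearch)" would print the hmmsearch path if the message is compiled into that executable.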