MGEScan / mgescan

A Galaxy based system for identifying retrotransposons in genome
http://mgescan.github.io/mgescan/
GNU General Public License v3.0

MGEScan_nonLTR_v2 #14

Closed mcsimenc closed 7 years ago

mcsimenc commented 7 years ago

When I run MGE_Scan_nonLTR_v2 on the Linux command line using this call:

run_MGEScan.pl -genome=scafs/ -data=output/ -hmmerv=3 -program=N 2>mgescan.stderr 1>mgescan.stdout

it gives many lines of this error:

Error: Failed to open sequence file output/b/out1/bbbbb for reading

It finishes running and produces an empty nonltr.gff3 file.

Other files named "aaaaa" and "ppppp" are in the output/b/out1/bbbbb directory while the program is running. Any ideas what's wrong? There's no indication of where in the code this error is coming from. A grep for "Failed to open sequence" on all the MGEScan scripts produced no matching lines.

Thanks for your help, Matt

lee212 commented 7 years ago

Do you have EMBOSS 6+ installed on your system? The 'transeq' executable (translates nucleic acid sequences) is required while the mgescan program runs.

BTW, it seems you are running a previous version of mgescan, 1.3.1 or older. The latest version, which is in the current git repository, handles errors and files better. We might improve code readability for debugging once we have user feedback, though.

Let me know if you have any further questions.

Thank you.

mcsimenc commented 7 years ago

Yes, we have EMBOSS 6.5.7. You're right, MGEScan 1.3.1. I downloaded it from SourceForge. What is the version here on GitHub? I don't see it listed. I'll download the version in this git repository and try it. Thanks!

lee212 commented 7 years ago

Code management has diverged from sourceforge.net, and 3.0.0 is currently provided at github.com. Note that HMMER 3+ and TRF are also required. Documentation can be found here: http://mgescan.readthedocs.io/en/latest/installation.html. You can skip the Galaxy installation if you use the command line only.

I had a chance to look at the 1.3.1 version, and the error message you saw is generated by hmmsearch because the bbbbb file is missing. In more detail, transeq -frame=f $seq_file -outseq=$pep_file is run by the Perl script hmm/get_phmm.pl under the MGEScan_nonLTR_v2 directory, and I'm guessing the bbbbb temporary file was not created for some reason when you ran mgescan with your input sequences. I will probably need a sample input to reproduce the errors while debugging.
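To check this translation step in isolation, one option is to run the same kind of command by hand with stderr kept visible, then confirm that the output file exists and is non-empty. The following is a minimal Python sketch under those assumptions; `run_and_check` is a hypothetical helper and not part of MGEScan, and the file names in the usage comment are placeholders.

```python
import os
import subprocess


def run_and_check(cmd, out_path):
    """Run cmd, keep its stderr, and report whether out_path was produced.

    Hypothetical debugging helper; not part of MGEScan.
    Returns (ok, stderr_text).
    """
    try:
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE,
                              universal_newlines=True)
    except OSError as exc:
        # Raised, for example, when the executable is not on PATH.
        return False, str(exc)
    produced = os.path.isfile(out_path) and os.path.getsize(out_path) > 0
    return proc.returncode == 0 and produced, proc.stderr


# Example use (placeholder file names):
# ok, err = run_and_check(
#     ["transeq", "-frame=f", "input.fasta", "-outseq=out.pep"], "out.pep")
# if not ok:
#     print(err)
```

If the command fails, the second return value carries whatever the tool printed to stderr, which is the information a `2>/dev/null` redirect would otherwise discard.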

mcsimenc commented 7 years ago

Yep TRF and HMMER3 are available in my environment. I would like to use MGEScan on the command line only. I downloaded and installed MGEScan 3.0.0, but there is an issue.

The call

mgescan nonltr Sacu_asm_separated_scafs/ --output=output 1>mgescan.out 2>mgescan.err

results in this on stderr:

Error: Sequence file /tmp/2KC2GEDSAz.bbbbb is empty or misformatted

Help, Lee!

lee212 commented 7 years ago

Can you provide a sample of your input sequences? I just want to run it for debugging purposes. In the meantime, I'll try to see what else could cause the errors.

mcsimenc commented 7 years ago

test_seqs.tar.gz

lee212 commented 7 years ago

Without the Sacu_v1.1_asm_Sacu_v1.1_s0001.fasta file, mgescan ran successfully and produced this nonltr.gff3 result:

asm MGEScan_nonLTR mobile_genetic_element 2929787 2934653 . . . ID=Sacu_v1.1_asm_Sacu_v1.1_s0002.fa_2929787 ...

When I checked its content, the Sacu_v1.1_asm_Sacu_v1.1_s0001.fasta file looked corrupted or in binary format.
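A quick way to spot this kind of input problem before a run is to scan the input directory for files that do not look like FASTA. The sketch below is a heuristic only and not part of MGEScan; `fasta_problem` and `scan_directory` are hypothetical names, and only the first chunk of each file is inspected.

```python
import os


def fasta_problem(path, sample_size=1 << 20):
    """Return a short description of what looks wrong, or None.

    Heuristic: inspects at most sample_size bytes and flags an empty
    sample, NUL bytes (binary data), or a first non-blank line that
    does not start with '>'.
    """
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if not sample.strip():
        return "empty"
    if b"\x00" in sample:
        return "binary data (NUL bytes)"
    first_line = sample.lstrip().splitlines()[0]
    if not first_line.startswith(b">"):
        return "first line is not a FASTA header"
    return None


def scan_directory(directory):
    """Map each suspicious file name in directory to its problem."""
    problems = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            problem = fasta_problem(path)
            if problem:
                problems[name] = problem
    return problems
```

Calling `scan_directory` on the directory passed to mgescan returns an empty dictionary when every file passes these basic checks.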

mcsimenc commented 7 years ago

Oh that's weird! I used the splitMultiFasta.py script from MGEScan 1.3.1 to split a multi fasta to generate those files. I'll try split.py from MGEScan 3.0.0. I'm guessing it's for the same purpose.
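For reference, splitting a multi-FASTA file into one file per sequence can be sketched in a few lines of Python. This is an illustrative stand-in and not the code of splitMultiFasta.py or split.py; the function name, the `.fasta` suffix, and the file-naming rule are all assumptions, and sequence IDs are assumed to be unique.

```python
import os
import re


def split_multi_fasta(multi_fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file in out_dir.

    File names come from the first word of each header line, with
    characters outside [A-Za-z0-9._-] replaced by '_'. Records with the
    same ID would overwrite each other. Returns the paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    with open(multi_fasta_path) as source:
        for line in source:
            if line.startswith(">"):
                if out:
                    out.close()
                words = line[1:].split()
                name = re.sub(r"[^A-Za-z0-9._-]", "_", words[0]) if words else "unnamed"
                path = os.path.join(out_dir, name + ".fasta")
                out = open(path, "w")
                written.append(path)
            if out:
                # Lines before the first header are skipped.
                out.write(line)
    if out:
        out.close()
    return written
```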

mcsimenc commented 7 years ago

I ran mgescan with Sacu_v1.1_asm_Sacu_v1.1_s0002.fasta only and I got the same errors, e.g.

Error: Sequence file /tmp/2KC2GEDSAz.bbbbb is empty or misformatted

Maybe this is a problem with permissions on my system? What generates the *.bbbbb files? Thanks for all your help.

Btw I reran the splitMultiFasta.py and it didn't produce any corrupt/binary files. Not sure what happened there.

lee212 commented 7 years ago

*.bbbbb is a protein sequence translation file generated by transeq while the mgescan nonltr command runs to identify elements among 12 clades. This file (*.bbbbb) is ephemeral and lives in the /tmp directory, so we assume you have write permission on the temp directory.
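To rule out the permission question, a small probe can confirm that files can actually be created in the temp directory. This is a minimal sketch and not part of MGEScan; `temp_dir_is_writable` is a hypothetical helper, and the assumption that MGEScan writes to the same directory that Python's `tempfile.gettempdir()` reports (typically /tmp on Linux) may not hold on every system.

```python
import os
import tempfile


def temp_dir_is_writable(directory=None):
    """Return True if a file can be created and written in directory.

    directory defaults to tempfile.gettempdir(). The probe file is
    removed automatically when the context manager exits.
    """
    directory = directory or tempfile.gettempdir()
    try:
        with tempfile.NamedTemporaryFile(dir=directory, suffix=".bbbbb") as handle:
            handle.write(b">probe\nMKV\n")
            handle.flush()
            return os.path.getsize(handle.name) > 0
    except OSError:
        # Covers missing directories and permission errors.
        return False
```

If this returns True for /tmp, permissions on the temp directory are unlikely to be the cause of empty *.bbbbb files.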

mcsimenc commented 7 years ago

I discovered that at least some of the *.bbbbb files are appearing in /tmp during the run, but MGEScan still reports them as empty or misformatted. I found the transeq call in mgescan/src/mgescan/nonltr/hmm/get_phmm.pl, got rid of the 2>/dev/null, and I think I found the problem: I need libpq.so.5, which looks like a PostgreSQL library. This cluster is running PostgreSQL 8.4.18, so maybe I need to upgrade or reinstall?

Thank you for the help Hyungro!

Here is the error thrown by transeq:

/share/apps/genomics/EMBOSS-6.5.7/emboss/.libs/lt-transeq: error while loading shared libraries: libpq.so.5: cannot open shared object file: No such file or directory

lee212 commented 7 years ago

Good find! I'm afraid I don't have a solution for the library issue, but when I checked my system, libpq.so.5 is linked as shown below. I guess you can reinstall (or upgrade) as you mentioned. Or, as a workaround, you can try creating a symbolic link if you have the libpq.so.5 file on your system:

$ ldd `which transeq`
        ...
        libpq.so.5 => /usr/lib/x86_64-linux-gnu/libpq.so.5 (0x00007fecf7ffd000)
        ...

Regarding the libpq.so.5 error, I found a suggestion on stackoverflow.com here: http://stackoverflow.com/questions/12781566/error-while-loading-shared-libraries-libpq-so-5-cannot-open-shared-object-file
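The same ldd check can be automated to list every unresolved shared library of an executable at once. The sketch below is a hypothetical helper, not part of MGEScan or EMBOSS; it assumes a Linux system where ldd marks unresolved libraries with "=> not found".

```python
import subprocess


def missing_libraries(ldd_output):
    """Return names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_executable(path):
    """Run ldd on path and return its unresolved libraries (Linux only)."""
    result = subprocess.run(["ldd", path], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return missing_libraries(result.stdout)
```

An empty list from `check_executable` on the transeq binary would suggest that the dynamic linker can resolve all of its dependencies, including libpq.so.5.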

mcsimenc commented 7 years ago

Yay everything seems to be working! transeq was expecting to find libpq.so.5 in EMBOSS-6.6.0/lib/

I also updated EMBOSS.