eggnogdb / eggnog-mapper

Fast genome-wide functional annotation through orthology assignment
http://eggnog-mapper.embl.de
GNU Affero General Public License v3.0

Batch run fails through bash scripts #241

Closed limin321 closed 4 years ago

limin321 commented 4 years ago

Hi, Thank you for developing this nice annotation tool.

I first tested one XX.ffn file, which contains 5618 CDS nucleotide sequences. The annotation file generated contained 5305 proteins. That is, 5618 - 5305 = 313 CDS were not annotated by emapper.py. Is it expected that not all CDS can be annotated?

I assumed that is the case, so I set up a batch run of 35 bacterial XX.ffn files using emapper.py. The script I used is the following.

#! /bin/bash

#SBATCH --job-name=eggnog
#SBATCH -p short
#SBATCH -N 1
#SBATCH -n 20
#SBATCH --time=48:00:00
#SBATCH --output=eggnot.txt
#SBATCH --error=eggnog_error.txt

for infile in ./isolates/*.ffn
do 
dir1="./isolates"
base=$(basename $infile ".ffn")
mkdir ${base}
python /eggnog-mapper/emapper.py -m diamond --translate --data_dir /eggnog-mapper/data/ --report_orthologs -i ${dir1}/${base}.ffn --output ${base} --output_dir ./${base} --cpu 20
done

Each genome had three output files: agro.emapper.annotations.csv, agro.emapper.annotations.orthologs, and agro.emapper.seed_orthologs.

When I looked carefully, 20 of the 35 annotations.csv files were larger than 1 MB, while the other 15 were smaller than 500 KB. Some files had just one CDS annotated before the run stopped for some reason.

So I looked at the eggnog_error.txt file. One type of error message is like this:

Traceback (most recent call last):
  File "//eggnog-mapper/emapper.py", line 1216, in <module>
    main(args)
  File "/eggnog-mapper/emapper.py", line 275, in main
    annotate_hits_file(seed_orthologs_file, annot_file, hmm_hits_file, args)
  File "/eggnog-mapper/emapper.py", line 753, in annotate_hits_file
    print >>ORTHOLOGS, '\t'.join(map(str, (query_name, ','.join(orthologs))))
TypeError

Does anyone have any idea what is wrong with my code? How should I fix this problem?

Thank you so much for any help.

Best, Limin
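A quick way to spot truncated runs like the ones described above is to compare the number of input sequences with the number of annotated queries for each genome. The sketch below is only an illustration: it assumes the annotations file is tab-separated, keeps the query name in its first column, and marks comment lines with "#". The helper names are made up for this example and are not part of emapper.

```python
import io


def count_fasta_records(handle):
    """Count sequences in a FASTA stream by counting header lines."""
    return sum(1 for line in handle if line.startswith(">"))


def count_annotated_queries(handle):
    """Count distinct query names in an annotations table.

    Assumption: tab-separated rows, query name in the first column,
    comment lines starting with '#'.
    """
    queries = set()
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        queries.add(line.split("\t", 1)[0].strip())
    return len(queries)


def annotation_summary(fasta_handle, annot_handle):
    """Return total, annotated and missing counts for one genome."""
    total = count_fasta_records(fasta_handle)
    annotated = count_annotated_queries(annot_handle)
    return {"total": total, "annotated": annotated, "missing": total - annotated}


if __name__ == "__main__":
    # Tiny in-memory stand-ins for an .ffn file and its annotations file.
    fasta = io.StringIO(">cds1\nATG\n>cds2\nATGAAA\n>cds3\nATGTTT\n")
    annot = io.StringIO("# comment line\ncds1\tseedA\ncds3\tseedB\n")
    print(annotation_summary(fasta, annot))
```

With real data, the two handles would come from open() on each genome's .ffn file and its annotations file. A genome whose "missing" count is close to its "total" most likely stopped early rather than simply lacking orthologs.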

alimayy commented 4 years ago

hi @limin321 , what's the emapper version you're using? Also, I think it would help the developers if you could share the problematic FFN file

limin321 commented 4 years ago

Hi Ali,

Thank you so much for replying. I attached the file as Bac.txt because the .ffn extension is not supported for attachments, so I changed it to .txt. When you run the test, you can change it back to .ffn or .fasta. Here are the versions I used: diamond v0.9.24.125, emapper-2.0.1b-2-g816e190.

I also ran a file containing more than 200,000 CDS, and only around 2,000 were annotated.

Bac.txt

Thank you so much for trying to help. Best,

alimayy commented 4 years ago

Hi @limin321, sorry if there was a misunderstanding. Unfortunately I cannot help you hands-on with this, but I thought the developers could if the right input and parameters were provided (which you did). So let's wait and see if they can.

limin321 commented 4 years ago

Thanks. I hope the developers see my message soon.

Cantalapiedra commented 4 years ago

Hi @limin321 ,

and thank you very much @alimayy

Sorry for the late response.

If you really want to retrieve the list of orthologs I would recommend cloning the version in the "refactor" branch, which is anyway the version we are going to merge with the master one very soon.

emapper.py --version
emapper-2.0.2-rf1-87-gfcc6955 / Expected eggNOG DB version: 5.0.1 / Installed eggNOG DB version: 5.0.1 / Local diamond version: diamond version 2.0.4 / Local MMseqs2 version: 113e3212c137d026e297c7540e1fcd039f6812b1

I tested your input file with that version (using --itype CDS instead of --translate, and diamond in its default sensmode) and it worked fine.

emapper.py -i Bac.txt --itype CDS -m diamond --report_orthologs --output tmp_limin --output_dir tmp_limin --cpu 10

I hope this helps.

Best, Carlos

Cantalapiedra commented 4 years ago

Please re-open this issue or open a new one if you need further help.

limin321 commented 3 years ago

Hi Carlos,

Thank you so much. I have been trying the version in the "refactor" branch. I downloaded the source code from this link: https://github.com/eggnogdb/eggnog-mapper/releases/tag/2.0.2-rf1

Then I uploaded it to the server, because I run my data there. However, when I tried to check the version of emapper.py, I got the following error message. Do you have any suggestion as to what I did wrong?

[limin.chen@ceres eggnog-mapper-2.0.2-rf1]$ python emapper.py --version
  File "emapper.py", line 42
    help=f'Input FASTA file containing query sequences (proteins by default; see --translate). Required unless -m {SEARCH_MODE_NO_SEARCH}')
    ^
SyntaxError: invalid syntax

Why did it not return any version information?

Best, Limin

Cantalapiedra commented 3 years ago

Hi @limin321 ,

It is raising an error, and that is why you don't see the version info. The error could be due to using a python version below 3.6, and therefore not recognizing the syntax for f-strings. Which python version are you using?

Carlos
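To answer the version question on a given machine, a small check like the one below can be run with whichever interpreter the "python" command resolves to. It is a minimal sketch: the function name is invented for this example, and the script deliberately avoids f-strings so that it still runs on interpreters older than 3.6.

```python
import sys


def supports_fstrings(version_info=sys.version_info):
    """Return True if this interpreter can parse f-strings (Python 3.6+)."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    # %-formatting keeps this script valid on old interpreters too.
    print("Running Python %d.%d" % tuple(sys.version_info[:2]))
    if not supports_fstrings():
        print("This interpreter cannot parse f-strings; use Python 3.6 or newer.")
```

On clusters where "python" and "python3" point to different installations, running the check with each command shows which one is actually being used.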

limin321 commented 3 years ago

Hi Carlos,

Even though my Python version is 3.7, I still get error messages. Here are the details.

(base) KluepfelLabMBP01:eggnog-mapper-2.0.2-rf1 dklabuser$ python3.7 emapper.py --version
Traceback (most recent call last):
  File "emapper.py", line 412, in <module>
    args = parse_args(parser)
  File "emapper.py", line 305, in parse_args
    print(get_version())
  File "/Users/dklabuser/limin/data_analysis/eggNog/eggnog-mapper-2.0.2-rf1/eggnogmapper/common.py", line 161, in get_version
    db_version = get_db_version()
  File "/Users/dklabuser/limin/data_analysis/eggNog/eggnog-mapper-2.0.2-rf1/eggnogmapper/common.py", line 170, in get_db_version
    return db_sqlite.get_db_version()
  File "/Users/dklabuser/limin/data_analysis/eggNog/eggnog-mapper-2.0.2-rf1/eggnogmapper/annotation/db_sqlite.py", line 28, in get_db_version
    db.execute(cmd)
sqlite3.OperationalError: no such table: version
(base) KluepfelLabMBP01:eggnog-mapper-2.0.2-rf1 dklabuser$ python --version
Python 3.7.4

Any suggestions?

Best, Limin

Cantalapiedra commented 3 years ago

Hi @limin321 ,

Did you run the script to download the eggnog-mapper databases (download_eggnog_data.py)? Maybe the error you have is because of that. The refactor version uses a new version of the database.

Best, Carlos

limin321 commented 3 years ago

Hi Carlos

I also ran into issues when downloading the database. Here are my command and the error message.

[limin.chen@ceres eggnog-mapper-2.0.2-rf1]$ python3 download_eggnog_data.py
Download main annotation database? [y,n] y
Traceback (most recent call last):
  File "download_eggnog_data.py", line 93, in <module>
    if args.allyes or ask("Download main annotation database?") == 'y':
  File "/KEEP/cpgru_targetedseq/eggnog-mapper-2.0.2-rf1/eggnogmapper/utils.py", line 195, in ask
    v = eval(input("%s [%s] " % (string,','.join(valid_values) )))
  File "<string>", line 1, in <module>
NameError: name 'y' is not defined
[limin.chen@ceres eggnog-mapper-2.0.2-rf1]$ python3 --version
Python 3.6.6

I don't understand: 'y' is a string, so why does it need to be defined?

Best, Limin

Cantalapiedra commented 3 years ago

Hi @limin321 ,

The following line from your output is from a previous version of the refactor branch:

v = eval(input("%s [%s] " % (string,','.join(valid_values) )))

Unfortunately, the refactor branch is under development, and the tag you downloaded is just a tag to track some changes; it should not be considered a proper release. Sorry for the inconvenience. I recommend downloading the current version of the branch with git clone:

git clone -b refactor https://github.com/eggnogdb/eggnog-mapper.git

or

git clone --single-branch --branch refactor https://github.com/eggnogdb/eggnog-mapper.git

I hope that with that version you will be able to download the databases and give it a try.

Thank you.

Best, Carlos
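On the question of why 'y' "needs to be defined": in Python 3, input() already returns the typed text as a string, so wrapping it in eval() makes Python evaluate the reply as an expression and look up a variable named y. That wrapper is typical of code converted automatically from Python 2, where input() itself evaluated the reply. The sketch below reproduces the behaviour without needing a terminal; both function names are invented for the illustration and are not the functions in eggnogmapper/utils.py.

```python
def ask_with_eval(reply):
    """Mimic eval(input(...)): the reply is evaluated as a Python expression."""
    return eval(reply)


def ask_plain(reply, valid_values=("y", "n")):
    """Treat the reply as plain text, as input() returns it in Python 3."""
    answer = reply.strip().lower()
    if answer not in valid_values:
        raise ValueError("expected one of %s" % ",".join(valid_values))
    return answer


if __name__ == "__main__":
    try:
        ask_with_eval("y")
    except NameError as err:
        # Same failure as in the traceback: no variable called y exists.
        print("eval failed:", err)
    print("plain:", ask_plain("y"))
```

Typing 'y' with the quotes at the original prompt would have satisfied eval(), which is why the error message talks about a name rather than a string.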

limin321 commented 3 years ago

Hi Carlos, Thank you so much for the commands. With the commands you provided, I was able to download the database.

However, when I check the version, there is still an error:

[limin.chen@ceres eggnog-mapper]$ python3 emapper.py --version
There was an error retrieving eggnog-mapper DB data: not a valid file "/KEEP/cpgru_targetedseq/eggnog-mapper/data/eggnog.db"
Maybe you need to run download_eggnog_data.py
Traceback (most recent call last):
  File "emapper.py", line 552, in <module>
    args = parse_args(parser)
  File "emapper.py", line 372, in parse_args
    print(get_full_version_info())
  File "/KEEP/cpgru_targetedseq/eggnog-mapper/eggnogmapper/common.py", line 182, in get_full_version_info
    dmnd_version = get_diamond_version()
  File "/KEEP/cpgru_targetedseq/eggnog-mapper/eggnogmapper/common.py", line 233, in get_diamond_version
    completed_process = run(cmd, capture_output=True, check=True, shell=True)
  File "/software/7/apps/python_3/3.6.6/lib/python3.6/subprocess.py", line 403, in run
    with Popen(*popenargs, **kwargs) as process:
TypeError: __init__() got an unexpected keyword argument 'capture_output'

I also tried the --help argument, and it seems to work. I am testing now.

[limin.chen@ceres eggnog-mapper]$ python3 emapper.py --help
usage: emapper.py [-h] [-v] [--list_taxa] [--cpu NUM_CPU] [-i FASTA_FILE]
                  [--itype {CDS,proteins,genome,metagenome}] [--translate]
                  [--annotate_hits_table SEED_ORTHOLOGS_FILE] [-c FILE]
                  [--data_dir DIR] [--genepred {search,prodigal}]
                  [-m {diamond,mmseqs,hmmer,no_search,cache}] [--pident PIDENT]
                  [--query_cover QUERY_COVER] [--subject_cover SUBJECT_COVER]
                  [--evalue EVALUE] [--score SCORE] [--dmnd_db DMND_DB_FILE]
                  [--sensmode {fast,mid-sensitive,sensitive,more-sensitive,very-sensitive,ultra-sensitive}]
                  [--matrix {BLOSUM62,BLOSUM90,BLOSUM80,BLOSUM50,BLOSUM45,PAM250,PAM70,PAM30}]

I will let you know whether it works on my data. Best, Limin

limin321 commented 3 years ago

Hi Carlos,

I tested my own data using the version downloaded with your recommended command:

git clone --single-branch --branch refactor https://github.com/eggnogdb/eggnog-mapper.git

It fails with an error similar to the one I got when running the --version command.

Traceback (most recent call last):
  File "/KEEP/cpgru_targetedseq/limin/eggnog-mapper/emapper.py", line 563, in <module>
    emapper.run(args, args.input, args.annotate_hits_table, args.cache_file)
  File "/KEEP/cpgru_targetedseq/limin/eggnog-mapper/eggnogmapper/emapper.py", line 240, in run
    self.searcher = self.search(args, queries_file)
  File "/KEEP/cpgru_targetedseq/limin/eggnog-mapper/eggnogmapper/emapper.py", line 123, in search
    pjoin(self._current_dir, self.hmm_hits_file))
  File "/KEEP/cpgru_targetedseq/limin/eggnog-mapper/eggnogmapper/search/diamond/diamond.py", line 76, in search
    return self._search(in_file, seed_orthologs_file)
  File "/KEEP/cpgru_targetedseq/limin/eggnog-mapper/eggnogmapper/search/diamond/diamond.py", line 92, in _search
    raise e
  File "/KEEP/cpgru_targetedseq/limin/eggnog-mapper/eggnogmapper/search/diamond/diamond.py", line 87, in _search
    cmd = self.run_diamond(in_file, output_file)
  File "/KEEP/cpgru_targetedseq/limin/eggnog-mapper/eggnogmapper/search/diamond/diamond.py", line 147, in run_diamond
    completed_process = subprocess.run(cmd, capture_output=True, check=True, shell=True)
  File "/software/7/apps/python_3/3.6.6/lib/python3.6/subprocess.py", line 403, in run
    with Popen(*popenargs, **kwargs) as process:
TypeError: __init__() got an unexpected keyword argument 'capture_output'

Any thoughts on this error?

Best, Limin

limin321 commented 3 years ago

Hi Carlos,

When I tried the web version of eggnog, it was able to annotate the genome I submitted as AA sequences. When I try to annotate AA sequences from the command line, I still run into the same error as before. It failed after running for 40 minutes; please see below:

[limin.chen@ceres Yub001]$ tail eggnog_error.txt
Reported 21037 pairwise alignments, 21046 HSPs.
5664 queries aligned.
Traceback (most recent call last):
  File "/KEEP/cpgru_targetedseq/limin/eggnog-mapper1/emapper.py", line 1216, in <module>
    main(args)
  File "/KEEP/cpgru_targetedseq/limin/eggnog-mapper1/emapper.py", line 275, in main
    annotate_hits_file(seed_orthologs_file, annot_file, hmm_hits_file, args)
  File "/KEEP/cpgru_targetedseq/limin/eggnog-mapper1/emapper.py", line 753, in annotate_hits_file
    print >>ORTHOLOGS, '\t'.join(map(str, (query_name, ','.join(orthologs))))
TypeError

What is the version of the online resource at http://eggnog-mapper.embl.de/?

Thank you so much. Best, Limin

Cantalapiedra commented 3 years ago

Hi @limin321 ,

sorry, my bad. I did some tests and it seems that the current version requires at least Python 3.7 ("capture_output" was added in Python 3.7). You may need to install Python 3.7 or greater, for example using conda:

conda create -n py370 python=3.7.0
conda activate py370
conda install biopython=1.76 psutil=5.7.0

or using pip install:

conda create -n py370 python=3.7.0
conda activate py370
pip install -r requirements.txt

Best, Carlos
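For readers wondering what the missing keyword does: capture_output=True in subprocess.run is shorthand for passing stdout=subprocess.PIPE and stderr=subprocess.PIPE, and the shorthand only exists from Python 3.7 on. The sketch below shows the longer spelling, which also works on 3.5 and 3.6. It is an illustration of the standard library behaviour only, not a suggested patch for emapper, and run_captured is a name made up for the example.

```python
import subprocess
import sys


def run_captured(cmd):
    """Run a command and capture its stdout and stderr.

    Spelling out the two PIPE arguments is equivalent to
    capture_output=True, but is accepted by Python 3.5+.
    """
    return subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )


if __name__ == "__main__":
    # Use the current interpreter as a harmless command to run.
    result = run_captured([sys.executable, "-c", "print('ok')"])
    print(result.stdout.decode().strip())
```

Upgrading the interpreter, as suggested above, remains the simpler route, since other parts of the code may also rely on 3.7 features.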

Cantalapiedra commented 3 years ago

The web version is using: emapper-1.0.3-35-g63c274b

I would recommend you keep trying with the refactor version and Python 3.7 or above.

Best, Carlos

limin321 commented 3 years ago

Hi Carlos,

You are right. I just tested one sample and it works after using python 3.7.4.

When I used the master branch version, there was a column called "COG Functional cat.". Now that I use the refactor version, the output doesn't have that column. My command is below; is that because I didn't include --itype proteins? I want the output to have that column.

python ./eggnog-mapper/emapper.py -i ./Yub001.faa --data_dir /KEEP/cpgru_targetedseq/limin/eggnog-mapper/data -m diamond --report_orthologs --output Yub001 --output_dir Yub001_AA --cpu 20 --override

Thank you so much.

Best, Limin

Cantalapiedra commented 3 years ago

Hi Limin,

I am not sure whether columns "narr_og_cat" and "best_og_cat" (columns 7 and 10) are what you are looking for.

Best, Carlos
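If only those two columns are needed, they can be pulled out of the annotations file with a few lines of Python. This is a sketch under assumptions: the file is taken to be tab-separated with "#" comment lines, the column positions (7 and 10, counted from 1) are the ones mentioned above for this refactor version and may differ in other versions, and the sample row contains placeholder values rather than real emapper output.

```python
import io


def extract_og_categories(handle, narr_col=7, best_col=10):
    """Return (query, narr_og_cat, best_og_cat) tuples from an annotations table.

    Columns are 1-based. Rows with too few fields are skipped.
    """
    rows = []
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < max(narr_col, best_col):
            continue
        rows.append((fields[0], fields[narr_col - 1], fields[best_col - 1]))
    return rows


if __name__ == "__main__":
    # Placeholder row with ten tab-separated fields (f2..f9 are dummies).
    sample = io.StringIO(
        "# header comment\n"
        "query1\tf2\tf3\tf4\tf5\tf6\tJ\tf8\tf9\tK\n"
    )
    for query, narr_cat, best_cat in extract_og_categories(sample):
        print(query, narr_cat, best_cat)
```

Passing the positions as parameters keeps the helper usable if the column layout changes between versions.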

limin321 commented 3 years ago

Thank you so much, Carlos. Yes, those are the two I want. Just one more quick question: if the annotation in "best_og_cat" is different from "narr_og_cat", does that mean the one in "best_og_cat" is more reliable than "narr_og_cat", considering it is called "best"?

Thank you so much for all the help. Really appreciate it. Best, Limin

Cantalapiedra commented 3 years ago

Hi @limin321 ,

I guess "best_og" is not a very good name. Maybe it should be called "Annotation OG" or similar.

The difference between "best_og" and "narr_og" is:

For example, if one of your queries hits a protein called "COG0012", this would now be the seed ortholog. If you search for the "COG0012" protein in http://eggnog5.embl.de/ you will find that it belongs to 5 OGs. The "narr_og" would be the one from "Rhizobiaceae", whereas the "best_og" could be at the "Root", "Bacteria", "Proteobacteria", "Alphaproteobacteria" or "Rhizobiaceae" levels, depending on the --tax_scope parameter.

I hope this makes sense.

And thanks to you for your patience. I am glad to try to help.

Best, Carlos
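The COG0012 example can be restated as a toy model: the narrowest OG is always the one at the most specific level, while the OG used for annotation is picked by a scope. The sketch below only illustrates that distinction. The OG identifiers are hypothetical placeholders, the function names are invented, and it assumes the scope names exactly one of the levels; how emapper actually resolves --tax_scope is not shown in this thread and may be more involved.

```python
# Levels from the COG0012 example, ordered from broadest to narrowest.
LEVELS = ["Root", "Bacteria", "Proteobacteria", "Alphaproteobacteria", "Rhizobiaceae"]

# Hypothetical OG identifiers, one per level (placeholders, not real IDs).
OGS = {level: "OG_at_" + level for level in LEVELS}


def narrowest_og(levels, ogs):
    """The 'narr_og' idea: the OG at the most specific level."""
    return ogs[levels[-1]]


def og_for_scope(levels, ogs, tax_scope):
    """The 'best_og' idea: the OG at the level named by the scope.

    Assumption: tax_scope is the name of a single level in `levels`.
    """
    if tax_scope not in levels:
        raise ValueError("scope %r is not one of the levels" % tax_scope)
    return ogs[tax_scope]


if __name__ == "__main__":
    print("narrowest:", narrowest_og(LEVELS, OGS))
    print("scope=Bacteria:", og_for_scope(LEVELS, OGS, "Bacteria"))
```

In this picture the two columns differ whenever the chosen scope is broader than the narrowest level, which says nothing about one being more reliable than the other.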

limin321 commented 3 years ago

Hi Carlos,

Thank you for this excellent example, making it easy to understand clearly. Thank you for developing these tools, making annotation life much easier.

Best Regards! Limin

Cantalapiedra commented 3 years ago

Glad to help. And thanks to you. Best, Carlos