leonhardt913 opened this issue 1 year ago
Hi @leonhardt913,
As I don't use Windows to run eggnog-mapper, my only advice here would be to use Linux within Windows (WSL2). I use Ubuntu, and it works very well for me. It is rather easy to install; see for instance:
https://ubuntu.com/tutorials/install-ubuntu-on-wsl2-on-windows-10#1-overview
The specific error you are getting could be caused by eggnog-mapper trying to use its bundled diamond program, which is very likely compiled for Linux. If you still want to run it from Windows, you could try installing a Windows build of diamond and adding it to your PATH environment variable.
Best, Carlos
@Cantalapiedra
Hi Carlos. Thanks for your advice. I will take a look at the tutorial you sent and figure out how to install Python and eggnog-mapper in Ubuntu.
It might be a lot of work for me to replace the bundled diamond program with a Windows version without running into more errors. I am not sure whether anyone else in this community has done this and could give me some advice. For now, I would rather try the virtual Linux system, since I have heard that many bioinformatics tools run well on Ubuntu.
Best, Leo
I hope that you can make it work! Good luck!
Of course, if you need any advice during the installation, don't hesitate to ask. Once you are able to run Linux, I would advise you to follow this:
Often, the easiest way is to install it with conda or pip. Make sure they are up to date, so that you can install the latest versions of the software. Once you have done that, please follow this to set up the databases, etc.:
https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.10#user-content-Setup
I would begin by installing only the complete diamond database. Then test whether you are able to obtain some annotations:
https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.10#basic-usage
Once you have it working, you may worry about other options, databases, recipes, etc.
Just my 2 cents.
Best, Carlos
Dear @Cantalapiedra,
Thanks for your help. I have one question about the Linux OS I installed:
I updated WSL2 and installed Ubuntu using the .appx file I had downloaded before. (I changed the suffix to .zip, unzipped it, installed it using the .exe inside, and set up my UNIX account as well.)
My Ubuntu version is 20.04:
PS C:\windows\system32> wsl -l -v
  NAME            STATE           VERSION
* Ubuntu-20.04    Running         2
Then I typed "wsl" to start Ubuntu and used the following command to update the system, which took about 10 minutes:
/mnt/c/windows/system32$ sudo apt-get -y update && sudo apt-get -y upgrade
During the update I saw some lines mentioning "python3", so I guess my Linux already has python3 installed:
Preparing to unpack .../120-python3-cryptography_2.8-3ubuntu0.1_amd64.deb ...
Unpacking python3-cryptography (2.8-3ubuntu0.1) over (2.8-3) ...
Preparing to unpack .../121-python3-jwt_1.7.1-2ubuntu2.1_all.deb ...
Unpacking python3-jwt (1.7.1-2ubuntu2.1) over (1.7.1-2ubuntu2) ...
Preparing to unpack .../122-python3-urllib3_1.25.8-2ubuntu0.2_all.deb ...
Unpacking python3-urllib3 (1.25.8-2ubuntu0.2) over (1.25.8-2) ...
Preparing to unpack .../123-python3-requests_2.22.0-2ubuntu1_all.deb ...
Unpacking python3-requests (2.22.0-2ubuntu1) over (2.22.0-2build1) ...
Then I found out that pip has to be installed separately:
/mnt/c/windows/system32$ pip
Command 'pip' not found, but can be installed with:
sudo apt install python3-pip
/mnt/c/windows/system32$ sudo apt install python3-pip
My question is: do I have to update Python? I'm not sure whether this python3 meets the requirements (the wiki says Python 3.7 or higher) before I continue installing eggnog-mapper, biopython, etc.
Looking forward to your reply, Leo
Hi @leonhardt913,
You can check the Python version with python3 --version.
You may upgrade Python, for instance following https://cloudbytes.dev/snippets/upgrade-python-to-latest-version-on-ubuntu-linux
However, if you don't want to upgrade Python, you may use an environment manager (e.g. conda).
You may install Miniconda, Miniforge, Micromamba, or similar, and then just run conda install -c conda-forge -c bioconda eggnog-mapper, which would (hopefully) install the correct Python version (and other packages) for using eggnog-mapper.
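A minimal sketch of that check, assuming a POSIX shell (such as bash on Ubuntu) and the Python 3.7 minimum quoted from the wiki earlier in this thread. The function name `python_at_least_37` is hypothetical. It only parses the text printed by `python3 --version` and does not install or upgrade anything.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if a "Python X.Y.Z" string is >= 3.7,
# the minimum version mentioned in the eggnog-mapper wiki.
python_at_least_37() {
    printf '%s\n' "$1" | awk '{
        # Find the first "major.minor" number pair in the line.
        if (match($0, /[0-9]+\.[0-9]+/)) {
            split(substr($0, RSTART, RLENGTH), v, ".")
            major = v[1] + 0; minor = v[2] + 0
            exit !(major > 3 || (major == 3 && minor >= 7))
        }
        exit 1   # no version number found
    }'
}

# Check the system python3, if there is one (2>&1 also catches old
# Python versions that printed the version string on stderr).
if command -v python3 >/dev/null 2>&1; then
    if python_at_least_37 "$(python3 --version 2>&1)"; then
        echo "python3 is 3.7 or newer"
    else
        echo "python3 is older than 3.7; consider a conda environment"
    fi
fi
```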
Hi @Cantalapiedra,
I have set up pretty much everything, but I am stuck at the first test run again.
I installed Miniconda3, set up an environment named "rna", upgraded Python to 3.11, and installed eggnog-mapper.
My eggnog-mapper path is:
/home/leo913/miniconda3/envs/rna/lib/python3.11/site-packages/eggnogmapper
so I copied the first setup step, which adds these folders to PATH (I am not sure what it is for; if the following line is wrong, could that be why my run gets stuck? Please correct me if it's wrong):
export PATH=/home/leo913/miniconda3/envs/rna/lib/python3.11/site-packages/eggnogmapper:/home/leo913/miniconda3/envs/rna/lib/python3.11/site-packages/eggnogmapper/bin:"$PATH"
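For reference, a rough way to see which copies of the tools the shell will actually run. With a conda installation, `conda activate rna` normally puts the environment's `bin` directory on PATH (the log further down runs diamond from `/home/leo913/miniconda3/envs/rna/bin/diamond`), so the extra `site-packages` entries are probably not needed. This is a sketch; the helper name `show_tool` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print where a command resolves on PATH.
# Returns non-zero if the command is not found.
show_tool() {
    if tool_path=$(command -v "$1"); then
        echo "$1 -> $tool_path"
    else
        echo "$1: not found on PATH" >&2
        return 1
    fi
}

# Run inside the activated environment, e.g. after "conda activate rna".
for tool in emapper.py diamond; do
    show_tool "$tool" || true
done
```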
Then I set up the directory for the diamond database and downloaded the files successfully, but later I moved them into /python3.11/site-packages/data/.
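In this installation emapper looked for the databases under `site-packages/data` (see the diamond command in the log below). Another option is to keep them in a folder of your choice and pass it with `--data_dir`, which appears in the usage message below. Here is a sketch of a pre-flight check. It assumes the folder should contain `eggnog.db` and `eggnog_proteins.dmnd`, the two files named in this thread. The helper name and the `~/eggnog-data` path are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check: make sure the eggnog-mapper data folder
# contains the annotation database and the diamond database before a run.
check_emapper_data() {
    data_dir=$1
    status=0
    for f in eggnog.db eggnog_proteins.dmnd; do
        if [ ! -s "$data_dir/$f" ]; then   # -s: exists and is non-empty
            echo "missing or empty: $data_dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example (hypothetical path):
# check_emapper_data "$HOME/eggnog-data" &&
#     emapper.py -i test.fasta -o result1 --data_dir "$HOME/eggnog-data"
```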
At this point I can call emapper.py from any folder:
leo913@DESKTOP-56F5NN6:/mnt/c/windows/system32$ cd ~
leo913@DESKTOP-56F5NN6:~$ ls
miniconda3
leo913@DESKTOP-56F5NN6:~$ mkdir eggnog-mapper-workplace
leo913@DESKTOP-56F5NN6:~$ cd eggnog-mapper-workplace
leo913@DESKTOP-56F5NN6:~/eggnog-mapper-workplace$ conda activate rna
(rna) leo913@DESKTOP-56F5NN6:~/eggnog-mapper-workplace$ emapper.py -i
usage: emapper.py [-h] [-v] [--list_taxa] [--cpu NUM_CPU] [--mp_start_method {fork,spawn,forkserver}] [--resume]
[--override] [-i FASTA_FILE] [--itype {CDS,proteins,genome,metagenome}] [--translate]
[--annotate_hits_table SEED_ORTHOLOGS_FILE] [-c FILE] [--data_dir DIR]
[--genepred {search,prodigal}] [--trans_table TRANS_TABLE_CODE] [--training_genome FILE]
[--training_file FILE] [--allow_overlaps {none,strand,diff_frame,all}] [--overlap_tol FLOAT]
[-m {diamond,mmseqs,hmmer,no_search,cache,novel_fams}] [--pident PIDENT] [--query_cover QUERY_COVER]
[--subject_cover SUBJECT_COVER] [--evalue EVALUE] [--score SCORE] [--dmnd_algo {auto,0,1,ctg}]
[--dmnd_db DMND_DB_FILE]
[--sensmode {default,fast,mid-sensitive,sensitive,more-sensitive,very-sensitive,ultra-sensitive}]
[--dmnd_iterate {yes,no}]
[--matrix {BLOSUM62,BLOSUM90,BLOSUM80,BLOSUM50,BLOSUM45,PAM250,PAM70,PAM30}]
[--dmnd_frameshift DMND_FRAMESHIFT] [--gapopen GAPOPEN] [--gapextend GAPEXTEND]
[--block_size BLOCK_SIZE] [--index_chunks CHUNKS] [--outfmt_short] [--dmnd_ignore_warnings]
[--mmseqs_db MMSEQS_DB_FILE] [--start_sens START_SENS] [--sens_steps SENS_STEPS]
[--final_sens FINAL_SENS] [--mmseqs_sub_mat SUBS_MATRIX] [-d HMMER_DB_PREFIX] [--servers_list FILE]
[--qtype {hmm,seq}] [--dbtype {hmmdb,seqdb}] [--usemem] [-p PORT] [--end_port PORT]
[--num_servers NUM_SERVERS] [--num_workers NUM_WORKERS] [--timeout_load_server TIMEOUT_LOAD_SERVER]
[--hmm_maxhits MAXHITS] [--report_no_hits] [--hmm_maxseqlen MAXSEQLEN] [--Z DB_SIZE] [--cut_ga]
[--clean_overlaps none|all|clans|hmmsearch_all|hmmsearch_clans] [--no_annot] [--dbmem]
[--seed_ortholog_evalue MIN_E-VALUE] [--seed_ortholog_score MIN_SCORE] [--tax_scope TAX_SCOPE]
[--tax_scope_mode TAX_SCOPE_MODE] [--target_orthologs {one2one,many2one,one2many,many2many,all}]
[--target_taxa LIST_OF_TAX_IDS] [--excluded_taxa LIST_OF_TAX_IDS] [--report_orthologs]
[--go_evidence {experimental,non-electronic,all}] [--pfam_realign {none,realign,denovo}] [--md5]
[--output FILE_PREFIX] [--output_dir DIR] [--scratch_dir DIR] [--temp_dir DIR] [--no_file_comments]
[--decorate_gff DECORATE_GFF] [--decorate_gff_ID_field DECORATE_GFF_ID_FIELD] [--excel]
emapper.py: error: argument -i: expected one argument
Then I used Windows File Explorer to copy test.fasta into the Linux file system under ~/eggnog-mapper-workplace/, and then tried to run emapper.py:
(rna) leo913@DESKTOP-56F5NN6:~/eggnog-mapper-workplace$ ls
test.fasta
(rna)leo913@DESKTOP-56F5NN6:~/eggnog-mapper-workplace$ emapper.py -i test.fasta -o result1
# emapper-2.1.10
# emapper.py -i test.fasta -o result1
/home/leo913/miniconda3/envs/rna/bin/diamond blastp -d '/home/leo913/miniconda3/envs/rna/lib/python3.11/site-packages/data/eggnog_proteins.dmnd' -q '/home/leo913/eggnog-mapper-workplace/test.fasta' --threads 1 -o '/home/leo913/eggnog-mapper-workplace/result1.emapper.hits' --tmpdir '/home/leo913/eggnog-mapper-workplace/emappertmp_dmdn_z5vu_inu' --sensitive --iterate -e 0.001 --top 3 --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovhsp scovhsp
After that there was no further response from emapper or PowerShell, even though I could still type at the prompt, and I had to force-close PowerShell. The test.fasta I used contains only about 100 proteins. When I looked into ~/eggnog-mapper-workplace/ with Windows File Explorer, I found that emapper had created a "result1.emapper.hits" file of 0 KB and a new folder "emappertmp_dmdn_z5vu_inu" with nothing inside.
I'm not sure what I did wrong (maybe the PATH setting I mentioned above?).
Looking forward to your help, Leo
Hi @leonhardt913,
How long was the last command running? It seems that it just didn't finish, but I am not sure. You may try an even smaller test FASTA file (1 sequence, for instance), at least for testing.
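Here is a sketch of one quick way to cut such a test file. The hypothetical `head_fasta` helper copies the first N records of a FASTA file (one by default). It treats every line that starts with `>` as the start of a new record.

```shell
#!/bin/sh
# Hypothetical helper: print the first N records (default 1) of a FASTA file.
head_fasta() {
    awk -v n="${2:-1}" '
        /^>/ { if (++seen > n + 0) exit }   # stop at the (n+1)-th header
        { print }
    ' "$1"
}

# Example:
# head_fasta test.fasta 1 > test1.fasta
# grep -c '^>' test1.fasta        # should print 1
# emapper.py -i test1.fasta -o test1
```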
I will try running it again with a smaller test file next week, since I am out of the office. I will let you know the result.
Hi @Cantalapiedra,
I used a FASTA file with 1 protein sequence and successfully got a result.
(rna) leo913@DESKTOP-56F5NN6:~/eggnog-mapper-workplace$ emapper.py -i test.fasta -o result1
# emapper-2.1.10
# emapper.py -i test.fasta -o result1
/home/leo913/miniconda3/envs/rna/bin/diamond blastp -d '/home/leo913/miniconda3/envs/rna/lib/python3.11/site-packages/data/eggnog_proteins.dmnd' -q '/home/leo913/eggnog-mapper-workplace/test.fasta' --threads 1 -o '/home/leo913/eggnog-mapper-workplace/result1.emapper.hits' --tmpdir '/home/leo913/eggnog-mapper-workplace/emappertmp_dmdn_94hr8z8v' --sensitive --iterate -e 0.001 --top 3 --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovhsp scovhsp
Functional annotation of hits...
1 1.0553483963012695 0.95 q/s (% mem usage: 3.20, % mem avail: 96.82)
Done
Result files:
/home/leo913/eggnog-mapper-workplace/result1.emapper.hits
/home/leo913/eggnog-mapper-workplace/result1.emapper.seed_orthologs
/home/leo913/eggnog-mapper-workplace/result1.emapper.annotations
================================================================================
CITATION:
If you use this software, please cite:
[1] eggNOG-mapper v2: functional annotation, orthology assignments, and domain
prediction at the metagenomic scale. Carlos P. Cantalapiedra,
Ana Hernandez-Plaza, Ivica Letunic, Peer Bork, Jaime Huerta-Cepas. 2021.
Molecular Biology and Evolution, msab293, https://doi.org/10.1093/molbev/msab293
[2] eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated
orthology resource based on 5090 organisms and 2502 viruses. Jaime
Huerta-Cepas, Damian Szklarczyk, Davide Heller, Ana Hernandez-Plaza,
Sofia K Forslund, Helen Cook, Daniel R Mende, Ivica Letunic, Thomas
Rattei, Lars J Jensen, Christian von Mering and Peer Bork. Nucleic Acids
Research, Volume 47, Issue D1, 8 January 2019, Pages D309-D314,
https://doi.org/10.1093/nar/gky1085
[3] Sensitive protein alignments at tree-of-life scale using DIAMOND.
Buchfink B, Reuter K, Drost HG. 2021.
Nature Methods 18, 366–368 (2021). https://doi.org/10.1038/s41592-021-01101-x
e.g. Functional annotation was performed using eggNOG-mapper (version emapper-2.1.10) [1]
based on eggNOG orthology data [2]. Sequence searches were performed using [3].
================================================================================
Total hits processed: 1
Total time: 2329 secs
FINISHED
I really appreciate your help in getting my first test run to complete.
As shown, it took 2329 secs (almost 40 minutes) to finish the job for 1 protein. I guess the hardware of a regular desktop PC has its limits, but I still wonder whether there is any way to speed up the annotation, or whether any Ubuntu (or WSL2) configuration should be changed, since the upcoming FASTA files could contain 100,000+ proteins.
But one thing confused me: in the middle of the output, it shows:
Functional annotation of hits...
1 1.0553483963012695 0.95 q/s (% mem usage: 3.20, % mem avail: 96.82)
Done
which is very different from the "2329 secs" shown at the bottom. It seems that my PC can annotate quickly, yet it took 40 minutes to produce the result. Do you have any idea why?
Best regards, Leo
PS: I started another annotation run with about 100 proteins in test2.fasta; I will see how long it takes.
Hi @leonhardt913,
Glad that it worked. Probably the first job, with 100 proteins, just hadn't finished yet.
The q/s that you see corresponds to the annotation stage only. I guess that the rest of the 2329 seconds went to the diamond search. Note that diamond scales very well for large queries, but by default it is not the fastest for small queries.
Of course, depending on your hardware there are ways to speed things up. For instance, when using diamond for small queries you may use the --dmnd_algo ctg option. See the diamond options at https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.10#diamond-search-options
The number of threads that you use, --cpu, also has a large impact. See https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.10#execution-options
Note that for a large number of queries, the slower stage is usually the annotation stage. If you have enough memory, you may accelerate it with --dbmem. See https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.10#user-content-Other_Requirements and https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.10#user-content-Annotation_Options
There are many options, and depending on your data and hardware you may want to use them or not.
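The options above can be combined in a small wrapper. This is only a sketch: the function name, the `DRY_RUN` switch, and the 1000-sequence cut-off for switching to `--dmnd_algo ctg` are illustrative assumptions, not eggnog-mapper defaults. The options `--cpu`, `--dmnd_algo ctg` and `--dbmem` are the ones discussed above. Only add `--dbmem` when the machine has enough free memory for the annotation database (see the requirements link above).

```shell
#!/bin/sh
# Hypothetical wrapper around emapper.py.
#   $1 query FASTA, $2 output prefix, $3 threads (default: all cores).
# SMALL_QUERY_MAX (illustrative, default 1000): at or below this many
#   sequences, add --dmnd_algo ctg for the diamond search.
# USE_DBMEM=yes adds --dbmem (needs enough RAM for the annotation DB).
# DRY_RUN=1 only prints the command instead of running it.
run_emapper() {
    fasta=$1 prefix=$2
    threads=${3:-$(nproc)}
    nseq=$(grep -c '^>' "$fasta")

    set -- emapper.py -i "$fasta" -o "$prefix" --cpu "$threads"
    if [ "$nseq" -le "${SMALL_QUERY_MAX:-1000}" ]; then
        set -- "$@" --dmnd_algo ctg
    fi
    if [ "${USE_DBMEM:-no}" = yes ]; then
        set -- "$@" --dbmem
    fi

    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example: 8 threads, as in the 200k-protein run later in this thread.
# run_emapper proteins.fasta result_all 8
```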
I hope this is of help.
Best, Carlos
Hi @Cantalapiedra,
Thanks for the tips!
The test run with 100 proteins was interrupted, I guess by a Windows update at midnight. However, I ran my 200k-protein FASTA file with the additional option --cpu 8, and the annotation took about 5.5 hours (19692 secs), which is totally acceptable for me:
Total hits processed: 199714
Total time: 19692 secs
FINISHED
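To compare runs like these, the two summary lines printed at the end can be turned into a throughput figure. This sketch assumes the terminal output was saved to a file, for example with `emapper.py ... 2>&1 | tee run.log`. The helper name is hypothetical. For the run above it gives roughly 10 hits per second over about 5.5 hours.

```shell
#!/bin/sh
# Hypothetical helper: read the "Total hits processed:" and "Total time:"
# lines from a saved emapper.py log and print hits per second and hours.
emapper_rate() {
    awk '
        /^Total hits processed:/ { hits = $4 }   # e.g. "Total hits processed: 199714"
        /^Total time:/           { secs = $3 }   # e.g. "Total time: 19692 secs"
        END {
            if (secs > 0)
                printf "%.1f hits/s over %.1f h\n", hits / secs, secs / 3600
        }
    ' "$1"
}

# Example:
# emapper_rate run.log
```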
Again thanks for your help!
Best, Leo
Glad to be of help!
Hi,
I am new to eggNOG-mapper and have barely any experience with Python. I previously tried to install it on a Windows computer but could not run it because of problems I had no idea how to fix, so I used the web version of eggNOG-mapper instead.
Now my protein FASTA file is huge, so I am considering running eggNOG-mapper on my Windows PC to do the annotation.
The OS of my PC is Microsoft Windows [Version 10.0.19045.2728]
I installed Python 3.8.8 rather than a newer version, because with newer versions I ran into problems installing the required biopython (v1.76) along with eggnog-mapper. Even when I installed a newer biopython manually (v1.80 or v1.81), the installation tried to remove it, install biopython v1.76 instead, and eventually failed again. Anyway, I found that the older Python version worked for me, and I installed it from CMD.
After installing, because the other steps confused me a little (not sure if this is what caused the error), I jumped to the eggNOG-mapper database download in the Setup section. I found a way to download eggnog.db, eggnog.taxa.tar and eggnog_proteins.dmnd manually, and put them in Python38\Lib\site-packages\data.
Then I tested the command emapper.py in CMD, but it always opens the emapper.py file with my default program for .py files, even though emapper.py does have a #! line. Then I went to the Scripts folder and ran "python emapper.py", and it worked.
I simply put my test.fasta file (which is small, about 2 MB) in the "Python38\Scripts" folder and tested the command, but it quickly failed.
Above are the steps I have done so far. Any helps will be appreciated.