Local vs. online resource

liamfriar commented 11 months ago

Hi,

Thank you for creating and maintaining this wonderful tool.

I have about 300k protein sequences from ~50 cyanobacteria that I annotated using both the online resource (http://eggnog-mapper.embl.de/) and a local installation of eggnog-mapper. I was hoping that one would clearly outperform the other, or that they would be essentially the same. Unfortunately, they are quite different, and after looking around I am not confident which is preferable. Do you have any thoughts on when one might be preferable or why they might give different results?

Some comparisons: the online version gave a "preferred_name" to 26% of sequences, whereas the local instance of emapper.py gave a "preferred_name" to 40% of sequences. While 40% would certainly be better than 26%, when I looked at some specific genes of interest and compared them to the gene copy numbers expected from related reference genomes, each version appeared to be more "accurate" for different genes.
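For anyone wanting to reproduce this kind of comparison, here is a rough sketch of how the "preferred_name" percentage could be computed from an annotations file. The function names `count_named` and `pct_named` are made up for illustration. It assumes a tab-separated file with a header line starting with `#` that contains a column named `Preferred_name`, and that an empty field or `-` means no name was assigned; the exact header and placeholder may differ between emapper versions, so check your own files first.

```shell
# Count queries that received a preferred name in an emapper annotations file.
# Assumption: the header line starts with '#' and has a column called
# Preferred_name; '-' or an empty field means "no name".
count_named() {
    awk -F'\t' '
        # Scan leading "#" lines until the one that lists Preferred_name.
        col == 0 && /^#/ {
            for (i = 1; i <= NF; i++) if ($i == "Preferred_name") col = i
            next
        }
        /^#/ { next }                                  # other comment lines
        col && $col != "" && $col != "-" { n++ }       # named query
        END { print n + 0 }
    ' "$1"
}

# Percentage of all input sequences that got a preferred name.
# $1 = annotations file, $2 = query FASTA (the denominator is every query,
# including those emapper did not annotate at all).
pct_named() (
    named=$(count_named "$1")
    total=$(grep -c '^>' "$2")
    awk -v n="$named" -v t="$total" \
        'BEGIN { if (t > 0) printf "%.1f\n", 100 * n / t; else print "NA" }'
)
```

Running `pct_named` on the online and local outputs with the same query FASTA gives two numbers that are directly comparable, since both use the full set of input sequences as the denominator.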

Online I used all default parameters:

date_created    10/11/23
emapper_version 2.1.12
cmdline emapper.py --cpu 20 --mp_start_method forkserver --data_dir /dev/shm/ -o out --output_dir /emapper_web_jobs/emapper_jobs/user_data/MM_tjknuj5y --temp_dir /emapper_web_jobs/emapper_jobs/user_data/MM_tjknuj5y --override -m diamond --dmnd_ignore_warnings -i /emapper_web_jobs/emapper_jobs/user_data/MM_tjknuj5y/queries.fasta --evalue 0.001 --score 60 --pident 40 --query_cover 20 --subject_cover 20 --itype proteins --tax_scope auto --target_orthologs all --go_evidence non-electronic --pfam_realign none --report_orthologs --decorate_gff yes --excel  > /emapper_web_jobs/emapper_jobs/user_data/MM_tjknuj5y/emapper.out  2> /emapper_web_jobs/emapper_jobs/user_data/MM_tjknuj5y/emapper.err

I think all of the parameters were the same when I ran emapper.py locally, although the version is different and some of the argument flags have changed.

./emapper.py --data_dir /path/to/miniconda3/envs/eggnog-mapper/data -m diamond -i $infile -o $short_prefix

Version: /path/to/miniconda3/envs/eggnog-mapper/lib/python2.7/site-packages emapper-2.0.1

Installed on July 18, 2023:

conda create -n eggnog-mapper
conda activate eggnog-mapper
conda install -c bioconda -c conda-forge eggnog-mapper
cd /path/to/miniconda3/envs/eggnog-mapper/bin
## Make a directory for the eggnog-mapper data
mkdir /path/to/miniconda3/envs/eggnog-mapper/data
## Not sure the following chmod call mattered.
chmod +x download_eggnog_data.py
## It looks like I ran this command without eggnog-mapper being activated.
## But it was just downloading databases, so hopefully it's fine.
./download_eggnog_data.py --data_dir /path/to/miniconda3/envs/eggnog-mapper/data

Any general thoughts on the local vs. online implementations or on the specific information I have given above would be hugely appreciated.

Thank you!

Cantalapiedra commented 9 months ago

Hi @liamfriar ,

The default parameters are different in the web and standalone versions. We made the web version more stringent to avoid false positives (FPs), whereas in the standalone version we expect users to adapt the parameters to their goals. Therefore, the local version has most of the filtering parameters disabled (or set to a minimum of 0).
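To make a local run comparable to the web results, one option would be to pass the web server's filtering thresholds explicitly. The sketch below copies the values from the web job's cmdline quoted earlier in this thread (emapper 2.1.12). The helper name `web_filter_args` is made up for illustration, and an older local install such as 2.0.1 may not accept all of these flags, so check `emapper.py --help` first.

```shell
# Print the filtering thresholds used by the web job, one token per line.
# Values are copied from the web cmdline (emapper 2.1.12) in this thread;
# flag availability in other versions is an assumption.
web_filter_args() {
    printf '%s\n' \
        --evalue 0.001 \
        --score 60 \
        --pident 40 \
        --query_cover 20 \
        --subject_cover 20
}

# Example use (not executed here); $data_dir, $infile and $short_prefix
# are placeholders for your own paths:
#   emapper.py --data_dir "$data_dir" -m diamond -i "$infile" \
#       -o "$short_prefix" $(web_filter_args)
```

The web job also set `--tax_scope auto`, `--target_orthologs all` and `--go_evidence non-electronic`; whether those match the standalone defaults depends on the installed version.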

I hope this makes sense and sorry for replying so late.

Best, Carlos

liamfriar commented 9 months ago

Thanks.

eggnogdb / eggnog-mapper

Local vs. online resource #485