EBI-Metagenomics / EukCC

Tool to estimate genome quality of microbial eukaryotes
GNU General Public License v3.0

Failed to successfully run test #23

Closed robinblonk closed 3 years ago

robinblonk commented 3 years ago

Hi,

I have not been able to run the test you provide for EukCC after installation. I would love to use your tool, so I hope you can help me. Here is how I installed the programs. I installed eukcc and eukrep in a new conda environment:

conda install eukcc 
conda install eukrep

I downloaded GeneMark-ES/ET/EP ver 4.65_lic for Linux 64, unzipped the file and copied the files to the bin directory in my conda environment.

cp -r gmes_linux_64/* /user/.conda/envs/EukCCEukRep_v2/bin/

I copied the license key to my home directory and renamed it to .gm_key.

cp gm_key ~/
mv ~/gm_key ~/.gm_key
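
A quick way to confirm that this step worked, as a minimal sketch: GeneMark-ES looks for the key at ~/.gm_key, so the file should exist there and be non-empty. The function name `gm_key_ok` is made up for illustration, and the check says nothing about whether the key itself is still valid.

```shell
# Succeeds if a non-empty .gm_key file exists in the given home directory
# (defaults to $HOME). Hypothetical helper, not part of GeneMark-ES or EukCC.
gm_key_ok() {
    local home_dir="${1:-$HOME}"
    [ -s "$home_dir/.gm_key" ]
}

# Usage:
#   gm_key_ok && echo "key file found" || echo "key file missing or empty"
```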

I installed perl packages and modules without “apt”, which is not installed on our linux servers. I found your response on another issue (https://github.com/Finn-Lab/EukCC/issues/21) saying that the following code might do the same job:

env PERL5LIB="" PERL_LOCAL_LIB_ROOT="" PERL_MM_OPT="" PERL_MB_OPT="" $CONDA_PREFIX/bin/cpanm inc::Module::Install::DSL Hash::Merge MCE::Mutex FindBin Test::Pod Logger::Simple  Parallel::ForkManager.pm YAML Math::Utils

For the test, I downloaded the database from the EBI cluster, unzipped it and ran the test

wget -O testgenome.fa.gz ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/251/995/GCF_002251995.1_ASM225199v2/GCF_002251995.1_ASM225199v2_genomic.fna.gz
gunzip testgenome.fa.gz
eukcc --db /eukcc_db_20191023_1 \
    --ncores 4 \
    --ncorespplacer 1 \
    --outdir eukcc_testgenome \
/test/testgenome.fa

Which gives the following error:

image

Also, I tried to run filter_euk_bins.py which gives the error message:

ModuleNotFoundError: no module named sklearn.svm.classes

This error is not solved by installing and downgrading scikit-learn as suggested in a closed issue: (https://github.com/patrickwest/EukRep/issues/14)

I hope you have some suggestions on how to make this work. Thank you in advance!

Robin

openpaul commented 3 years ago

Hello, glad you want to use EukCC.

It seems that GeneMark-ES is not working and that EukRep is missing the right Python dependency.

GeneMark-ES: Can you upload the log files in the folder workfiles/gmes, namely runGMES.log and gmes.log? This should help solve the GeneMark-ES issue. Alternatively, try running GeneMark-ES manually to debug the error:

gmes_petap.pl --v --fungus --ES --cores 4 --min_contig 5000 --sequence testgenome.fna

EukRep: As you noticed, this is an issue with EukRep. It seems some of its dependencies may be misconfigured.

I created a test env with this command and could run EukRep with no issues:

conda create -n test -c bioconda -c conda-forge eukrep "scikit-learn==0.19.2"
conda activate test
EukRep -i testgenome.fa  -o test

Can you verify if this does or does not work for you?
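
To see which scikit-learn version an environment actually has, a small check can help. This is a sketch under one assumption worth stating: to my knowledge the `sklearn.svm.classes` module path was made private in scikit-learn 0.22 and dropped in a later release, which would explain why an old pin such as 0.19.2 works here. The function name is hypothetical.

```shell
# Succeeds if the given scikit-learn version is older than 0.22, i.e. (by the
# assumption stated above) old enough to still provide sklearn.svm.classes.
sklearn_is_pre_022() {
    local version="$1" major minor
    major="${version%%.*}"      # "0.19.2" -> "0"
    minor="${version#*.}"       # "0.19.2" -> "19.2"
    minor="${minor%%.*}"        # "19.2"   -> "19"
    [ "$major" -eq 0 ] && [ "$minor" -lt 22 ]
}

# Usage inside the activated environment:
#   sklearn_is_pre_022 "$(python -c 'import sklearn; print(sklearn.__version__)')" \
#       && echo "old module path should be available"
```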

If you want to avoid using GeneMark-ES: I recommend testing out EukCC version 2 (https://github.com/Finn-Lab/EukCC/tree/eukcc2)

I am in the process of making this update and have been testing it for the last few weeks. It's still a release candidate, but it works well and is much simpler to use.

If you have Singularity, it's as simple as this:

mkdir eukccdb
cd eukccdb
wget http://ftp.ebi.ac.uk/pub/databases/metagenomics/eukcc/eukcc2_db_ver_1.tar.gz
tar -xzvf eukcc2_db_ver_1.tar.gz
export EUKCC2_DB=$(realpath eukcc2_db_ver_1)
singularity pull docker://openpaul/eukcc2

singularity exec eukcc2_latest.sif eukcc single -h

Documentation for this is still in progress, so you would have to manage with the command line help.

robinblonk commented 3 years ago

Hi,

Thank you very much for your fast response.

EukRep: EukRep works perfectly fine now after installing it with conda the way you suggested.

GeneMark-ES: GeneMark-ES, however, still does not work. Running GeneMark-ES manually, as you suggested, to debug the error results in a message saying that Hash/Merge.pm cannot be located. Note: Hash::Merge is installed and up to date (1.19).

image

The message suggests that Perl is involved in the problem. The GeneMark-ES README file says: "perl scripts are configured with default perl location at /usr/bin/perl". I confirmed that Perl is located at /usr/bin/perl:

image

Maybe this makes it more clear what the problem might be?

Log files: I ran the test code with eukcc (provided on your documentation page), which creates only one log file (runGMES.log) in the output folder workfiles/gmes.

EukCC2: I would love to use EukCC version 2! This would most likely solve the problems. However, our server doesn't have Singularity. Is there a way to install EukCC2 via conda?

Thank you again for your effort, I am looking forward to your response.

Robin

openpaul commented 3 years ago

EukCC2 is not yet on conda, but you can also use Docker, if that works for you. I will try to write documentation and push it to conda as soon as possible, but that will likely take a week or two. Otherwise, you can try to install it manually by following these instructions:

https://github.com/Finn-Lab/EukCC/blob/eukcc2/Dockerfile

Your GeneMark-ES issue is caused by Perl, as you correctly figured. I am not too familiar with Perl, so I can only recommend googling this error message. Perl obviously can't find the package, so maybe check that it is located in one of the mentioned folders.
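
One thing worth checking, as a sketch: the GeneMark-ES scripts start Perl via the /usr/bin/perl path quoted above, while cpanm was run from $CONDA_PREFIX/bin, so the module may have been installed for a different Perl than the one the script uses. The two helper names below are hypothetical, and the usage lines assume the conda environment ships its own perl binary.

```shell
# Print the interpreter named on the first ("#!") line of a script.
# Assumes the first line really is a shebang; any arguments are dropped.
shebang_interpreter() {
    head -n 1 "$1" | sed -e 's/^#! *//' -e 's/ .*$//'
}

# Succeeds if the given perl binary can load the given module.
perl_has_module() {
    "$1" -M"$2" -e 1 >/dev/null 2>&1
}

# Usage: compare the Perl used by the script with the one in the conda env.
#   perl_has_module "$(shebang_interpreter "$(command -v gmes_petap.pl)")" Hash::Merge \
#       || echo "script's Perl cannot load Hash::Merge"
#   perl_has_module "$CONDA_PREFIX/bin/perl" Hash::Merge \
#       && echo "conda Perl can load Hash::Merge"
```

If only the conda Perl can load the module, the mismatch between the two interpreters is the likely cause.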

robinblonk commented 3 years ago

Thank you again! Unfortunately, I cannot easily install the program with Docker on our server either. It would be great if EukCC2 could be installed via conda. Could you reply on this issue once you succeed in pushing it to conda? I'm looking forward to it. Thanks in advance.

Kind regards,

Robin

openpaul commented 3 years ago

As of yesterday EukCC 2 is on conda and the documentation (although a bit small) can be found here: https://eukcc.readthedocs.io/en/latest/

Let me know if it works for you.

https://anaconda.org/bioconda/eukcc

robinblonk commented 3 years ago

Thank you,

I have used conda to install eukcc, but I think it is the old version, and I have not managed to run the program successfully. This is how I installed eukcc:

conda create -n eukcc
conda activate eukcc
conda install -c bioconda eukcc

eukcc -v

EukCC version 0.3. This is the old version, right? The flags described in the documentation are also not available (--out is --outdir).

I also downloaded the new database as described in the documentation file, which does not work with this eukcc version (it says it is an old database).

eukcc 2-643080_data_cc_b115.fa --db /export/lv4/projects/CHARS/Robin_Blonk/Nunavut_microbial_fraction/Analysis/eukaryotic_bin_quality/database/eukcc2_db_ver_1

image

Installing with pip gives the same problems.

I hope you can help,

Robin

openpaul commented 3 years ago

It seems you need to run it like this:

conda create -n eukcc -c conda-forge -c bioconda "eukcc>=2"

This will make sure to fetch version 2. I will update the documentation.
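
To confirm which major version ended up in the environment, a small helper along these lines could be used. It is only a sketch: it assumes the version string has the form "EukCC version 0.3" reported earlier in this thread and that `eukcc -v` still prints the version in the new release. The function name is hypothetical.

```shell
# Print the major version number from a string such as "EukCC version 0.3".
# Takes the last space-separated word and keeps everything before the first dot.
eukcc_major_version() {
    local version="${1##* }"        # "EukCC version 0.3" -> "0.3"
    printf '%s\n' "${version%%.*}"  # "0.3" -> "0"
}

# Usage after activating the environment:
#   [ "$(eukcc_major_version "$(eukcc -v)")" -ge 2 ] && echo "EukCC 2 is installed"
```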

edit: Documentation is updated here: https://eukcc.readthedocs.io/en/latest/quickstart.html

robinblonk commented 3 years ago

I succeeded in installing and running EukCC2. Thank you so much for your help.

Kind regards,

Robin