SouthGreenPlatform / culebrONT

A Snakemake pipeline for assembly, polishing, correction and quality checking of Oxford Nanopore reads.
GNU General Public License v3.0

No module named 'numpy.core._multiarray_umath' #3

Closed sivankij closed 3 years ago

sivankij commented 3 years ago

Hi,

I am trying to run BUSCO as part of your pipeline (using flye, for instance, but the error happens with every assembler) and constantly getting errors.

The error:

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.6 from "/vol/sci/bio/data/moran.yassour/lab/Projects/ONT/sivan_wgs_dec2020/CulebrONT_OUTPUT_flye2/build_conda_envs/5d40a795/bin/python"
  * The NumPy version is: "1.19.4"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: No module named 'numpy.core._multiarray_umath'

There was a problem installing BUSCO or importing one of its dependencies. See the user guide and the GitLab issue board (https://gitlab.com/ezlab/busco/issues) if you need further assistance.

However, when I run:

$ /vol/sci/bio/data/moran.yassour/lab/Projects/ONT/sivan_wgs_dec2020/CulebrONT_OUTPUT_flye2/build_conda_envs/5d40a795/bin/python
>>> import numpy
>>> numpy.version.version
'1.12.1'

(The 1.12.1 comes from running pip uninstall numpy and then conda install numpy.) The same error also occurs with numpy version 1.19.5. I am happy to share my input files and config.yaml with you in order to reproduce the problem; I am at a complete loss, as the tool is not working. Also, the log and the error message do not make clear which numpy versions and modules BUSCO requires, and I could not find any information about this in the links the error message refers to.
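For what it's worth, a quick way to see which numpy a given interpreter actually imports is to ask that exact interpreter. This is a hedged sketch; PYBIN is a placeholder for the build_conda_envs interpreter path from the error message:

```shell
# PYBIN is a placeholder: substitute the exact interpreter BUSCO runs under.
PYBIN=python3
"$PYBIN" - <<'EOF'
import numpy
print(numpy.__version__)  # the version actually imported
print(numpy.__file__)     # its location: system site-packages, conda env, or container
EOF
```

If the printed path points outside the conda env or container, a system-wide numpy is shadowing the bundled one.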

Thank you very much! Sivan

julieaorjuela commented 3 years ago

Hello @sivankij, can you please send us your config.yaml, tools_path.yaml, cluster_config.yaml and the sbatch file used to submit CulebrONT? I wonder whether BUSCO is picking up the numpy version from your system instead of using the one in the Singularity container. We will investigate it!

sivankij commented 3 years ago

Hi, thanks for your help! These are the files requested: CulebrONT_input.zip

First, we tried running it with the line srun --mem=32g -c8 --time=1-0 script.sh, but it resulted in many error lines like these:

Command '['scontrol', '-o', 'show', 'job', '[]']' returned non-zero exit status 1.
sacct: fatal: Bad job/step specified: []
sacct process error
Command '['sacct', '-P', '-b', '-j', '[]', '-n']' returned non-zero exit status 1.
scontrol_print_job error: Invalid job id specified
scontrol process error

So we tried running it differently:

srun --pty --mem=32g -c8 --time=1-0 tcsh
module load snakemake
module load singularity
module load python/3.6
conda activate base
snakemake --nolock --use-conda --use-singularity --singularity-args '--bind $HOME' --cores -p -s Snakefile --latency-wait 6000000 --keep-going --restart-times 0 --rerun-incomplete --configfile config_file.yaml --conda-prefix ./build_conda_envs --conda-frontend mamba

This got the tool running, but it then produced, in the log files, the error I opened this issue about. If you need any more information, please let me know. Sivan

julieaorjuela commented 3 years ago

Hello. In your tools_path.yaml file, the BUSCO Singularity entry is not at the same indentation level as BLOBTOOLS:

SINGULARITY:
    REPORT : './Containers/Singularity.report.sif'
    SHASTA : './Containers/Singularity.shasta-0.5.1.sif'
    WEESAM : './Containers/Singularity.weesam.sif'
    ASSEMBLYTICS : './Containers/Singularity.assemblytics.sif'
    MEDAKA : './Containers/Singularity.medaka-gpu-1.2.sif'
    KAT : './Containers/Singularity.kat.sif'
    BLOBTOOLS: './Containers/Singularity.blobtools.sif'
BUSCO: './Singularity.busco-4.1.4.def'

Also, please provide built Singularity .sif images instead of .def recipes. I recommend using absolute paths to the built Singularity images to avoid surprises:

SINGULARITY:
    REPORT : '/path/to/Containers/Singularity.report.sif'
    SHASTA : '/path/to/Containers/Singularity.shasta-0.5.1.sif'
    WEESAM : '/path/to/Containers/Singularity.weesam.sif'
    ASSEMBLYTICS : '/path/to/Containers/Singularity.assemblytics.sif'
    MEDAKA : '/path/to/Containers/Singularity.medaka-gpu-1.2.sif'
    KAT : '/path/to/Containers/Singularity.kat.sif'
    BLOBTOOLS: '/path/to/Containers/Singularity.blobtools.sif'
    BUSCO: '/path/to/Singularity.busco-4.1.4.sif'
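As a sketch, the .sif images can be built from the .def recipes shipped in the Containers directory, assuming they follow the Singularity.NAME.def naming; building requires root (or --fakeroot on recent Singularity versions):

```shell
cd /path/to/Containers
# Build each recipe once; the image name is the recipe name with .sif instead of .def
for def in Singularity.*.def; do
    sif="${def%.def}.sif"
    sudo singularity build "$sif" "$def"
done
```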

Please tell me if it works for you!

To launch the snakemake command, use sbatch script.sh.

Here is an example of the submit_culebront.sh script I use to launch CulebrONT with my data:

#!/bin/bash
#SBATCH --job-name Fpaspalum_CulebrONT
#SBATCH --output Fpas_%x_%j.log
#SBATCH --error Fpas_%x_%j.log
#SBATCH --partition=supermem
#SBATCH --cpus-per-task 2
#SBATCH --mem-per-cpu 2G

module load system/singularity/3.6.0
module load system/python/3.7.2

CONFIG_DIR="/scratch/paspalum/CONFIG/"
CULEBRONT="/scratch/orjuela/benchmark/CulebrONT_pipeline/"

# SLURM JOBS WITHOUT PROFILES
snakemake --nolock --use-conda --use-singularity --cores -p -s $CULEBRONT/Snakefile \
    --latency-wait 600000 --keep-going --restart-times 0 --rerun-incomplete \
    --configfile $CONFIG_DIR/config.yaml \
    --cluster "python3 $CULEBRONT/slurm_wrapper.py $CONFIG_DIR/config.yaml $CONFIG_DIR/cluster_config.yaml" \
    --cluster-config $CONFIG_DIR/cluster_config.yaml \
    --cluster-status "python3 $CULEBRONT/slurm_status.py" \
    --conda-prefix $CULEBRONT/build_conda_envs

To launch it, I run sbatch submit_culebront.sh.

If you get errors like the following, it is because the resources in cluster_config.yaml do not match your SLURM parameters (for example, you asked for more RAM than a node with limited resources provides):

Command '['scontrol', '-o', 'show', 'job', '[]']' returned non-zero exit status 1.
sacct: fatal: Bad job/step specified: []
sacct process error
Command '['sacct', '-P', '-b', '-j', '[]', '-n']' returned non-zero exit status 1.
scontrol_print_job error: Invalid job id specified
scontrol process error
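For reference, a cluster_config.yaml default section typically looks like this. This is a sketch only: the key names are illustrative and must match what slurm_wrapper.py expects, and the values must fit within the limits of the partition you target.

```yaml
__default__:
    cpus-per-task: 4
    mem-per-cpu: 4G
    partition: normal  # must exist on your cluster and offer this much memory
```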

I hope my answer can help you!!

Julie

sivankij commented 3 years ago

Dear Julie,

I changed the tools_path.yaml file by adding BUSCO: './Containers/Singularity.busco-4.0.4.def' and indeed BUSCO managed to run! It might be a good idea to add the BUSCO line to this file in future releases, since the paths for all the other tools are already there.

However, there was then another error, this time Error in rule tag_circular. In the log_file.e the text is:

Error: package 'BiocGenerics' 0.28.0 was found, but >= 0.31.5 is required by 'Biostrings'
In addition: Warning message:
version 0.36.0 of 'BiocGenerics' masked by 0.28.0 in /usr/local/bioinfo/R 
Execution halted

I think CulebrONT is trying to use my personal R packages instead of the ones installed in the isolated environment. Do you have any idea or explanation for this strange behaviour? From my understanding, the point of creating a separate environment for this tool is to let dependencies work smoothly with the right versions of different tools, without affecting the software installed in my personal account.

Thank you so much, Sivan

julieaorjuela commented 3 years ago

Hello @sivankij. Biostrings is not found in the Singularity environment CulebrONT uses for the rule tag_circular, so that dependency gets searched for on your system instead. We fixed it by adding Rscript -e "library(BiocManager); install('Biostrings')" to the Singularity.report.def container recipe. You can build the recipe with sudo singularity build Singularity.report.sif Singularity.report.def and update the path in tools_path.yaml. I hope it will work for you!! Julie