AnantharamanLab / vRhyme

Binning Virus Genomes from Metagenomes
GNU General Public License v3.0

Error after Performing machine learning classification #32

Open Aciole-David opened 8 months ago

Aciole-David commented 8 months ago

Hello! I'm testing vRhyme and got stuck after the 'Performing machine learning classification' step.

Running on a Slurm HPC system
Fresh mamba install
Inputs: a) single-end NextSeq reads; b) VirSorter output sequences from MEGAHIT contigs

Slurm log below:


/home/hpc_scientist/miniforge3/envs/vrhyme_env/bin/vRhyme:16: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
Command: /home/hpc_scientist/miniforge3/envs/vrhyme_env/bin/vRhyme -i final-viral-combined.fa -u putativeVLP_data_106.fastq putativeVLP_data_76.fastq putativeVLP_data_77.fastq putativeVLP_data_78.fastq putativeVLP_data_79.fastq putativeVLP_data_80.fastq putativeVLP_data_81.fastq putativeVLP_data_82.fastq putativeVLP_data_83.fastq putativeVLP_data_85.fastq putativeVLP_data_86.fastq putativeVLP_data_87.fastq putativeVLP_data_88.fastq putativeVLP_data_89.fastq -t 20 -o vrhyme_out --method longest --verbose

Date: 2024-03-12 (y-m-d)
Start: 11:30:34 (h:m:s)
Program: vRhyme v1.1.0

Time (min) | Log

0.0 Initializing and validating vRhyme parameters
0.11 Running 'longest' dereplication: 97% identity and 70% coverage
0.69 No sequences were of sufficient similarity to dereplicate
0.69 Single end read file(s) identified. Running bowtie2 on 14 unpaired file(s)
3.43 Extracting coverage information from BAM files
3.57 Coverage extraction complete. Generating coverage table
3.57 Performing pairwise coverage comparisons
3.58 Running Prodigal on filtered sequences
3.64 Generating codon usage features
3.64 Generating nucleotide features
3.67 Performing pairwise distance calculations
3.67 Performing machine learning classification
Traceback (most recent call last):
  File "/home/hpc_scientist/miniforge3/envs/vrhyme_env/bin/vRhyme", line 960, in <module>
    net_data = machine_stuff.machine_stuff(distances, presets, model_method, pairs_machine, cohen_machine, iterations, cohen_check)
  File "/home/hpc_scientist/miniforge3/envs/vrhyme_env/bin/machine_stuff.py", line 73, in machine_stuff
    model_ET = pickle.load(read_model_ET)
  File "sklearn/tree/_tree.pyx", line 865, in sklearn.tree._tree.Tree.__setstate__
  File "sklearn/tree/_tree.pyx", line 1571, in sklearn.tree._tree._check_node_ndarray
ValueError: node array from the pickle has an incompatible dtype:

Thank you!

Aciole-David commented 8 months ago

Easily solved with https://github.com/AnantharamanLab/vRhyme/issues/30. Thanks!
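For anyone landing here with the same traceback: the failure happens inside `pickle.load` while scikit-learn restores a tree model, which suggests the installed scikit-learn does not match the one used to create the pickled model. Below is a minimal sketch, not vRhyme's actual code, of how a loader could report the installed version when unpickling fails. The names `installed_version` and `load_model` are hypothetical.

```python
import pickle
from importlib import metadata


def installed_version(package: str) -> str:
    """Return the installed version of `package`, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def load_model(path: str):
    """Unpickle a model file (hypothetical helper, not part of vRhyme).

    If unpickling raises ValueError, re-raise with the installed
    scikit-learn version attached so the mismatch is easier to spot.
    """
    with open(path, "rb") as handle:
        try:
            return pickle.load(handle)
        except ValueError as err:
            raise RuntimeError(
                f"Could not unpickle {path!r} with scikit-learn "
                f"{installed_version('scikit-learn')}: {err}"
            ) from err
```

Running `installed_version("scikit-learn")` inside the `vrhyme_env` environment is also a quick way to see which version mamba actually installed.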

Vini2 commented 3 months ago

Thanks for pointing to the fix @Aciole-David!

@AnantharamanLab It would be great if you could pin the scikit-learn version in setup.py and in the README.
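Until a pin exists, a startup check could catch the mismatch before the pickled models are loaded. The sketch below is only an illustration under assumptions: the helper names are hypothetical, and the pinned version is left as a parameter because this thread does not state which scikit-learn release the models were built with (see issue #30 for the fix that worked).

```python
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn '1.2.3' into (1, 2, 3), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def matches_pin(installed: str, pinned: str) -> bool:
    """True when `installed` shares the leading release numbers of `pinned`."""
    pin = parse_version(pinned)
    return parse_version(installed)[: len(pin)] == pin


def check_scikit_learn(pinned: str) -> None:
    """Raise if the installed scikit-learn does not match `pinned`.

    `pinned` is a placeholder; the correct value is not given in this thread.
    """
    installed = metadata.version("scikit-learn")
    if not matches_pin(installed, pinned):
        raise RuntimeError(
            f"scikit-learn {installed} is installed, but {pinned} is expected"
        )
```

A prefix comparison keeps the check tolerant of patch releases: a pin of `X.Y` accepts any `X.Y.Z`.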