genomicsITER / NanoCLUST

NanoCLUST is an analysis pipeline for UMAP-based classification of amplicon-based full-length 16S rRNA nanopore reads
MIT License

[test,conda]: read_clustering error #49

Open RemiMaglione opened 3 years ago

RemiMaglione commented 3 years ago

Dear NanoCLUST team, I wanted to give your pipeline a try. I started from a fresh install of Miniconda and Nextflow. After cloning this repo and the NCBI database, I launched a test with nextflow run main.nf -profile test,conda and got the following error. Could you help me? Thanks:

nextflow run main.nf -profile test,conda
N E X T F L O W ~ version 21.04.0
Launching main.nf [compassionate_bose] - revision: 5e0f88a799



NanoCLUST v1.0dev

Run Name : compassionate_bose
Reads : /home/omnia/Downloads/NanoCLUST/test_datasets/mock4_run3bc08_5000.fastq
Max Resources : 128 GB memory, 16 cpus, 10d time per job
Output dir : ./results
Launch dir : /home/omnia/Downloads/NanoCLUST
Working dir : /home/omnia/Downloads/NanoCLUST/work
Script dir : /home/omnia/Downloads/NanoCLUST
User : omnia
Config Profile : test,conda
Config Description: Minimal test dataset to check pipeline function

executor > local (5)
[88/4947f9] process > QC (1) [100%] 1 of 1 ✔
[08/1dd3e6] process > fastqc (1) [100%] 1 of 1 ✔
[ce/aa2011] process > kmer_freqs (1) [100%] 1 of 1 ✔
[30/ae4788] process > read_clustering (1) [ 0%] 0 of 1
[- ] process > split_by_cluster -
[- ] process > read_correction -
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[a1/ef749f] process > output_documentation [100%] 1 of 1 ✔
Error executing process > 'read_clustering (1)'

Caused by: Process read_clustering (1) terminated with an error exit status (1)

Command executed [/home/omnia/Downloads/NanoCLUST/templates/umap_hdbscan.py]:

#!/usr/bin/env python

import numpy as np
import umap
import matplotlib.pyplot as plt
from sklearn import decomposition
import random
import pandas as pd
import hdbscan

df = pd.read_csv("freqs.txt", delimiter=" ")

# UMAP
motifs = [x for x in df.columns.values if x not in ["read", "length"]]
X = df.loc[:,motifs]
X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)

df_umap = pd.DataFrame(X_embedded, columns=["D1", "D2"])
umap_out = pd.concat([df["read"], df["length"], df_umap], axis=1)

# HDBSCAN
X = umap_out.loc[:,["D1", "D2"]]
umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(50), cluster_selection_epsilon=int(0.5)).fit_predict(X)

# PLOT
plt.figure(figsize=(20,20))
plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=umap_out["bin_id"], cmap='Spectral', s=1)
plt.xlabel("UMAP1", fontsize=18)
plt.ylabel("UMAP2", fontsize=18)
plt.gca().set_aspect('equal', 'datalim')
plt.title("Projecting " + str(len(umap_out['bin_id'])) + " reads. " + str(len(umap_out['bin_id'].unique())) + " clusters generated by HDBSCAN", fontsize=18)

for cluster in np.sort(umap_out['bin_id'].unique()):
    read = umap_out.loc[umap_out['bin_id'] == cluster].iloc[0]
    plt.annotate(str(cluster), (read['D1'], read['D2']), weight='bold', size=14)

plt.savefig('hdbscan.output.png')
umap_out.to_csv("hdbscan.output.tsv", sep=" ", index=False)

Command exit status: 1

Command output: (empty)

Command error:
    retval = self._compile_core(args, return_type)
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/dispatcher.py", line 106, in _compile_core
    cres = compiler.compile_extra(self.targetdescr.typing_context,
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 606, in compile_extra
    return pipeline.compile_extra(func)
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 353, in compile_extra
    return self._compile_bytecode()
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 415, in _compile_bytecode
    return self._compile_core()
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 395, in _compile_core
    raise e
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler.py", line 386, in _compile_core
    pm.run(self.state)
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 339, in run
    raise patched_exception
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 330, in run
    self._runPass(idx, pass_inst, state)
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 289, in _runPass
    mutated |= check(pss.run_pass, internal_state)
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 262, in check
    mangled = func(compiler_state)
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/typed_passes.py", line 463, in run_pass
    NativeLowering().run_pass(state)
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/typed_passes.py", line 384, in run_pass
    lower.lower()
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 136, in lower
    self.lower_normal_function(self.fndesc)
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 190, in lower_normal_function
    entry_block_tail = self.lower_function_body()
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 216, in lower_function_body
    self.lower_block(block)
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/lowering.py", line 230, in lower_block
    self.lower_inst(inst)
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/numba/core/errors.py", line 751, in new_error_context
    raise newerr.with_traceback(tb)
numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Storing i64 to ptr of i32 ('dim'). FE type int32

File "../../conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/umap/layouts.py", line 52:
def rdist(x, y):

  result = 0.0
  dim = x.shape[0]
  ^

During: lowering "dim = static_getitem(value=$8load_attr.2, index=0, index_var=$const10.3, fn=)" at /home/omnia/Downloads/NanoCLUST/work/conda/read_clustering-800e1e27475cbaa0538f834c4aacc420/lib/python3.8/site-packages/umap/layouts.py (52)

Work dir: /home/omnia/Downloads/NanoCLUST/work/30/ae4788f916916db4d30751000e564a

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

RemiMaglione commented 3 years ago

I just tried with test,docker and everything looks fine so far. If conda is what caused this error, I'll leave this issue open in case it is needed. Best

rfox-mbl commented 2 years ago

I was able to get around this problem by modifying the file conda_envs/read_clustering/environment.yml and changing this line umap-learn=0.4.6 to umap-learn >=0.5.0.
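For anyone who prefers to script that edit, here is a minimal sketch. It assumes the pin appears in the file exactly as umap-learn=0.4.6 (as quoted above) and that the script is run from the root of the NanoCLUST checkout. The function name relax_umap_pin is made up for this illustration and is not part of the pipeline.

```python
import re
from pathlib import Path

# Path as given in the comment above, relative to the repository root.
ENV_FILE = Path("conda_envs/read_clustering/environment.yml")


def relax_umap_pin(yaml_text: str) -> str:
    """Replace the exact pin 'umap-learn=0.4.6' with 'umap-learn>=0.5.0'.

    Hypothetical helper: any other umap-learn spec is left untouched.
    """
    return re.sub(r"umap-learn\s*=\s*0\.4\.6\b", "umap-learn>=0.5.0", yaml_text)


if __name__ == "__main__":
    # Only rewrite the file if it is actually present in the current directory tree.
    if ENV_FILE.exists():
        ENV_FILE.write_text(relax_umap_pin(ENV_FILE.read_text()))
```

After the file changes, Nextflow should build a new conda environment for read_clustering on the next run, since the environment is created from this file.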

ianvalenca commented 2 years ago

> I was able to get around this problem by modifying the file conda_envs/read_clustering/environment.yml and changing this line umap-learn=0.4.6 to umap-learn >=0.5.0.

Thank you so much, rfox-mbl! That worked for me! Mine was changed to version 0.5.2.

lfurn commented 2 years ago

Hi, I have the same error, and sadly the above fix didn't work for me. I am running the conda profile because I have it set up as an HPC job.
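When the changed pin does not seem to help, one thing worth checking is which package versions actually ended up in the environment that the read_clustering process uses. The sketch below is only a diagnostic aid, not part of NanoCLUST. It assumes it is run with the Python interpreter of that conda environment (in the log above, the environment lives under work/conda/read_clustering-…), and the function name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("umap-learn", "numba", "llvmlite", "numpy")):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the interpreter running this script.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

If umap-learn still reports 0.4.6 here, the process is probably still using an environment built from the old environment.yml.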