genomicsITER / NanoCLUST

NanoCLUST is an analysis pipeline for UMAP-based classification of amplicon-based full-length 16S rRNA nanopore reads
MIT License

Error running test at read_clustering #7

Closed. Electrocyte closed this issue 4 years ago.

Electrocyte commented 4 years ago
$ nextflow run ~/SequencingData/NanoCLUST/main.nf -profile test,docker
N E X T F L O W  ~  version 20.04.1
Launching `/home/james/SequencingData/NanoCLUST/main.nf` [dreamy_fourier] - revision: 15218921b7
----------------------------------------------------
      _   __                     ________    __  _____________
     / | / /___ _____  ____     / ____/ /   / / / / ___/_  __/
    /  |/ / __ `/ __ \/ __ \   / /   / /   / / / /\__ \ / /
   / /|  / /_/ / / / / /_/ /  / /___/ /___/ /_/ /___/ // /
  /_/ |_/\__,_/_/ /_/\____/   \____/_____/\____//____//_/

  NanoCLUST v1.0dev
----------------------------------------------------
Run Name          : dreamy_fourier
Reads             : /home/james/SequencingData/NanoCLUST/test_datasets/mock4_run3bc08_5000.fastq
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Container         : docker - [:]
Output dir        : ./results
Launch dir        : /home/james/SequencingData/NanoCLUST/templates
Working dir       : /home/james/SequencingData/NanoCLUST/templates/work
Script dir        : /home/james/SequencingData/NanoCLUST
User              : james
Config Profile    : test,docker
Config Description: Minimal test dataset to check pipeline function
----------------------------------------------------
executor >  local (5)
[5b/f73956] process > QC (1)                   [100%] 1 of 1 ✔
[01/a019f4] process > fastqc (1)               [100%] 1 of 1 ✔
[81/e663de] process > kmer_freqs (1)           [100%] 1 of 1 ✔
[eb/f5325e] process > read_clustering (1)      [100%] 1 of 1, failed: 1 ✘
[-        ] process > split_by_cluster         -
[-        ] process > read_correction          -
[-        ] process > draft_selection          -
[-        ] process > racon_pass               -
[-        ] process > medaka_pass              -
[-        ] process > consensus_classification -
[-        ] process > join_results             -
[-        ] process > get_abundances           -
[-        ] process > plot_abundances          -
[d2/3d9a68] process > output_documentation     [100%] 1 of 1 ✔
[nf-core/nanoclust] Pipeline completed with errors
Error executing process > 'read_clustering (1)'

Caused by:
  Process `read_clustering (1)` terminated with an error exit status (1)

Command executed [/home/james/SequencingData/NanoCLUST/templates/umap_hdbscan.py]:

  #!/usr/bin/env python

  import numpy as np
  import umap
  import matplotlib.pyplot as plt
  from sklearn import decomposition
  import random
  import pandas as pd
  import hdbscan

  df = pd.read_csv("freqs.txt", delimiter="\t")

  #UMAP
  motifs = [x for x in df.columns.values if x not in ["read", "length"]]
  X = df.loc[:,motifs]
  X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)

  df_umap = pd.DataFrame(X_embedded, columns=["D1", "D2"])
  umap_out = pd.concat([df["read"], df["length"], df_umap], axis=1)

  #HDBSCAN
  X = umap_out.loc[:,["D1", "D2"]]
  umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(50), cluster_selection_epsilon=int(0.5)).fit_predict(X)

  #PLOT
  plt.figure(figsize=(20,20))
  plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=umap_out["bin_id"], cmap='Spectral', s=1)
  plt.xlabel("UMAP1", fontsize=18)
  plt.ylabel("UMAP2", fontsize=18)
  plt.gca().set_aspect('equal', 'datalim')
  plt.title("Projecting " + str(len(umap_out['bin_id'])) + " reads. " + str(len(umap_out['bin_id'].unique())) + " clusters generated by HDBSCAN", fontsize=18)

  for cluster in np.sort(umap_out['bin_id'].unique()):
      read = umap_out.loc[umap_out['bin_id'] == cluster].iloc[0]
      plt.annotate(str(cluster), (read['D1'], read['D2']), weight='bold', size=14)

  plt.savefig('hdbscan.output.png')
  umap_out.to_csv("hdbscan.output.tsv", sep="\t", index=False)

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  Traceback (most recent call last):
    File ".command.sh", line 4, in <module>
      import umap
    File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/umap/__init__.py", line 1, in <module>
      from .umap_ import UMAP
    File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/umap/umap_.py", line 53, in <module>
      from umap.layouts import (
    File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/umap/layouts.py", line 39, in <module>
      def rdist(x, y):
    File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/numba/decorators.py", line 193, in wrapper
      disp.enable_caching()
    File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/numba/dispatcher.py", line 679, in enable_caching
      self._cache = FunctionCache(self.py_func)
    File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/numba/caching.py", line 614, in __init__
      self._impl = self._impl_class(py_func)
    File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/numba/caching.py", line 348, in __init__
      raise RuntimeError("cannot cache function %r: no locator available "
  RuntimeError: cannot cache function 'rdist': no locator available for file '/opt/conda/envs/read_clustering/lib/python3.8/site-packages/umap/layouts.py'

Work dir:
  /home/james/SequencingData/NanoCLUST/templates/work/eb/f5325ea476439e9002400a2c0daecb

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

I have looked into the error (RuntimeError: cannot cache function 'rdist': no locator available for file '/opt/conda/envs/read_clustering/lib/python3.8/site-packages/umap/layouts.py') and found someone with a similar issue here: https://stackoverflow.com/questions/56995232/runtimeerror-cannot-cache-function-jaccard-no-locator-available-for-file

I am not sure whether this will fix it. Please help.

I have also checked some of the previously posted issues and am adding the following information: Ubuntu v18.04.4 LTS (Bionic Beaver), Perl v5.26.1, Docker v18.09.7.
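
In case it is useful to anyone hitting the same thing: a rough way to check whether this is the same writable-cache problem described in that StackOverflow thread is sketched below. READ_CLUSTERING_IMAGE is a placeholder for whatever image the read_clustering process pulls, and NUMBA_CACHE_DIR is only honoured by newer numba releases, so it may not apply to the version pinned inside this container.

  # hypothetical check, not part of the pipeline: import umap with numba pointed at a writable cache directory
  $ docker run --rm -e NUMBA_CACHE_DIR=/tmp/numba_cache READ_CLUSTERING_IMAGE \
      /opt/conda/envs/read_clustering/bin/python -c "import umap"

If the import succeeds with the cache redirected but still fails through Nextflow, the problem is likely the cache location or its permissions rather than the umap installation itself.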

genomicsITER commented 4 years ago

Hi,

Thank you for your time and the details. We have encountered some issues with the UMAP environments and runtime errors like yours. At the moment, the exact runtime error you are getting with the Docker profile is avoided in our environment (built from the same versions as yours) when the pipeline is executed with sudo. Even if your Docker installation is configured to be used without superuser permissions, you will run into this kind of error in the read clustering stage.

We are investigating and characterizing this issue so we can give users a proper warning and troubleshooting guidance. Please contact us if you still have errors with the pipeline.
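
For reference, the temporary workaround described above amounts to re-running the same test invocation with superuser permissions:

  # temporary workaround until the read_clustering permission issue is characterised
  $ sudo nextflow run ~/SequencingData/NanoCLUST/main.nf -profile test,docker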

Electrocyte commented 4 years ago

How well does the pipeline perform on an average laptop?