epi2me-labs / wf-single-cell


Error in pipeline:process_bams:process_matrix #102

Closed HenriettaHolze closed 6 months ago

HenriettaHolze commented 6 months ago

Operating System

CentOS 7

Other Linux

No response

Workflow Version

wf-single-cell v2.0.2-ge9dac45 + bugfix

Workflow Execution

EPI2ME Desktop (Local)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

nextflow run epi2me-labs/wf-single-cell -profile singularity --expected_cells 50000 --fastq '/pipeline/Runs/Nanopore/20240514_1009_MN22007_FAY30355_68a76471/no_sample/20240514_1009_MN22007_FAY30355_68a76471/fastq_pass/FAY30355_pass_68a76471_936db429_24.fastq.gz' --kit_name '5prime' --kit_version 'v1' --ref_genome_dir '/data/reference/dawson_labs/genomes/cellranger_reference_GRCh38-2020-A/refdata-gex-GRCh38-2020-A' -w '/scratch/teams/dawson_genomics/Projects/PRC2_BE_screen/results/MF01_nanopore/epi2me_output/work/' --out_dir '/scratch/teams/dawson_genomics/Projects/PRC2_BE_screen/results/MF01_nanopore/epi2me_output/' --threads 16

Workflow Execution - CLI Execution Profile

singularity

What happened?

Hi, I got an error in the pipeline:process_bams:process_matrix process when running the pipeline on a test fastq file. It is related to the following line: https://github.com/epi2me-labs/wf-single-cell/blob/e9dac45261530195a4efbf64ed1437b703ce86b4/bin/workflow_glue/expression_matrix.py#L312 I don't fully understand the logic of the code, but it seems like there should be a check for len(feat_mask) != 0 or something similar.

I applied the change suggested here https://github.com/epi2me-labs/wf-single-cell/issues/100#issuecomment-2118984118 to run the pipeline.

Relevant log output

[9b/36da84] process > fastcat (1)                                       [100%] 1 of 1 ✔
[bb/5ea0f5] process > parse_kit_metadata (1)                            [100%] 1 of 1 ✔
[85/489119] process > pipeline:getVersions                              [100%] 1 of 1, cached: 1 ✔
[b4/0bf88e] process > pipeline:getParams                                [100%] 1 of 1, cached: 1 ✔
[db/2c41a4] process > pipeline:preprocess:call_paftools                 [100%] 1 of 1, cached: 1 ✔
[ef/60dd24] process > pipeline:preprocess:get_chrom_sizes               [100%] 1 of 1, cached: 1 ✔
[37/0c81f1] process > pipeline:preprocess:build_minimap_index           [100%] 1 of 1, cached: 1 ✔
[a8/9d20ca] process > pipeline:preprocess:call_adapter_scan (1)         [100%] 1 of 1 ✔
[75/cde6b4] process > pipeline:process_bams:split_gtf_by_chroms         [100%] 1 of 1, cached: 1 ✔
[a9/d91f2d] process > pipeline:process_bams:generate_whitelist (1)      [100%] 1 of 1 ✔
[29/45a389] process > pipeline:process_bams:assign_barcodes (1)         [100%] 1 of 1 ✔
[e3/7a9b50] process > pipeline:process_bams:cat_tags_by_chrom (1)       [100%] 1 of 1 ✔
[ce/32b36f] process > pipeline:process_bams:merge_bams (1)              [100%] 1 of 1 ✔
[66/bec883] process > pipeline:process_bams:stringtie (40)              [100%] 40 of 40 ✔
[2b/582eb6] process > pipeline:process_bams:align_to_transcriptome (40) [100%] 40 of 40 ✔
[2c/7d60f8] process > pipeline:process_bams:assign_features (17)        [100%] 20 of 20 ✔
[4b/2ffc69] process > pipeline:process_bams:create_matrix (20)          [100%] 20 of 20 ✔
[4a/d41d58] process > pipeline:process_bams:process_matrix (2)          [ 50%] 1 of 2, failed: 1
[69/d1ec26] process > pipeline:process_bams:merge_transcriptome (1)     [100%] 1 of 1 ✔
[7c/069f4a] process > pipeline:process_bams:combine_final_tag_files (1) [100%] 1 of 1 ✔
[08/355d7c] process > pipeline:process_bams:tag_bam (1)                 [100%] 1 of 1 ✔
[-        ] process > pipeline:process_bams:umi_gene_saturation         [  0%] 0 of 1
[-        ] process > pipeline:process_bams:pack_images                 -
[-        ] process > pipeline:prepare_report_data                      -
[-        ] process > pipeline:makeReport                               -
ERROR ~ Error executing process > 'pipeline:process_bams:process_matrix (2)'

Caused by:
  Process `pipeline:process_bams:process_matrix (2)` terminated with an error exit status (1)

Command executed:

  export NUMBA_NUM_THREADS=1
  workflow-glue process_matrix         inputs/matrix*.hdf         --feature gene         --raw gene_raw_feature_bc_matrix         --processed gene_processed_feature_bc_matrix         --per_cell_mito gene.expression.mito-per-cell.tsv         --per_cell_expr gene.expression.mean-per-cell.tsv         --umap_tsv gene.expression.umap.tsv         --enable_filtering         --min_features 200         --min_cells 3         --max_mito 20         --mito_prefixes MT-         --norm_count 10000         --enable_umap         --replicates 3

Command exit status:
  1

Command output:
  (empty)

Command error:
  [12:50:00 - workflow_glue] Bootstrapping CLI.
  /home/epi2melabs/conda/lib/python3.8/site-packages/umap/distances.py:1063: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
    @numba.jit()
  /home/epi2melabs/conda/lib/python3.8/site-packages/umap/distances.py:1071: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
    @numba.jit()
  /home/epi2melabs/conda/lib/python3.8/site-packages/umap/distances.py:1086: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
    @numba.jit()
  /home/epi2melabs/conda/lib/python3.8/site-packages/umap/umap_.py:660: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
    @numba.jit()
  [12:50:16 - workflow_glue] Starting entrypoint.
  [12:50:16 - workflow_glue.AggreMatri] Constructing count matrices
  [12:50:16 - workflow_glue.AggreMatri] Removing unknown features.
  [12:50:16 - workflow_glue.AggreMatri] Writing raw counts to file.
  [12:50:16 - workflow_glue.AggreMatri] Filtering, normalizing and log-transforming matrix.
  Traceback (most recent call last):
    File "/home/hholze/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow-glue", line 7, in <module>
      cli()
    File "/home/hholze/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/__init__.py", line 82, in cli
      args.func(args)
    File "/home/hholze/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/process_matrix.py", line 115, in main
      matrix
    File "/home/hholze/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/expression_matrix.py", line 238, in remove_cells_and_features
      self._remove_elements(feat_mask=feat_mask, cell_mask=cell_mask)
    File "/home/hholze/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/expression_matrix.py", line 312, in _remove_elements
      self._matrix = self._matrix[:i+1]
  UnboundLocalError: local variable 'i' referenced before assignment

Work dir:
  /scratch/teams/dawson_genomics/Projects/PRC2_BE_screen/results/MF01_nanopore/epi2me_output/work/4a/d41d587ba5f184ea03c8c37fff44c3

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
WARN: Killing running tasks (1)

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

cjw85 commented 6 months ago

Hi,

When you say:

I applied the change suggested here https://github.com/epi2me-labs/wf-single-cell/issues/100#issuecomment-2118984118 to run the pipeline.

Do you mean that having applied this change you then hit the new error reported here? I think I would expect that because both errors are fundamentally related to the same underlying issue.

cjw85 commented 6 months ago

The immediate cause in the current case is that the code has selected all features (i.e. genes in this case) for removal because none of them pass the configured filters. The filters include absolute thresholds on the number of cells in which a feature appears.
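
To illustrate how an empty selection can surface as the UnboundLocalError in the traceback, here is a minimal sketch. It is not the workflow's actual `_remove_elements` code: the function names `compact_rows_unsafe` and `compact_rows_checked`, the list-of-lists matrix, and the error wording are all assumptions made for illustration. The point is only that a loop variable stays unbound when the loop body never runs, and that an explicit check can turn this into a readable message.

```python
def compact_rows_unsafe(rows, keep):
    """Move kept rows to the front in place, then truncate (hypothetical)."""
    kept_indices = [idx for idx, flag in enumerate(keep) if flag]
    for i, src in enumerate(kept_indices):
        rows[i] = rows[src]
    # If nothing is kept, the loop body never ran and `i` was never bound,
    # so evaluating `i + 1` raises UnboundLocalError.
    return rows[:i + 1]


def compact_rows_checked(rows, keep, what="features"):
    """Return only the kept rows, failing early if none remain."""
    if not any(keep):
        raise ValueError(
            f"All {what} were removed by filtering; "
            "consider relaxing the filter thresholds."
        )
    return [row for row, flag in zip(rows, keep) if flag]


if __name__ == "__main__":
    rows = [[1, 0], [0, 0], [2, 3]]
    print(compact_rows_checked(rows, [True, False, True]))
    try:
        compact_rows_unsafe([r[:] for r in rows], [False, False, False])
    except UnboundLocalError as err:
        print("unsafe version failed with:", err)
```

The checked variant trades the in-place compaction for a plain filter, which keeps the sketch short; a real implementation working on large arrays may have reasons to stay in place and would only need the early check.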

I presume that your test data is a small dataset and so does not contain enough data to pass these defaults. When testing with small datasets we set the parameters --matrix_min_genes 1 --matrix_min_cells 1 in order to, in effect, bypass the filtering.
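
As a rough illustration of why a small dataset can lose every gene, the sketch below applies a minimum-cells threshold to a toy count table. It assumes that the threshold means "a feature must have a non-zero count in at least this many cells"; the function `features_passing`, the dictionary layout, and the gene names are hypothetical and are not taken from the workflow.

```python
def features_passing(counts, min_cells):
    """Return features with a non-zero count in at least `min_cells` cells.

    `counts` maps a feature name to its per-cell counts (hypothetical layout).
    """
    return [
        feature
        for feature, per_cell in counts.items()
        if sum(1 for c in per_cell if c > 0) >= min_cells
    ]


# Toy data: three genes observed across only two cells.
example_counts = {
    "GENE_A": [1, 0],
    "GENE_B": [2, 5],
    "GENE_C": [0, 3],
}

if __name__ == "__main__":
    # With only two cells, no gene can be present in three of them.
    print(features_passing(example_counts, min_cells=3))
    # A threshold of 1 keeps every gene that was seen at all.
    print(features_passing(example_counts, min_cells=1))
```

With a threshold of 3 (the value passed as --min_cells in the failing command) nothing survives in this toy table, while a threshold of 1 keeps all three genes.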

HenriettaHolze commented 6 months ago

Yes, I applied the change and then hit this error. Setting those parameters worked, thanks!

cjw85 commented 6 months ago

I'm reopening this issue just to remind us to make the code throw a more helpful error message when all cells or features have been filtered away.

HenriettaHolze commented 6 months ago

I got a similar error (j referenced before assignment) even when running on the whole dataset. I had to add the parameters again to make it work.

cjw85 commented 6 months ago

v2.0.3 will be available shortly and will provide a more reasonable error message.