epi2me-labs / wf-single-cell

Problem using --single_cell_sample_sheet #49

Closed ddiez closed 1 year ago

ddiez commented 1 year ago

Operating System

Other Linux (please specify below)

Other Linux

Ubuntu 23.04

Workflow Version

v0.2.7-g9272e2c

Workflow Execution

Command line

EPI2ME Version

No response

CLI command run

nextflow run epi2me-labs/wf-single-cell \
    -w single-cell-demo-out2/workspace \
    -profile standard \
    --fastq wf-single-cell-demo/fastq/A/chr17a.fq.gz wf-single-cell-demo/fastq/B/chr17b.fq.gz \
    --single_cell_sample_sheet samples.txt \
    --ref_genome_dir ~/10x/refdata-gex/refdata-gex-GRCh38-2020-A \
    --out_dir single-cell-demo-out2 \
    --plot_umaps \
    --umap_n_repeats 1

Workflow Execution - CLI Execution Profile

standard (default)

What happened?

I am trying to use the --single_cell_sample_sheet option to pass the information of three different samples. To test this, I am using the demo dataset and pretending I have two samples. I provide the sample information in a samples.txt file with the following contents:

sample_id,kit_name,kit_version,exp_cells
A,3prime,v3,500
B,3prime,v3,500

When I run the workflow using the command pasted above, I get the following error:

3prime is not a supported kit

The full log is included below. It seems the single_cell_sample_sheet file is correctly detected, but for some reason the pipeline fails when checking whether the kit is one of the supported kits. Looking at the code in main.nf, it is not clear to me why this might be. So I am wondering whether I am using this option incorrectly or whether there is a problem in the workflow.

Relevant log output

N E X T F L O W  ~  version 23.04.3
Launching `https://github.com/epi2me-labs/wf-single-cell` [ecstatic_nobel] DSL2 - revision: 9272e2ce6d [master]

||||||||||   _____ ____ ___ ____  __  __ _____      _       _
||||||||||  | ____|  _ \_ _|___ \|  \/  | ____|    | | __ _| |__  ___
|||||       |  _| | |_) | |  __) | |\/| |  _| _____| |/ _` | '_ \/ __|
|||||       | |___|  __/| | / __/| |  | | |__|_____| | (_| | |_) \__ \
||||||||||  |_____|_|  |___|_____|_|  |_|_____|    |_|\__,_|_.__/|___/
||||||||||  wf-single-cell v0.2.7-g9272e2c
--------------------------------------------------------------------------------
Core Nextflow options
  revision                : master
  runName                 : ecstatic_nobel
  containerEngine         : docker
  launchDir               : /home/diez/tmp/ont
  workDir                 : /home/diez/tmp/ont/single-cell-demo-out2/workspace
  projectDir              : /home/diez/.nextflow/assets/epi2me-labs/wf-single-cell
  userName                : diez
  profile                 : standard
  configFiles             : /home/diez/.nextflow/assets/epi2me-labs/wf-single-cell/nextflow.config

Input Options
  fastq                   : wf-single-cell-demo/fastq/A/chr17a.fq.gz
  ref_genome_dir          : /home/diez/10x/refdata-gex/refdata-gex-GRCh38-2020-A
  kit_config              : /home/diez/.nextflow/assets/epi2me-labs/wf-single-cell/kit_configs.csv

Sample Options
  single_cell_sample_sheet: samples.txt

Output Options
  out_dir                 : single-cell-demo-out2
  plot_umaps              : true

Advanced options
  matrix_min_genes        : 200
  umap_plot_genes         : /home/diez/.nextflow/assets/epi2me-labs/wf-single-cell/umap_plot_genes.csv
  umap_n_repeats          : 1

!! Only displaying parameters that differ from the pipeline defaults !!
--------------------------------------------------------------------------------
If you use epi2me-labs/wf-single-cell for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

--------------------------------------------------------------------------------
This is epi2me-labs/wf-single-cell v0.2.7-g9272e2c.
--------------------------------------------------------------------------------
Checking fastq input.
[-        ] process > fastcat                                             -
[-        ] process > move_or_compress                                    -
[-        ] process > pipeline:getVersions                                -
[-        ] process > pipeline:getParams                                  -
[-        ] process > pipeline:summariseCatChunkReads                     -
[-        ] process > pipeline:stranding:call_adapter_scan                -
[-        ] process > fastcat                                             -
[-        ] process > move_or_compress                                    [  0%] 0 of 1
[-        ] process > pipeline:getVersions                                [  0%] 0 of 1
[-        ] process > pipeline:getParams                                  [  0%] 0 of 1
[-        ] process > pipeline:summariseCatChunkReads                     -
[-        ] process > pipeline:stranding:call_adapter_scan                -
[-        ] process > pipeline:stranding:combine_adapter_tables           -
[-        ] process > pipeline:stranding:summarize_adapter_table          -
[-        ] process > pipeline:align:call_paftools                        [  0%] 0 of 1
[-        ] process > pipeline:align:get_chrom_sizes                      [  0%] 0 of 1
[-        ] process > pipeline:align:align_to_ref                         -
[-        ] process > fastcat                                             -
[-        ] process > move_or_compress                                    [  0%] 0 of 1
[-        ] process > pipeline:getVersions                                [  0%] 0 of 1
[-        ] process > pipeline:getParams                                  [  0%] 0 of 1
[-        ] process > pipeline:summariseCatChunkReads                     -
[-        ] process > pipeline:stranding:call_adapter_scan                -
[-        ] process > pipeline:stranding:combine_adapter_tables           -
[-        ] process > pipeline:stranding:summarize_adapter_table          -
[-        ] process > pipeline:align:call_paftools                        [  0%] 0 of 1
[-        ] process > pipeline:align:get_chrom_sizes                      [  0%] 0 of 1
[-        ] process > pipeline:align:align_to_ref                         -
[-        ] process > pipeline:process_bams:split_gtf_by_chroms           [  0%] 0 of 1
[-        ] process > pipeline:process_bams:get_contigs                   -
[-        ] process > pipeline:process_bams:extract_barcodes              -
[-        ] process > pipeline:process_bams:combine_uncorrect_bcs         -
[-        ] process > pipeline:process_bams:generate_whitelist            -
[-        ] process > pipeline:process_bams:assign_barcodes               -
[-        ] process > pipeline:process_bams:stringtie                     -
[-        ] process > pipeline:process_bams:align_to_transcriptome        -
[-        ] process > pipeline:process_bams:assign_features               -
[-        ] process > pipeline:process_bams:cluster_umis                  -
[-        ] process > pipeline:process_bams:tag_bams                      -
[-        ] process > pipeline:process_bams:combine_tag_files             -
[-        ] process > pipeline:process_bams:combine_final_tag_files       -
[-        ] process > pipeline:process_bams:umi_gene_saturation           -
[-        ] process > pipeline:process_bams:construct_expression_matrix   -
[-        ] process > pipeline:process_bams:process_expression_matrix     -
[-        ] process > pipeline:process_bams:umap_reduce_expression_matrix -
[-        ] process > pipeline:process_bams:pack_images                   -
[-        ] process > pipeline:prepare_report_data                        -
[-        ] process > pipeline:makeReport                                 -
[-        ] process > output                                              -
[-        ] process > output_report                                       -
3prime is not a supported kit

Application activity log entry

No response

nrhorner commented 1 year ago

Hi @ddiez

Sorry, the workflow is giving you an incorrect error message.

I think the issue is that the sample_ids do not match between the single_cell_sample_sheet and those determined from the input data.

Passing --fastq wf-single-cell-demo/fastq/A/chr17a.fq.gz means the input from the A folder will have a sample_id of chr17.

If you instead pass --fastq wf-single-cell-demo/fastq/A, that should set the sample_id to 'A', the same as in your sample sheet.

You might want to put your sample data in subdirectories, each one named with its sample_id:

fastq
├── A
│   └── reads.fq
└── B
    └── reads.fq
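
The naming check described above can be sketched in Python. This is only an illustration, not part of the workflow: the function names are hypothetical, and it assumes the sample sheet is a CSV with a sample_id column (as in samples.txt above) and that each sample lives in a subdirectory of the directory passed to --fastq. It compares names only and does not reproduce how the workflow itself assigns sample_ids.

```python
import csv
from pathlib import Path


def read_sample_ids(sheet_path):
    """Return the sample_id column of a CSV sample sheet, in file order."""
    with open(sheet_path, newline="") as handle:
        return [row["sample_id"].strip() for row in csv.DictReader(handle)]


def find_mismatches(sheet_path, fastq_dir):
    """Compare sheet sample_ids with the subdirectory names of fastq_dir.

    Returns (missing_dirs, extra_dirs): ids listed in the sheet that have
    no matching subdirectory, and subdirectories with no row in the sheet.
    Both lists are sorted alphabetically.
    """
    sheet_ids = set(read_sample_ids(sheet_path))
    dir_ids = {p.name for p in Path(fastq_dir).iterdir() if p.is_dir()}
    return sorted(sheet_ids - dir_ids), sorted(dir_ids - sheet_ids)


# Example call, using the paths from this thread:
#   missing, extra = find_mismatches("samples.txt", "wf-single-cell-demo/fastq")
#   print("In sheet but no directory:", missing)
#   print("Directory but not in sheet:", extra)
```

Two empty lists mean every sample_id in the sheet has a same-named subdirectory and vice versa.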

ddiez commented 1 year ago

Thanks @nrhorner for the quick reply and sorry for not providing complete information. I had indeed placed the files in folders following your suggested structure:

$ ls wf-single-cell-demo/fastq/*
wf-single-cell-demo/fastq/A:
chr17a.fq.gz

wf-single-cell-demo/fastq/B:
chr17b.fq.gz

ddiez commented 1 year ago

I also tried, without success, several variations of the --fastq argument:

# Original
--fastq wf-single-cell-demo/fastq/A/chr17a.fq.gz wf-single-cell-demo/fastq/B/chr17b.fq.gz

# Using same name for files as in your example (and renaming the files)
--fastq wf-single-cell-demo/fastq/A/chr17.fq.gz wf-single-cell-demo/fastq/B/chr17.fq.gz

# Passing just the folder with the subfolders
--fastq wf-single-cell-demo/fastq

# Passing the subfolders
--fastq wf-single-cell-demo/fastq/A wf-single-cell-demo/fastq/B

All these lead to the same error about the unsupported kit.

nrhorner commented 1 year ago

Hi @ddiez

I'll get a fix out ASAP. In the meantime, could you please try running one sample at a time, passing the sample parameters on the command line with --kit_name, --kit_version and --expected_cells? Sorry for the inconvenience.
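
As a rough sketch of that workaround, the Python snippet below turns each row of the sample sheet into its own single-sample command. It is not an official tool: the function name, the per-sample output directory layout, and the mapping of the sheet's exp_cells column onto --expected_cells are assumptions made for illustration. The flags themselves are the ones mentioned in this thread.

```python
import csv
import shlex


def build_commands(sheet_path, fastq_root, ref_genome_dir, out_root):
    """Build one nextflow command (as an argument list) per sample sheet row.

    Assumes the sheet has the columns sample_id, kit_name, kit_version and
    exp_cells, and that each sample's reads are in fastq_root/<sample_id>.
    """
    commands = []
    with open(sheet_path, newline="") as handle:
        for row in csv.DictReader(handle):
            sample = row["sample_id"].strip()
            commands.append([
                "nextflow", "run", "epi2me-labs/wf-single-cell",
                "--fastq", f"{fastq_root}/{sample}",
                "--kit_name", row["kit_name"].strip(),
                "--kit_version", row["kit_version"].strip(),
                # Assumption: exp_cells in the sheet corresponds to --expected_cells.
                "--expected_cells", row["exp_cells"].strip(),
                "--ref_genome_dir", ref_genome_dir,
                # Hypothetical layout: one output directory per sample.
                "--out_dir", f"{out_root}/{sample}",
            ])
    return commands


def format_commands(commands):
    """Render each argument list as a shell-quoted command line."""
    return [shlex.join(cmd) for cmd in commands]
```

Printing the result of format_commands(build_commands(...)) gives command lines that can be reviewed before running them one at a time.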

ddiez commented 1 year ago

Thanks @nrhorner! FYI, I had already run an individual sample the way you suggest, and although I had to fix a problem with the amount of memory available to the container, everything went fine. So there is always that option.

nrhorner commented 1 year ago

Hi @ddiez, it's good that you can at least run a single sample. I have a fix for the sample sheet issue, and it will be released shortly.

ddiez commented 1 year ago

Thanks for the update!

nrhorner commented 1 year ago

Hi @ddiez, I just wanted to let you know that we haven't forgotten about this. I'm just waiting on one more thing before I can release the changes.

nrhorner commented 1 year ago

Hi @ddiez Sorry that this took so long, but there is a fix on our prerelease branch that should hopefully solve your sample sheet issues. It would be great if you're able to test it out.

nextflow run epi2me-labs/wf-single-cell -r prerelease ...

ddiez commented 1 year ago

@nrhorner Thanks for the heads up. I checked with the example dataset set up as described above and it works. I will try with a real dataset soon although I imagine there won't be any problems. Thanks!

nrhorner commented 1 year ago

Thanks for getting back to me @ddiez. These changes will be released today in v0.3.0. I'll close this ticket now, but please let me know if you encounter any more issues.