bigbio / quantms

Quantitative mass spectrometry workflow.
MIT License

SDRF Parsing Error #376

Open jackrogan opened 1 month ago

jackrogan commented 1 month ago

Description of the bug

Hi,

I'm trying to run a minimal experiment to test the SDRF format while developing a TMT pipeline. I've copied the format as closely as I can, but I don't understand what is being flagged as incorrect here:

Command used and terminal output

Command:

nextflow run bigbio/quantms -r dev -profile docker --input 20240521b_JR_TMT_HS_KO_MIN.sdrf.tsv --database /home/jack.rogan/Proteomics/Human_reference_proteome.fasta --add_decoys --search_engines comet --max_precursor_charge 5 --min_peptide_length 7 --FDR_level psm-level-fdrs --max_memory 48.GB --outdir 20240521b_JR_TMT_HS_KO_MIN_comet --acquisition_method dda --labelling_type "tmt10plex" --normalize true --msstats_remove_one_feat_prot false --msstatslfq_removeFewMeasurements false

Output:

ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:SDRFPARSING (20240521b_JR_TMT_HS_KO_MIN.sdrf.tsv)'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:SDRFPARSING (20240521b_JR_TMT_HS_KO_MIN.sdrf.tsv)` terminated with an error exit status (1)

Command executed:

  ## -t2 since the one-table format parser is broken in OpenMS2.5
  ## -l for legacy behavior to always add sample columns

  parse_sdrf convert-openms \
      -t2 -l \
      --extension_convert raw:mzML,.gz:,.tar.gz:,.tar:,.zip: \
      -s 20240521b_JR_TMT_HS_KO_MIN.sdrf.tsv \
       \
      2>&1 | tee 20240521b_JR_TMT_HS_KO_MIN.sdrf_parsing.log

  mv openms.tsv 20240521b_JR_TMT_HS_KO_MIN.sdrf_config.tsv
  mv experimental_design.tsv 20240521b_JR_TMT_HS_KO_MIN.sdrf_openms_design.tsv

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:SDRFPARSING":
      sdrf-pipelines: $(parse_sdrf --version 2>&1 | awk -F ' ' '{print $2}')
  END_VERSIONS

Command exit status:
  1

Command output:
  PROCESSING: 20240521b_JR_TMT_HS_KO_MIN.sdrf.tsv"
  Factor columns: ['factor value[treatment]']
  Characteristics columns (those covered by factor columns removed): ['characteristics[organism]', 'characteristics[organism part]', 'characteristics[sex]', 'characteristics[age]', 'characteristics[developmental stage]', 'characteristics[ethnic group]', 'characteristics[disease]', 'characteristics[cell line]', 'characteristics[cell type]', 'characteristics[infect]', 'characteristics[enrichment process]', 'characteristics[biological replicate]']
  Conditions (5): dict_keys(['OE33_0pc_KO', 'OE33_25pc_KO', 'OE33_50pc_KO', 'OE33_75pc_KO', 'OE33_100pc_KO'])
  Files per condition: dict_values([1, 1, 1, 1, 1])
  Traceback (most recent call last):
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/parse_sdrf.py", line 62, in openms_from_sdrf
      OpenMS().openms_convert(sdrf, onetable, legacy, verbose, conditionsfromcolumns, extension_convert)
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/openms/openms.py", line 446, in openms_convert
      self.writeTwoTableExperimentalDesign(
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/openms/openms.py", line 617, in writeTwoTableExperimentalDesign
      label = str(choice[label[label_index[raw]]])
                  ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  KeyError: 'TMT127N'

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/usr/local/bin/parse_sdrf", line 10, in <module>
      sys.exit(main())
               ^^^^^^
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/parse_sdrf.py", line 239, in main
      cli()
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/parse_sdrf.py", line 65, in openms_from_sdrf
      raise ValueError(msg) from ex
  ValueError: Error: 'TMT127N'

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  PROCESSING: 20240521b_JR_TMT_HS_KO_MIN.sdrf.tsv"
  Factor columns: ['factor value[treatment]']
  Characteristics columns (those covered by factor columns removed): ['characteristics[organism]', 'characteristics[organism part]', 'characteristics[sex]', 'characteristics[age]', 'characteristics[developmental stage]', 'characteristics[ethnic group]', 'characteristics[disease]', 'characteristics[cell line]', 'characteristics[cell type]', 'characteristics[infect]', 'characteristics[enrichment process]', 'characteristics[biological replicate]']
  Conditions (5): dict_keys(['OE33_0pc_KO', 'OE33_25pc_KO', 'OE33_50pc_KO', 'OE33_75pc_KO', 'OE33_100pc_KO'])
  Files per condition: dict_values([1, 1, 1, 1, 1])
  Traceback (most recent call last):
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/parse_sdrf.py", line 62, in openms_from_sdrf
      OpenMS().openms_convert(sdrf, onetable, legacy, verbose, conditionsfromcolumns, extension_convert)
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/openms/openms.py", line 446, in openms_convert
      self.writeTwoTableExperimentalDesign(
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/openms/openms.py", line 617, in writeTwoTableExperimentalDesign
      label = str(choice[label[label_index[raw]]])
                  ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  KeyError: 'TMT127N'

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/usr/local/bin/parse_sdrf", line 10, in <module>
      sys.exit(main())
               ^^^^^^
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/parse_sdrf.py", line 239, in main
      cli()
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/parse_sdrf.py", line 65, in openms_from_sdrf
      raise ValueError(msg) from ex
  ValueError: Error: 'TMT127N'

Work dir:
  /mnt/bigdata/Jack/20240521b_JR_TMT_HS_KO_MIN/work/89/1cc0e0fc3d4400aff9683b5ca7a053

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

Relevant files

20240521b_JR_TMT_HS_KO_MIN.sdrf.tsv.txt

System information

Nextflow 24.04.1, Docker, Ubuntu, bigbio/quantms dev

daichengxin commented 1 month ago

Thanks for testing. This is a bug caused by an incomplete label set in the SDRF, which the parser does not currently handle. We will fix it so that incomplete label sets are allowed. The relevant code is here: https://github.com/bigbio/sdrf-pipelines/blob/fe1851e0377a0aefb4434da6904b9187f651c3ac/sdrf_pipelines/openms/openms.py#L849-L873
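For readers following the traceback, this kind of failure can be reproduced in isolation. The sketch below is an assumption about the mechanism, not the sdrf-pipelines source: it supposes the channel table is picked from the number of distinct labels, so an SDRF that uses only five channels of a 10-plex falls back to a 6-plex table that has no `TMT127N` key. The table names, `pick_table_by_count`, and the sample label list are hypothetical.

```python
# Hypothetical channel tables; keys follow the usual TMT reporter-ion names.
TMT6PLEX = {"TMT126": 1, "TMT127": 2, "TMT128": 3,
            "TMT129": 4, "TMT130": 5, "TMT131": 6}
TMT10PLEX = {
    "TMT126": 1, "TMT127N": 2, "TMT127C": 3, "TMT128N": 4, "TMT128C": 5,
    "TMT129N": 6, "TMT129C": 7, "TMT130N": 8, "TMT130C": 9, "TMT131": 10,
}


def pick_table_by_count(labels):
    """Assumed selection rule: guess the plex from how many labels appear."""
    return TMT10PLEX if len(set(labels)) > 6 else TMT6PLEX


# Hypothetical SDRF that uses only five of the ten channels.
labels = ["TMT126", "TMT127N", "TMT128N", "TMT129N", "TMT130N"]
choice = pick_table_by_count(labels)  # five labels -> the 6-plex table

try:
    choice["TMT127N"]  # the 6-plex table has "TMT127", not "TMT127N"
    failed_key = None
except KeyError as ex:
    failed_key = ex.args[0]

print(failed_key)  # TMT127N
```

If the real selection logic differs, the general point still holds: a lookup keyed by label name fails whenever the chosen table does not match the naming used in the SDRF.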

Fix the logic to index the label directly, and allow incomplete label sets in the SDRF.
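A minimal sketch of what indexing the label directly could look like, assuming the plex is inferred from the label names themselves rather than from how many of them appear. The `PLEXES` table, `infer_plex`, and `channel_index` are hypothetical names for illustration and are not the sdrf-pipelines API.

```python
# Hypothetical channel lists, ordered from the smallest plex to the largest.
PLEXES = {
    "tmt6plex": ["TMT126", "TMT127", "TMT128", "TMT129", "TMT130", "TMT131"],
    "tmt10plex": ["TMT126", "TMT127N", "TMT127C", "TMT128N", "TMT128C",
                  "TMT129N", "TMT129C", "TMT130N", "TMT130C", "TMT131"],
}


def infer_plex(labels):
    """Return the first (smallest) plex whose channels cover every label used."""
    used = set(labels)
    for name, channels in PLEXES.items():
        if used <= set(channels):
            return name
    raise ValueError(f"No known plex covers labels: {sorted(used)}")


def channel_index(plex, label):
    """1-based position of a label within the given plex."""
    return PLEXES[plex].index(label) + 1


# Hypothetical incomplete label set: five channels of a 10-plex kit.
labels = ["TMT126", "TMT127N", "TMT128N", "TMT129N", "TMT130N"]
plex = infer_plex(labels)
print(plex, [channel_index(plex, label) for label in labels])
# tmt10plex [1, 2, 4, 6, 8]
```

Inference from names is still ambiguous when only shared labels such as `TMT126` and `TMT131` are present, so in that case an explicit plex setting would be needed rather than a guess.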