bigbio / sdrf-pipelines

A repository to convert SDRF proteomics files into pipelines config files
Apache License 2.0

Problems understanding TMT label in sdrf file #152

Open MetteBoge opened 9 months ago

MetteBoge commented 9 months ago

Hi! I have made an SDRF file to use as input for the nf-core/quantms pipeline (v1.2.0), but I encounter an error that seems to be related to the TMT labels. Before using the file, I validated it with parse_sdrf validate-sdrf --sdrf_file PDC000270.sdrf.tsv, and it passed without errors.

But when I use it in quantms, I get this error:

-[nf-core/quantms] Pipeline completed with errors-
WARN: There's no process matching config selector: MULTIQC -- Did you mean: PMULTIQC?
ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:SDRFPARSING (PDC000270.sdrf.tsv)'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:SDRFPARSING (PDC000270.sdrf.tsv)` terminated with an error exit status (1)

Command executed:
[PDC000270.sdrf.tsv.txt](https://github.com/bigbio/sdrf-pipelines/files/13526742/PDC000270.sdrf.tsv.txt)

  ## -t2 since the one-table format parser is broken in OpenMS2.5
  ## -l for legacy behavior to always add sample columns

  parse_sdrf convert-openms \
      -t2 -l \
      --extension_convert raw:mzML,.gz:,.tar.gz:,.tar:,.zip: \
      -s PDC000270.sdrf.tsv \
       \
      2>&1 | tee PDC000270.sdrf_parsing.log

  mv openms.tsv PDC000270.sdrf_config.tsv
  mv experimental_design.tsv PDC000270.sdrf_openms_design.tsv

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:SDRFPARSING":
      sdrf-pipelines: $(parse_sdrf --version 2>&1 | awk -F ' ' '{print $2}')
  END_VERSIONS

Command exit status:
  1

Command output:
  PROCESSING: PDC000270.sdrf.tsv"
  Factor columns: ['factor value[disease]']
  Characteristics columns (those covered by factor columns removed): ['characteristics[organism]', 'characteristics[organism part]', 'characteristics[developmental stage]', 'characteristics[disease]', 'characteristics[cell type]', 'characteristics[biological replicate]', 'characteristics[sex]', 'characteristics[age]', 'characteristics[ancestry category]', 'characteristics[individual]']
  Conditions (2): dict_keys(['Primary Tumor', 'Solid Tissue Normal'])
  Files per condition: dict_values([2475, 1025])
  Traceback (most recent call last):
    File "/usr/local/lib/python3.11/site-packages/sdrf_pipelines/parse_sdrf.py", line 62, in openms_from_sdrf
      OpenMS().openms_convert(sdrf, onetable, legacy, verbose, conditionsfromcolumns, extension_convert)
    File "/usr/local/lib/python3.11/site-packages/sdrf_pipelines/openms/openms.py", line 444, in openms_convert
      self.writeTwoTableExperimentalDesign(
    File "/usr/local/lib/python3.11/site-packages/sdrf_pipelines/openms/openms.py", line 615, in writeTwoTableExperimentalDesign
      label = str(choice[label[label_index[raw]]])
                  ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  KeyError: 'TMT131'

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/usr/local/bin/parse_sdrf", line 10, in <module>
      sys.exit(main())
               ^^^^^^
    File "/usr/local/lib/python3.11/site-packages/sdrf_pipelines/parse_sdrf.py", line 239, in main
      cli()
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/sdrf_pipelines/parse_sdrf.py", line 65, in openms_from_sdrf
      raise ValueError(msg) from ex
  ValueError: Error: 'TMT131'

Command error:
  PROCESSING: PDC000270.sdrf.tsv"
  Factor columns: ['factor value[disease]']
  Characteristics columns (those covered by factor columns removed): ['characteristics[organism]', 'characteristics[organism part]', 'characteristics[developmental stage]', 'characteristics[disease]', 'characteristics[cell type]', 'characteristics[biological replicate]', 'characteristics[sex]', 'characteristics[age]', 'characteristics[ancestry category]', 'characteristics[individual]']
  Conditions (2): dict_keys(['Primary Tumor', 'Solid Tissue Normal'])
  Files per condition: dict_values([2475, 1025])
  Traceback (most recent call last):
    File "/usr/local/lib/python3.11/site-packages/sdrf_pipelines/parse_sdrf.py", line 62, in openms_from_sdrf
      OpenMS().openms_convert(sdrf, onetable, legacy, verbose, conditionsfromcolumns, extension_convert)
    File "/usr/local/lib/python3.11/site-packages/sdrf_pipelines/openms/openms.py", line 444, in openms_convert
      self.writeTwoTableExperimentalDesign(
    File "/usr/local/lib/python3.11/site-packages/sdrf_pipelines/openms/openms.py", line 615, in writeTwoTableExperimentalDesign
      label = str(choice[label[label_index[raw]]])
                  ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  KeyError: 'TMT131'

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/usr/local/bin/parse_sdrf", line 10, in <module>
      sys.exit(main())
               ^^^^^^
    File "/usr/local/lib/python3.11/site-packages/sdrf_pipelines/parse_sdrf.py", line 239, in main
      cli()
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/sdrf_pipelines/parse_sdrf.py", line 65, in openms_from_sdrf
      raise ValueError(msg) from ex
  ValueError: Error: 'TMT131'

Work dir:
  /work/mbp/massspec_tumorantigens/other_studies/sdrf/PDC000270/work/07/c6f8360cade1a761f1b0642b764558

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

I have the following labels in the "comment[label]" column:

cut -f 17 PDC000270.sdrf.tsv | sort | uniq
  TMT127C
  TMT127N
  TMT128C
  TMT128N
  TMT129C
  TMT129N
  TMT130C
  TMT130N
  TMT131
  TMT131C
  comment[label]

I attach the SDRF file so you can inspect it if needed.

Thanks in advance!

P.S. The uploaded file has a .txt extension only so that I could attach it here; it does not have this extension when I use it on the server. [PDC000270.sdrf.tsv.txt](https://github.com/bigbio/sdrf-pipelines/files/13526742/PDC000270.sdrf.tsv.txt)
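
One way to narrow down a label error like this is to compare the distinct values in the comment[label] column against the channel names of each TMT plex. The sketch below is only an illustration: the TMT10plex and TMT11plex channel sets are the commonly used reagent names and are assumed here rather than taken from sdrf-pipelines, and read_labels and check_labels are hypothetical helpers, not part of its API. It reads the column by header name rather than by position, and the demo inlines the ten labels listed above instead of opening the real file.

```python
import csv
import io

# Assumed channel names (common TMT reagent naming); not read from sdrf-pipelines.
TMT10PLEX = {
    "TMT126", "TMT127N", "TMT127C", "TMT128N", "TMT128C",
    "TMT129N", "TMT129C", "TMT130N", "TMT130C", "TMT131",
}
# TMT11plex splits the 131 channel into 131N and 131C.
TMT11PLEX = (TMT10PLEX - {"TMT131"}) | {"TMT131N", "TMT131C"}
PLEXES = {"TMT10plex": TMT10PLEX, "TMT11plex": TMT11PLEX}


def read_labels(handle, column="comment[label]"):
    """Collect the distinct values of the label column from a tab-separated SDRF."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[column].strip() for row in reader if row.get(column)}


def check_labels(labels):
    """For each plex, list the observed labels that are not among its channels."""
    return {name: sorted(labels - channels) for name, channels in PLEXES.items()}


if __name__ == "__main__":
    # The ten labels reported in this issue, inlined as a stand-in for the real file.
    observed = [
        "TMT127C", "TMT127N", "TMT128C", "TMT128N", "TMT129C",
        "TMT129N", "TMT130C", "TMT130N", "TMT131", "TMT131C",
    ]
    rows = "".join(f"sample {i}\t{label}\n" for i, label in enumerate(observed))
    demo = io.StringIO("source name\tcomment[label]\n" + rows)

    for plex, unexpected in check_labels(read_labels(demo)).items():
        print(f"{plex}: labels not in this plex: {unexpected}")
```

With those ten labels, each plex reports one label it does not contain (TMT131C for TMT10plex, TMT131 for TMT11plex), and TMT126 is absent altogether, so the label set may mix the two naming schemes. Whether that is what triggers the KeyError inside the converter is not confirmed here.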