BIMSBbioinfo / pigx_rnaseq

Bulk RNA-seq Data Processing, Quality Control, and Downstream Analysis Pipeline
GNU General Public License v3.0

DEanalysis covariates should not be mandatory #82

Closed: smoe closed this issue 3 years ago

smoe commented 3 years ago

Hello,

I ran into the following error:

InputFunctionException in line 491 of /usr/libexec/pigx_rnaseq/pigx_rnaseq.py:
Error:
  KeyError: 'covariates'
Wildcards:
  analysis=two-case
Traceback:
  File "/usr/libexec/pigx_rnaseq/pigx_rnaseq.py", line 501, in <lambda>
  File "/usr/lib/python3/dist-packages/snakemake/executors/__init__.py", line 111, in run_jobs
  File "/usr/lib/python3/dist-packages/snakemake/executors/__init__.py", line 402, in run
  File "/usr/lib/python3/dist-packages/snakemake/executors/__init__.py", line 203, in _run
  File "/usr/lib/python3/dist-packages/snakemake/executors/__init__.py", line 131, in _run
  File "/usr/lib/python3/dist-packages/snakemake/executors/__init__.py", line 137, in printjob

This is caused by https://github.com/BIMSBbioinfo/pigx_rnaseq/blob/1375b63be6d39a58388d8bf2adc62997645d81fa/snakefile.py#L502. I would like the covariates to be allowed to be left unspecified. Otherwise, please emphasize in your documentation that the covariates line is required.
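For illustration, a minimal sketch of how the key could be made optional, assuming the parsed settings are a nested dict laid out like the YAML entry quoted in this thread (`DEanalyses`, then the analysis name, then `covariates`). `get_covariates` is a hypothetical helper, not a function in pigx_rnaseq; it uses `dict.get` with an empty-string default so that a missing `covariates` key no longer raises a `KeyError`.

```python
def get_covariates(settings: dict, analysis: str) -> str:
    """Return the covariates string for one DE analysis, or '' if not given.

    Hypothetical helper; the layout settings['DEanalyses'][analysis]
    is assumed from the YAML entry in this issue.
    """
    analysis_settings = settings["DEanalyses"][analysis]
    # dict.get() with a default avoids KeyError: 'covariates' when the key is absent
    return analysis_settings.get("covariates", "")


settings = {
    "DEanalyses": {
        "two-case": {
            "case_sample_groups": "PR",
            "control_sample_groups": "WT",
            # no 'covariates' key at all
        }
    }
}
print(repr(get_covariates(settings, "two-case")))  # ''
```

This only covers a missing key. A key that is present but left empty (a YAML null) is a separate case and would still come back as `None`.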

Many thanks, Steffen

smoe commented 3 years ago

I added a "Covariates:" line to the analysis and restarted. This caused the covariates value "None" to be misinterpreted when generating the design formula:


...skipping...
tput/feature_counts/raw_counts/counts_from_SALMON.genes.tsv --colDataFile=/home/moeller/FibrosisArrayExpress/output/colData.tsv --gtfFile=/home/moeller/FibrosisArrayExpress/genomes/Mus_musculus.GRCm39.103.gtf --caseSampleGroups=PR --controlSampleGroups=WT --covariates=None --workdir=/home/moeller/FibrosisArrayExpress/output/report --organism=mmusculus --selfContained=FALSE
setting working directory to  /home/moeller/FibrosisArrayExpress/output/report

processing file: deseqReport.Rmd
   inline R code fragments

label: unnamed-chunk-1 (with options)
List of 1
 $ echo: logi FALSE

  ordinary text without R code

label: setup (with options)
List of 1
 $ include: logi FALSE

  ordinary text without R code

label: printInputSettings
  ordinary text without R code

label: prepare_inputs_import_GTF
  ordinary text without R code

label: run_deseq2
design formula:~ None + AnalysisGroup
Quitting from lines 145-189 (deseqReport.Rmd)
Error in DESeqDataSet(se, design = design, ignoreRank) :
  all variables in design formula must be columns in colData
Calls: runReport ... withVisible -> eval -> eval -> <Anonymous> -> DESeqDataSet
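The log shows the string `None` being pasted into the design formula as if it were a column of `colData`, which DESeq2 then rejects. The report itself is R code; the following Python sketch only illustrates the intended logic under stated assumptions: covariates are taken to arrive as one comma-separated string (an assumption), blank entries and the literal `None` are dropped, and the formula falls back to `~ AnalysisGroup` when no covariates remain. `build_design_formula` is a hypothetical name.

```python
from typing import Optional


def build_design_formula(covariates: Optional[str],
                         group_column: str = "AnalysisGroup") -> str:
    """Return a DESeq2-style design formula string.

    Covariates are assumed (hypothetically) to be one comma-separated
    string; blank entries and the literal 'None' are ignored.
    """
    terms = [
        c.strip()
        for c in (covariates or "").split(",")
        if c.strip() and c.strip() != "None"
    ]
    # The group of interest goes last, matching the '~ ... + AnalysisGroup' order in the log
    return "~ " + " + ".join(terms + [group_column])


print(build_design_formula("None"))        # ~ AnalysisGroup
print(build_design_formula(""))            # ~ AnalysisGroup
print(build_design_formula("batch, sex"))  # ~ batch + sex + AnalysisGroup
```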

I am tempted to look into this, but would appreciate some guidance. I am working with version 0.0.10.

smoe commented 3 years ago

Here is the invocation:

/usr/bin/Rscript --vanilla /usr/libexec/pigx_rnaseq/scripts/runDeseqReport.R --logo=/usr/share/pigx_rnaseq/Logo_PiGx.png --prefix='two-case.salmon.genes' --reportFile=/usr/libexec/pigx_rnaseq/scripts/deseqReport.Rmd --countDataFile=/home/moeller/FibrosisArrayExpress/output/feature_counts/raw_counts/counts_from_SALMON.genes.tsv --colDataFile=/home/moeller/FibrosisArrayExpress/output/colData.tsv --gtfFile=/home/moeller/FibrosisArrayExpress/genomes/Mus_musculus.GRCm39.103.gtf --caseSampleGroups='PR' --controlSampleGroups='WT' --covariates='None' --workdir=/home/moeller/FibrosisArrayExpress/output/report --organism='mmusculus' --selfContained=FALSE >> /home/moeller/FibrosisArrayExpress/output/logs/two-case.report.salmon.genes.log 2>&1

Note the explicit 'None'. I have hardened this locally by checking whether the covariate is named "None" and, if so, changing it to the empty string. You may prefer to fix this properly instead :) This is my entry in the settings file. Looking at the organism entry, I now see what I should have written, but you may want to fix this anyway so that it behaves in a more YAML-like way, also for the organism:

DEanalyses:
    two-case:
      case_sample_groups: "PR"
      control_sample_groups: "WT"
      covariates:
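An empty `covariates:` line, as in the entry above, is a YAML null, which a Python YAML loader such as PyYAML returns as `None`. A minimal sketch of how that likely turns into the literal `--covariates='None'` seen in the invocation, assuming the value is interpolated into the command with string formatting (the snakefile's actual code may differ). `covariates_arg` is a hypothetical helper that mirrors the workaround described in this comment: it maps `None` and the string `"None"` to an empty string before building the option.

```python
from typing import Optional


def covariates_arg(value: Optional[str]) -> str:
    """Build the --covariates option, treating a YAML null or 'None' as empty.

    Hypothetical helper illustrating the workaround; not pigx_rnaseq code.
    """
    if value is None or str(value).strip() in ("", "None"):
        value = ""
    return f"--covariates='{value}'"


# Naive formatting of a YAML null reproduces the argument from the log:
print(f"--covariates='{None}'")   # --covariates='None'
print(covariates_arg(None))       # --covariates=''
print(covariates_arg("batch"))    # --covariates='batch'
```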

borauyar commented 3 years ago

@smoe The settings template YAML file includes an empty string for covariates by default. But I agree that this should be handled more robustly.

smoe commented 3 years ago

Yep. Looking at the organism line helped me. What happened was that the comments in the template visually separate the covariates line from the rest of the block, which gives the impression that it is optional. So I deleted the whole block. Then I got a complaint about a missing "covariates" key and added it back, but without the original formatting and the "" value. I'll have another look at the original wording and will likely prepare a PR to make it clearer. Many thanks!

borauyar commented 3 years ago

Thank you for the PR! It should improve the user experience. Nevertheless, we should handle the case where the key is missing more gracefully.