Closed: smoe closed this issue 3 years ago
I added a "Covariates:" line to the analysis and restarted. This provoked a misinterpretation of the covariates value "None" in the generation of the design formula:
...skipping...
tput/feature_counts/raw_counts/counts_from_SALMON.genes.tsv --colDataFile=/home/moeller/FibrosisArrayExpress/output/colData.tsv --gtfFile=/home/moeller/FibrosisArrayExpress/genomes/Mus_musculus.GRCm39.103.gtf --caseSampleGroups=PR --controlSampleGroups=WT --covariates=None --workdir=/home/moeller/FibrosisArrayExpress/output/report --organism=mmusculus --selfContained=FALSE
setting working directory to /home/moeller/FibrosisArrayExpress/output/report
processing file: deseqReport.Rmd
^M | ^M | | 0%^M | ^M |.. | 2%
inline R code fragments
^M | ^M |... | 5%
label: unnamed-chunk-1 (with options)
List of 1
$ echo: logi FALSE
^M | ^M |..... | 7%
ordinary text without R code
^M | ^M |....... | 10%
label: setup (with options)
List of 1
$ include: logi FALSE
^M | ^M |......... | 12%
ordinary text without R code
^M | ^M |.......... | 15%
label: printInputSettings
^M | ^M |............ | 17%
ordinary text without R code
^M | ^M |.............. | 20%
label: prepare_inputs_import_GTF
^M | ^M |............... | 22%
ordinary text without R code
^M | ^M |................. | 24%
label: run_deseq2
design formula:~ None + AnalysisGroup
Quitting from lines 145-189 (deseqReport.Rmd)
Error in DESeqDataSet(se, design = design, ignoreRank) :
all variables in design formula must be columns in colData
Calls: runReport ... withVisible -> eval -> eval -> <Anonymous> -> DESeqDataSet
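The log shows the design formula being assembled as the covariates followed by AnalysisGroup, and DESeq2 rejecting it because "None" is not a column in colData. A minimal Python sketch of that assembly and check follows. It is an illustration only: the actual report does this in R, and the function name `build_design`, the comma separator, and the example column names are assumptions, not taken from the pipeline.

```python
def build_design(covariates, coldata_columns, group="AnalysisGroup"):
    """Assemble a design formula string and check its terms.

    covariates: comma-separated covariate names, "" for none (assumed format).
    coldata_columns: column names available in the sample sheet (hypothetical).
    """
    # Split on commas and drop empty entries, so "" yields no covariates.
    terms = [c.strip() for c in covariates.split(",") if c.strip()]
    terms.append(group)

    # Mirror the check that fails in the log: every term must be a column.
    missing = [t for t in terms if t not in coldata_columns]
    if missing:
        raise ValueError(
            "variables not found in colData: " + ", ".join(missing)
        )
    return "~ " + " + ".join(terms)


columns = ["AnalysisGroup", "batch"]  # hypothetical colData columns

print(build_design("", columns))       # no covariates
print(build_design("batch", columns))  # one covariate

try:
    build_design("None", columns)      # the situation reported here
except ValueError as err:
    print("rejected:", err)
```

With an empty covariates string the formula reduces to the group term alone, which is the behaviour one would want when no covariates are configured.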
I am tempted to look into that but would appreciate some guidance. The version I am working with is 0.0.10.
Here is the invocation
/usr/bin/Rscript --vanilla /usr/libexec/pigx_rnaseq/scripts/runDeseqReport.R --logo=/usr/share/pigx_rnaseq/Logo_PiGx.png --prefix='two-case.salmon.genes' --reportFile=/usr/libexec/pigx_rnaseq/scripts/deseqReport.Rmd --countDataFile=/home/moeller/FibrosisArrayExpress/output/feature_counts/raw_counts/counts_from_SALMON.genes.tsv --colDataFile=/home/moeller/FibrosisArrayExpress/output/colData.tsv --gtfFile=/home/moeller/FibrosisArrayExpress/genomes/Mus_musculus.GRCm39.103.gtf --caseSampleGroups='PR' --controlSampleGroups='WT' --covariates='None' --workdir=/home/moeller/FibrosisArrayExpress/output/report --organism='mmusculus' --selfContained=FALSE >> /home/moeller/FibrosisArrayExpress/output/logs/two-case.report.salmon.genes.log 2>&1
with an explicit 'None'. I have hardened this locally by checking whether the covariate is named "None" and, if so, changing it to the empty string. You may decide to fix this properly instead :) Below is my respective entry in the settings file. Having looked at the organism line, I now see what I should have written, but you may still want to fix this to stay more YAML-like, also for the organism:
DEanalyses:
  two-case:
    case_sample_groups: "PR"
    control_sample_groups: "WT"
    covariates:
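To make the mechanism concrete: a YAML key with no value is loaded as null, which Python YAML loaders represent as `None`. Formatting that value into a command line then yields the literal text "None", which matches the invocation shown above. The sketch below simulates the parsed settings as a plain dict instead of calling a YAML library, and the command assembly is a hypothetical stand-in; the real snakefile may differ in detail.

```python
# Simulates what a YAML loader returns for the settings block above:
# the bare "covariates:" line carries no value and is loaded as None.
settings = {
    "DEanalyses": {
        "two-case": {
            "case_sample_groups": "PR",
            "control_sample_groups": "WT",
            "covariates": None,
        }
    }
}

analysis = settings["DEanalyses"]["two-case"]

# Hypothetical command assembly (assumption, not the pipeline's code):
# str.format turns None into the four-character string "None".
argument = "--covariates='{}'".format(analysis["covariates"])
print(argument)  # --covariates='None'
```

Writing `covariates: ""` in the settings file avoids this, because the loader then returns an empty string rather than `None`.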
@smoe The settings template YAML file includes an empty string for covariates by default. But I agree that this should be handled more properly.
Yip. That look at the organism line helped me. What happened was that the comments seem to separate the covariates line from the rest, so that one gets the impression that it is optional. So I had deleted the whole block. Then I got a complaint about a missing "covariates" key and added it back, without access to the initial formatting and the "" value. I'll have another look at the initial wording and will likely prepare a PR to harden it a bit more. Many thanks!
Thank you for the PR! It should improve the user experience. Nevertheless, we should handle this one better in case it is missing.
Hello,
I ran into
because of https://github.com/BIMSBbioinfo/pigx_rnaseq/blob/1375b63be6d39a58388d8bf2adc62997645d81fa/snakefile.py#L502 and wish that the covariates could be left unspecified. Otherwise, please stress in your documentation that the covariates line is required.
Many thanks Steffen
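One way the three cases raised in this thread (missing key, empty YAML value, and the literal string "None") could be treated alike is a small normalising helper on the Python side. This is only a sketch under assumptions: `normalize_covariates` is a hypothetical name, and it is not what the linked line in snakefile.py currently does.

```python
def normalize_covariates(analysis):
    """Return the covariates of one DE analysis as a string.

    An empty string means "no covariates". Hypothetical helper, not part
    of the pipeline.
    """
    # dict.get returns None both for a missing key and for an empty
    # YAML value, so the two cases collapse into one here.
    value = analysis.get("covariates")
    if value is None:
        return ""
    value = str(value).strip()
    # Also accept the literal string "None", as produced when a None
    # value has already been formatted into text.
    if value == "None":
        return ""
    return value


print(repr(normalize_covariates({})))                      # key missing
print(repr(normalize_covariates({"covariates": None})))    # bare "covariates:"
print(repr(normalize_covariates({"covariates": "None"})))  # literal string
print(repr(normalize_covariates({"covariates": "batch"}))) # a real covariate
```

The first three calls return the empty string and the last returns "batch" unchanged. A side effect of this choice is that a colData column actually named "None" could not be used as a covariate.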