Closed Blosberg closed 4 years ago
Note that the rule calls generateReports
, which does use shell
in the end.
yes, but at the level of job submission, --printshellcmds
will not propagate into the generateReports
function. If we're using a shell command anyway, then I don't see any reason why we can't just have this directly in the rule. What is the point of the dumps
variable?
More generally though, the problem is just that it makes things non-standard. If all of our other output is using shell
to produce output, then we should avoid changing that for just one specific rule, unless we really need to. Doing things the same way every time just keeps everything cleaner. irregularities are messy and invariably lead to problems like this.
resolved by commit https://github.com/BIMSBbioinfo/pigx_bsseq/commit/f7508cb0b5350b79652d310f5b15cb03f6cdd09c
Final report generation shell command is now sent to the snakemake log file.
More generally though, the problem is just that it makes things non-standard. If all of our other output is using
shell
to produce output, then we should avoid changing that for just one specific rule, unless we really need to. Doing things the same way every time just keeps everything cleaner. irregularities are messy and invariably lead to problems like this.
I disagree. There's nothing wrong with using whatever mechanism there is if it allows us to solve problems. We ended up using run
for a reason. Calling it an irregularity makes it sound like there is no reason to do it this way in the first place. git blame
may help you answer this question.
(Besides, if anything is messy it's using shell
at all ;))
makes it sound like there is no reason to do it this way in the first place.
That was the main point of this question: why exactly it was being done this way. Git blame on the file didn't provide any illuminating answer on this --perhaps I missed something. Can you explain why this is run
stuff is necessary? ( or post a more specific git blame
command that would provide that answer)
I think this solution of printing into stdout is very nice, only nice thing would be to make it conditional ( only when printShellCmd).
About the why are we using it?
:
I think run
is used to execute python code, which was needed in this case, because we built the command using python. Especially this dump:q
was to place the json formatted params dict into the command.
The benefit of this functionality is that we do not have to explicitly have to hardcode the params of the two reports (final, diffmeth), but can define them in the rule and pass them as dict.
The benefit of this functionality is that we do not have to explicitly have to hardcode the params of the two reports (final, diffmeth), but can define them in the rule and pass them as dict.
ok, I'm a little foggy on some of the details in there, but that sounds like a sufficiently good reason for setting it up in this way.
only nice thing would be to make it conditional ( only when printShellCmd).
Agreed, that would be ideal, but passing the flag into the function is an extra step of work. Moreover, this output isn't going to screen; I just tested it and saw that this it is actually sent to the snakemake output log file for the job in question (which is, itself, going to be relegated to a sub-folder of pigx_work/
), so it's not really cluttering up any output. The only time a user will see the output of this print statement is if they are specifically looking into the log files (in which case, it would usually make sense to see the command), so I'm ok with just leaving it as is.
Unfortunately, a simple print statement doesn't actually convey all the necessary information. This is what the output looks like currently:
==== The present shell command being submitted by the generateReport function is as follows:
nice -19 /home/bosberg/projects/pigx/pigx_bsseq_production/guix/.guix-profile/bin/Rscript --vanilla {DIR_scripts}/generate_report.R --scriptsDir=/home/bosberg/projects/pigx/pigx_bsseq_production/scripts/ --reportFile={input.template} --outFile={output.report} --finalReportDir=/scratch/AG_Akalin/bosberg/cfDNA_cardiac_pigx_output/Final_Reports/ --report.params={dumps:q} --logFile={log}
==== End of shell command submitted by the generateReport function.
And then when the job crashes, the only additional information comes from the SM run:
Error in rule final_report:
jobid: 121
output: /scratch/AG_Akalin/bosberg/cfDNA_cardiac_pigx_output/Final_Reports/N2_RDML00564_HT2LKCCXY_L1_1_val_1_bt2.sorted.deduped_hg19_final.html
log: /scratch/AG_Akalin/bosberg/cfDNA_cardiac_pigx_output/Final_Reports/N2_RDML00564_HT2LKCCXY_L1_1_val_1_bt2.sorted.deduped_hg19_final.log
cluster_jobid: Your job 3166421 ("snakejob.final_report.121.sh") has been submitted
So at least we know what the output is, but it's still a massive pain to figure out what the inputs and all the parameters were in order to run the job manually in an R
session.
Especially this dump:q was to place the json formatted params dict into the command.
Why do we need to dump this all in json format? We used to just include the parameters directly in the shell command.
The dumping was because:
The benefit of this functionality is that we do not have to explicitly have to hardcode the params of the two reports (final, diffmeth), but can define them in the rule and pass them as dict.
If we have so many issues with the generate_report() function we might aswell trash it and write explicit shell commands inside the rule.
@rekado @katwre @Blosberg what do you think?
@alexg9010 I don't mind any, using explicit shell commands or run
, if there are more advantages of using shell lets go for it. I remember that for final and diffmeth it was a bit tricky with the code and it was the only way to go (passing parameters) and so we had to use run
.
Print ArgsL rds file to pigx_work, and then make a note of that fact in the SM message.
should be fixed with https://github.com/BIMSBbioinfo/pigx_bsseq/pull/159
The final report rule in BS seq uses
run
instead ofshell
, which means we cannot see what's happening inside snakemake with the optionprintshellcmds
.This enormously increases the difficulty of debugging when issues are encountered (e.g. Issue #119).
We need to either change "run" to "shell" in this rule, or find some other way to output the command.