Open eudesbarbosa opened 2 years ago
For 'Unify output filenames for Control Freec and CopyWriter':
one should also consider unifying cnvetti_on_target_postprocess
output. Currently using _
to separate the file extension:
output/bwa.cnvetti_on_target_postprocess.P00{i}-T{t}-DNA1-WGS1/out/bwa.cnvetti_on_target_postprocess.P00{i}-T{t}-DNA1-WGS1_{ext}
Line 290 - method CnvettiStepPartBase._get_output_files_postprocess()
I don't understand the changes to snappy_wrappers/wrappers/control_freec/transform/snappy-convert-control_freec.R
:
-> They seem to be the reverse of the changes to the workflow.
-- ratios_fn=paste("freec.", sample_name, ".ratio.txt", sep=""),
-- log2_fn=paste("freec.", sample_name, ".log2.txt", sep=""),
-- call_fn=paste("freec.", sample_name, ".call.txt", sep=""),
-- segments_fn=paste("freec.", sample_name, ".segments.txt", sep=""),
+- ratios_fn=paste("freec.", sample_name, "_ratio.txt", sep=""),
+- log2_fn=paste("freec.", sample_name, "_gene_log2.txt", sep=""),
+- call_fn=paste("freec.", sample_name, "_gene_call.txt", sep=""),
+- segments_fn=paste("freec.", sample_name, "_segments.txt", sep=""),
For the function postProcess
in snappy_wrappers/wrappers/copywriter/call/snappy-copywriter-call.R
:
-> Can't the outputs be an argument, similar to control_freec_write_files
, so they don't need to be agnostic towards the rest of the code?
Example:
-- paste( mapper, ".copywriter.", fullID, "_segments.txt", sep="" ) ),
+- paste( mapper, ".copywriter.", fullID, ".segments.txt", sep="" ) ),
Another points about control_freec_write_files
, it seems to have default values but the code will stop if files are not accessible. Are defaults really necessary? Can't they just be defined/taken from the wrapper?
I am afraid I have created a monster here. I should have stuck to a naming convention for all CNV steps:
copywriter
in somatic_targeted_seq_cnv_calling
,CNVkit
in somatic_targeted_seq_cnv_calling
& somatic_wgs_cnv_calling
,Control-FREEC
in somatic_wgs_cnv_calling
cnvetti
(germline)You are welcome to unify the naming convention.
On a side note, I found that output filenames are configured differently in different places,even within the same step & tool (for example names are constructed with underscore and with dot).
I don't know which is the right way to go. Ideally, I would like to:
somatic_targeted_seq_cnv_calling
with the somatic_wgs_cnv_calling
(there is no logical need to keep them apart, as the WES & WGS somatic snvs & indel calling are grouped in a single step)This is probably a lot of unnecessary work, but perhaps we should consider moving to dot separation across the somatic CNV steps.
Regarding the control_freec_write_files
, I assume you are referring to the Bioconductor packages org.Hs.eg.db
, TxDb.Hsapiens.UCSC.hg19.knownGene
& BSgenome.Hsapiens.1000genomes.hs37d5
.
This is a real problem, because these packages are huge (genome sequence, genome feature annotations, gene ids & functional annotations), and they are only valid for one genome release. If we want to use GRCh38, or mouse data, then we need to use other version of these packages.
The problem comes as the packages are downloaded using the wrapper's conda environment. So the only solution (at the moment) is to put in the conda environment annotation packages for several genome releases and species.
We discussed with Clemens a way around this problem. We would have an initial sub-step which creates a sub-directory in $PWD/work
in which required packages are installed. This would then be used as supplementary library path for the R scripts, provided by the wrapper. I have a pretty good idea how we could do that, but I need a few days to work on it.
Name pattern: Probably a good idea to stick to one across all workflows. I think I went over all the ones present in the original PR, but it might be something to gradually fix.
Workflow distinction: I suggest we keep the separation. Too much work for very little value.
control_freec_write_files: Way beyond my knowledge of the topic. I was just referring to the fact that there seems to be some default values for the output arguments - in this case you needed to modify them, you might consider have no defaults and avoid that:
-- ratios_fn=paste("freec.", sample_name, ".ratio.txt", sep=""),
+- ratios_fn=paste("freec.", sample_name, "_ratio.txt", sep=""),
Sorry I misunderstood your comment. You are absolutely right: ratios_fn
should be obtained from the wrapper.
Tasks
Tasks extracted from PR #94 :
Based on commit 0a30513f360b82828efeed6835f362df999f776a
Based on commit e3be244584549cb2f623e4f4e0112b1643376fc0
mutect2
.Based on commit 1a451614bf10c840f2e5566463078c7cb40f16c0
eb_filter
.jannovar
wrapper.Based on commit c2594449df3f9ecff92a04e0b6f975d061cf38ce
snappy_pipeline/workflows/somatic_variant_calling/__init__.py
-mutect2
related.Based on commit 18b83627a204e8ec077dd93d4d60a16e4158c91f
snappy_wrappers/wrappers/mutect2/contamination/wrapper.py
snappy_wrappers/wrappers/mutect2/filter/wrapper.py
Based on commit c46917387b3552c08113b6fab9e88e88c18c6ebfUpdatecbioportal_export
default config.Fix wrappersnappy_wrappers/wrappers/cbioportal/case_lists/wrapper.py
.Fix wrappersnappy_wrappers/wrappers/cbioportal/meta_files/wrapper.py
.Fix wrappersnappy_wrappers/wrappers/vcf2maf/vcf2maf/wrapper.py
.Based on commit d407f2075ffe387988ae2e72d6b27b23e3746606
Based on commit f062fc5fca06e846eb6189b87476d21c2928ae36