There's a reasonably common bit of boilerplate that comes up when composing tools: declaring workflow inputs for all the reference-like things that are the same as those in one or more of the invoked tools, then threading them through to those tools.
For example:
```python
...
self.input(
    "snps_dbsnp",
    VcfTabix,
    doc=InputDocumentation(
        "From the GATK resource bundle, passed to BaseRecalibrator as ``known_sites``",
        quality=InputQualityType.static,
        example="HG38: https://console.cloud.google.com/storage/browser/genomics-public-data/references/hg38/v0/\n\n"
        "(WARNING: The file available from the genomics-public-data resource on Google Cloud Storage is NOT compressed and indexed. This will need to be completed prior to starting the pipeline.)\n\n"
        "File: gs://genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.gz",
    ),
)
self.input(
    "snps_1000gp",
    VcfTabix,
    doc=InputDocumentation(
        "From the GATK resource bundle, passed to BaseRecalibrator as ``known_sites``",
        quality=InputQualityType.static,
        example="HG38: https://console.cloud.google.com/storage/browser/genomics-public-data/references/hg38/v0/\n\n"
        "File: gs://genomics-public-data/references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz",
    ),
)
...
self.step(
    "vc_gatk",
    GatkSomaticVariantCaller_4_1_3(
        normal_bam=self.normal_bam,
        tumor_bam=self.tumor_bam,
        normal_name=self.normal_name,
        tumor_name=self.tumor_name,
        intervals=self.gatk_intervals,
        reference=self.reference,
        snps_dbsnp=self.snps_dbsnp,
        snps_1000gp=self.snps_1000gp,
        known_indels=self.known_indels,
        mills_indels=self.mills_indels,
    ),
    scatter="intervals",
)
```
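Each reference input has to be declared on the workflow and then passed to the step by name, which leads to duplication and leaves room for error.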
One possibility would be to use a static method that adds a whole group of inputs to a workflow in one call.
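A minimal sketch of such a static method, outside any real framework. `Workflow`, `GatkReferenceInputs` and `add_to` are hypothetical stand-ins rather than Janis API, and the data types are plain strings in place of Janis types such as `VcfTabix`.

```python
class Workflow:
    """Hypothetical minimal workflow builder; a stand-in, not the Janis API."""

    def __init__(self):
        self.inputs = {}

    def input(self, name, datatype, doc=None):
        # Refuse to declare the same input twice.
        if name in self.inputs:
            raise ValueError(f"input {name!r} is already declared")
        self.inputs[name] = (datatype, doc)


class GatkReferenceInputs:
    """Hypothetical helper that owns one group of reference-like inputs."""

    # (name, datatype) pairs; the datatypes are placeholder strings.
    INPUTS = (
        ("reference", "Fasta"),
        ("snps_dbsnp", "VcfTabix"),
        ("snps_1000gp", "VcfTabix"),
        ("known_indels", "VcfTabix"),
        ("mills_indels", "VcfTabix"),
    )

    @staticmethod
    def add_to(wf):
        """Declare the whole group on ``wf`` with a single call."""
        for name, datatype in GatkReferenceInputs.INPUTS:
            wf.input(name, datatype)


wf = Workflow()
GatkReferenceInputs.add_to(wf)
```

The group is declared in one place, so a second workflow reuses it with one line instead of copying five `input` calls.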
This doesn't help much with passing the inputs through, though. Half an idea for reducing that is to use Python's keyword-argument magic (`**` unpacking). It seems like you should somehow be able to pass them all to the step in one go.
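One way the `**` idea could look, as a sketch only: the hypothetical helper `matching_kwargs` uses `inspect.signature` to find the tool's parameter names and passes every workflow input that has the same name. The tool function and the `SimpleNamespace` workflow are stand-ins, not Janis objects, and matching purely by name is an assumption about how this might work.

```python
import inspect
from types import SimpleNamespace


def gatk_somatic_variant_caller(*, normal_bam, tumor_bam, normal_name, tumor_name,
                                intervals, reference, snps_dbsnp, snps_1000gp,
                                known_indels, mills_indels):
    """Hypothetical stand-in for the tool constructor; returns what it was wired to."""
    # On entry the only local variables are the parameters themselves.
    return dict(locals())


def matching_kwargs(func, source, exclude=()):
    """Map each parameter of ``func`` to ``source.<same name>``, where it exists."""
    params = inspect.signature(func).parameters
    return {
        name: getattr(source, name)
        for name in params
        if name not in exclude and hasattr(source, name)
    }


# Stand-in for the workflow: each attribute plays the role of ``self.<input>``.
wf = SimpleNamespace(**{
    name: f"input:{name}"
    for name in ("normal_bam", "tumor_bam", "normal_name", "tumor_name", "gatk_intervals",
                 "reference", "snps_dbsnp", "snps_1000gp", "known_indels", "mills_indels")
})

# Same-named inputs are threaded through automatically; only the renamed one
# (gatk_intervals -> intervals) is still written out by hand.
step = gatk_somatic_variant_caller(
    intervals=wf.gatk_intervals,
    **matching_kwargs(gatk_somatic_variant_caller, wf, exclude=("intervals",)),
)
```

Matching by name does the threading automatically, but it can also silently connect an input that merely shares a name, and renamed inputs such as `gatk_intervals` still need to be passed explicitly.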
I don't quite have it figured out, but perhaps a static method on `GatkSomaticVariantCaller` could return the dictionary of inputs to unpack. To quote Terry Pratchett, speaking of Ly Tin Wheedle: "at that point the bar closed."
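A sketch of the static-method version, with the usual caveats: `GatkSomaticVariantCallerSketch`, `REFERENCE_INPUTS` and `reference_kwargs` are hypothetical names, the workflow is a `SimpleNamespace` stand-in, and the tool simply records what it is connected to. The tool owns the list of its reference-like inputs, so the workflow needs only one `**` at the call site.

```python
from types import SimpleNamespace


class GatkSomaticVariantCallerSketch:
    """Hypothetical stand-in for a tool class such as GatkSomaticVariantCaller."""

    # The one place that lists this tool's reference-like inputs.
    REFERENCE_INPUTS = ("reference", "snps_dbsnp", "snps_1000gp", "known_indels", "mills_indels")

    def __init__(self, **connections):
        # Record what each tool input is connected to.
        self.connections = connections

    @staticmethod
    def reference_kwargs(wf):
        """Return {name: wf.<name>} for every reference input, ready for ** unpacking."""
        return {
            name: getattr(wf, name)
            for name in GatkSomaticVariantCallerSketch.REFERENCE_INPUTS
        }


# Stand-in for the workflow: each attribute plays the role of ``self.<input>``.
wf = SimpleNamespace(**{
    name: f"input:{name}"
    for name in ("normal_bam", "tumor_bam", *GatkSomaticVariantCallerSketch.REFERENCE_INPUTS)
})

step = GatkSomaticVariantCallerSketch(
    normal_bam=wf.normal_bam,
    tumor_bam=wf.tumor_bam,
    **GatkSomaticVariantCallerSketch.reference_kwargs(wf),
)
```

A `@classmethod` would avoid repeating the class name inside the method and would let subclasses override `REFERENCE_INPUTS`.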