Open matthdsm opened 4 years ago
You can add them to the registry in exactly the same way as a command tool (by declaring it and importing it in the provider’s init). There’s an example here: https://github.com/PMCC-BioinformaticsCore/janis-bioinformatics/blob/master/janis_bioinformatics/tools/common/bwaaligner.py
You can use them interchangeably with a command tool.
Here’s an example where subworkflows are used, and where another subworkflow is declared in the method “self.process_subpipeline”:
Same if you use a workflow builder:
subwf = WorkflowBuilder(...)
# build subwf here
wf = WorkflowBuilder(...)
wf.step("subWfStepId", subwf(**inputMap))
wf.output("outFromSubWf", source=wf.subWfStepId.nameOfOutput)
I’ll leave this open as I still need to document it.
Great! Thanks for the quick reply!
Cheers M
No worries! Keep feeling free to raise issues on here, very happy to answer them!
It’s actually amazing that planes have WIFI.
Talk about "over the air" updates 😉
Unrelated question:
Say I have an array of FastqGz and I need to create a sample map that's consumable as a list of files (a fofn, in GATK terms).
Practically, I need something along the lines of
bcl2fastq -> Array(FastqGz)
-> "Unknown method"
-> FastqGzPair + sampleName: String()
-> Gatk4FastqToSamLatest.fastqR1, Gatk4FastqToSamLatest.fastqR2
I'm thinking about creating a Python tool that parses the list of FastqGz to an object formatted as
{
    samplename: {
        "R1": samplename_R1.fastq.gz,
        "R2": samplename_R2.fastq.gz
    },
    ...
}
but I'm unsure how to implement this correctly as something that'll make sense in Janis.
Any ideas? Advice?
Thanks already. Cheers M
Yes, you could build a PythonTool that returned an object:
{
    "sampleName": YourSampleName,
    "R1": samplename_R1.fastq.gz,
    "R2": samplename_R2.fastq.gz
}
Which could map to the outputs:
Ultimately, it would be useful in Janis to refer to the first index of an output (e.g. w.bclStep.fastqs[0]), but we’re a little bit off that in #8.
Great, thanks! So I suppose something like this should work?
class GenerateSampleMap(janis.PythonTool):
    def id(self):
        return "GenerateSampleMap"

    def version(self):
        return "v0.0.1"

    @staticmethod
    def code_block(files_list: List[str]):
        samplemap = {}
        for filename in files_list:
            samplename = filename.split("_S")[0]
            if samplename not in samplemap:
                samplemap[samplename] = {}
            if "R1" in filename:
                samplemap[samplename]["R1"] = filename
            elif "R2" in filename:
                samplemap[samplename]["R2"] = filename
        return [{"samplename": k, **v} for k, v in samplemap.items()]

    def outputs(self) -> List[TOutput]:
        return [
            TOutput("samplename", String()),
            TOutput("R1", FastqGz()),
            TOutput("R2", FastqGz()),
        ]
Ah I see I see. We don’t support these custom structures. I’d recommend making each return type an array:
def outputs(self) -> List[TOutput]:
    return [
        TOutput("samplename", Array(String())),
        TOutput("R1", Array(FastqGz())),
        TOutput("R2", Array(FastqGz())),
    ]
(And changing your Python code to suit.)
Then when you use the result from this, you can dot scatter on all three fields: https://github.com/PMCC-BioinformaticsCore/janis-workshops/blob/master/workshop2/6-scatter.md
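For anyone following along, here is a rough sketch of what "changing your Python code to suit" might look like: the same parsing logic, but returning three parallel arrays instead of a list of custom objects. It is written as a plain function so it runs without Janis installed. The function name `build_sample_arrays` is made up for illustration, the dict-keyed-by-output-name return shape is an assumption about what a PythonTool `code_block` should hand back, and matching on `_R1_` / `_R2_` (rather than a bare `R1` / `R2`) is a small change from the code above to avoid hits inside sample names.

```python
from typing import Dict, List


def build_sample_arrays(files_list: List[str]) -> Dict[str, List[str]]:
    """Group FASTQ file names by sample and return three parallel arrays."""
    samplemap: Dict[str, Dict[str, str]] = {}
    for filename in files_list:
        # Same naming assumption as in the thread:
        # "<sample>_S<n>_R1_001.fastq.gz" / "<sample>_S<n>_R2_001.fastq.gz"
        samplename = filename.split("_S")[0]
        reads = samplemap.setdefault(samplename, {})
        if "_R1_" in filename:
            reads["R1"] = filename
        elif "_R2_" in filename:
            reads["R2"] = filename

    # Keep only samples that have both mates, so index i refers to the
    # same sample in all three arrays.
    names = sorted(n for n, r in samplemap.items() if "R1" in r and "R2" in r)
    return {
        "samplename": names,
        "R1": [samplemap[n]["R1"] for n in names],
        "R2": [samplemap[n]["R2"] for n in names],
    }


if __name__ == "__main__":
    example = [
        "D1710903_S64_R2_001.fastq.gz",
        "D1710903_S64_R1_001.fastq.gz",
        "D1820847_S46_R1_001.fastq.gz",
        "D1820847_S46_R2_001.fastq.gz",
    ]
    print(build_sample_arrays(example))
```

Sorting the sample names keeps the output order stable between runs, and dropping samples with a missing mate keeps the arrays the same length, which a dot scatter needs.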
Awesome, thanks for the help! The code is now:
class GenerateSampleMap(janis.PythonTool):
    def id(self):
        return "GenerateSampleMap"

    def version(self):
        return "v0.0.1"

    @staticmethod
    def code_block(files_list: List[str]):
        samplemap = {}
        for filename in files_list:
            samplename = filename.split("_S")[0]
            if samplename not in samplemap:
                samplemap[samplename] = {}
            if "R1" in filename:
                samplemap[samplename]["R1"] = filename
            elif "R2" in filename:
                samplemap[samplename]["R2"] = filename
        return [[v[key] for key in sorted(v.keys())] for k, v in samplemap.items()]

    def outputs(self) -> List[TOutput]:
        return [
            TOutput("R1", FastqGz()),
            TOutput("R2", FastqGz()),
        ]
which outputs roughly as
[['D1710903_S64_R1_001.fastq.gz', 'D1710903_S64_R2_001.fastq.gz'], ['D1820847_S46_R1_001.fastq.gz', 'D1820847_S46_R2_001.fastq.gz'], ['D1900814_S78_R1_001.fastq.gz', 'D1900814_S78_R2_001.fastq.gz'], ['D1904578_S33_R1_001.fastq.gz', 'D1904578_S33_R2_001.fastq.gz'], ['D1905752_S79_R1_001.fastq.gz', 'D1905752_S79_R2_001.fastq.gz'], ['D1908147_S47_R1_001.fastq.gz', 'D1908147_S47_R2_001.fastq.gz'], ['D1821957_S71_R1_001.fastq.gz', 'D1821957_S71_R2_001.fastq.gz'], ['D1905632_S84_R1_001.fastq.gz', 'D1905632_S84_R2_001.fastq.gz'], ['D1908155_S48_R1_001.fastq.gz', 'D1908155_S48_R2_001.fastq.gz'], ['D1812139_S1_R1_001.fastq.gz', 'D1812139_S1_R2_001.fastq.gz'], ['D1901986_S98_R1_001.fastq.gz', 'D1901986_S98_R2_001.fastq.gz'], ['D1907884_S45_R1_001.fastq.gz', 'D1907884_S45_R2_001.fastq.gz'], ['D1822234_S77_R1_001.fastq.gz', 'D1822234_S77_R2_001.fastq.gz'], ['D1905676_S2_R1_001.fastq.gz', 'D1905676_S2_R2_001.fastq.gz'], ['D1908600_S3_R1_001.fastq.gz', 'D1908600_S3_R2_001.fastq.gz']]
and is ideal for a dotproduct as you said!
Thanks! M
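As a side note for later readers, a dot-product scatter pairs the i-th element of each scattered array, so all arrays must have the same length. The sketch below illustrates that in plain Python; it is not Janis code, and the helper names `split_pairs` and `dot_product` are hypothetical. It also shows one way to turn the list of [R1, R2] pairs printed above into two parallel arrays.

```python
from typing import Any, List, Sequence, Tuple


def split_pairs(pairs: List[List[str]]) -> Tuple[List[str], List[str]]:
    """Turn [[r1, r2], [r1, r2], ...] into two parallel lists (R1s, R2s)."""
    r1 = [pair[0] for pair in pairs]
    r2 = [pair[1] for pair in pairs]
    return r1, r2


def dot_product(*arrays: Sequence[Any]) -> List[Tuple[Any, ...]]:
    """Pair up the i-th element of every array, as a dot scatter would."""
    lengths = {len(a) for a in arrays}
    if len(lengths) > 1:
        raise ValueError("dot product needs arrays of equal length")
    return list(zip(*arrays))


if __name__ == "__main__":
    pairs = [
        ["D1710903_S64_R1_001.fastq.gz", "D1710903_S64_R2_001.fastq.gz"],
        ["D1820847_S46_R1_001.fastq.gz", "D1820847_S46_R2_001.fastq.gz"],
    ]
    r1, r2 = split_pairs(pairs)
    # Each tuple corresponds to one scattered job: (R1 file, R2 file).
    for job in dot_product(r1, r2):
        print(job)
```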
Hi,
Would it be possible to add a quick comment on how to use subworkflows? Do I just add them as in a "master" workflow?
Thanks M