PMCC-BioinformaticsCore / janis-core

Core python modules for Janis Pipeline workflow assistant
GNU General Public License v3.0

Conditionals' switches do not always work if one of the step input files requires assert_not_null() #36

Open rlupat opened 4 years ago

rlupat commented 4 years ago

Example:

self.conditional("combine_gatk_variants", [
    (IsDefined(self.mutect2.out), 
        CombineVariants_0_0_8(
            vcfs=[self.splitnormalisevcf.out, self.mutect2.out.assert_not_null()],
            type="germline",
            columns=["AD", "DP", "AF", "GT"],
        ),
    ),
    CombineVariants_0_0_8(
        vcfs=[self.splitnormalisevcf.out],
        type="germline",
        columns=["AD", "DP", "AF", "GT"],
    ),
])

In WDL, this will be translated to:

  call C.combine_gatk_variants as combine_gatk_variants {
    input:
      cond_mutect2_out=mutect2.out,
      switch_case_1_vcfs=[splitnormalisevcf.out, select_first([mutect2.out])],
      switch_case_2_vcfs=[splitnormalisevcf.out]
  }

where select_first([mutect2.out]) will fail when mutect2.out is null.
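The failure can be reproduced outside WDL with a small Python model. This is only a sketch: it assumes the WDL behaviour that select_first errors when no element is defined, and that every input expression of a call is evaluated before the call runs. The names `select_first` and `build_call_inputs` below are illustrative Python stand-ins, not Janis or Cromwell APIs, and the file names are made up.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def select_first(values: List[Optional[T]]) -> T:
    """Python stand-in for WDL's select_first: first defined value, else an error."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("select_first: no defined value in array")


def build_call_inputs(splitnormalisevcf_out: str, mutect2_out: Optional[str]) -> dict:
    """Mimics the generated call block: every input expression is evaluated
    up front, including the one for the switch case that is not taken."""
    return {
        "cond_mutect2_out": mutect2_out,
        "switch_case_1_vcfs": [splitnormalisevcf_out, select_first([mutect2_out])],
        "switch_case_2_vcfs": [splitnormalisevcf_out],
    }


# Works when mutect2 produced an output.
inputs_ok = build_call_inputs("normalised.vcf", "mutect2.vcf")

# Fails when mutect2 was skipped, even though case 2 never needs its output.
try:
    build_call_inputs("normalised.vcf", None)
    failed = False
except ValueError:
    failed = True
```

In this model the error is raised while building the inputs, before any condition is checked, which matches the behaviour described above.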


If this is implemented without assert_not_null(), and mutect2.out from the example above is Optional (because of conditionals in a previous step), subsequent tools that have required inputs will not work.

rlupat commented 4 years ago

Another possible implementation for the example above uses the FilterNullOperator:

        self.step(
            "combine_gatk_variants",
            CombineVariants_0_0_8(
                vcfs=FilterNullOperator([self.splitnormalisevcf.out, self.mutect2.out]),
                type="germline",
                columns=["AD", "DP", "AF", "GT"],
            ),
        )

which will be translated to WDL select_all.
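As a rough illustration of why this works for array inputs, here is a Python stand-in for select_all. It is a sketch assuming select_all simply drops undefined elements; the function and file names are illustrative, not part of Janis.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def select_all(values: List[Optional[T]]) -> List[T]:
    """Python stand-in for WDL's select_all: keep only the defined elements."""
    return [value for value in values if value is not None]


# mutect2 ran: both VCFs are passed on.
with_mutect2 = select_all(["normalised.vcf", "mutect2.vcf"])

# mutect2 was skipped: the null is dropped instead of raising an error.
without_mutect2 = select_all(["normalised.vcf", None])
```

Unlike the select_first model, nothing here can fail on a null, so a single step covers both cases without a conditional.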

However, when the switch cases use different tools, the workaround above no longer works, because a single step cannot choose between them.
For example:

self.conditional("conditionalA", [
    (IsDefined(self.toolB2.out), 
       ToolC(
            vcfs=[self.toolB1.out, self.toolB2.out.assert_not_null()],
        ),
    ),
    ToolD(
        vcfs=[self.toolB1.out],
     ),
])

illusional commented 4 years ago

You should be able to use the FilterNullOperator to get the non-null values of the array, because you know the second element in the array has a value, e.g.:

CombineVariants_0_0_8(
    vcfs=FilterNullOperator([self.splitnormalisevcf.out, self.toolB2.out]),
    type="germline",
    columns=["AD", "DP", "AF", "GT"],
)

rlupat commented 4 years ago

Good point. What about when ToolC does not accept arrays?

E.g.

self.conditional("conditionalA", [
    (IsDefined(self.toolB2.out), 
       ToolC(
            vcfs=self.toolB2.out,
        ),
    ),
    ToolD(
        vcfs=self.toolB1.out,
     ),
])

I don't think FilterNullOperator works for the case above?
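To make the concern concrete, here is a small Python sketch, again assuming FilterNullOperator behaves like WDL's select_all (the `select_all` function below is an illustrative stand-in). Filtering always yields an array, possibly an empty one, so it does not produce the single value a non-array input needs.

```python
from typing import List, Optional


def select_all(values: List[Optional[str]]) -> List[str]:
    """Python stand-in for WDL's select_all: keep only the defined elements."""
    return [value for value in values if value is not None]


# toolB2 produced an output: the result is still a one-element array, not a scalar.
defined_case = select_all(["toolB2.vcf"])

# toolB2 was skipped: the result is an empty array, so there is no value to pass.
skipped_case = select_all([None])
```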