This PR begins our move toward dropping longshot and adds in some optional functionality for primer schemes.
The original minion behaviour is still kept by default, and all tests pass.
Longshot:
adds a --no-longshot flag to minion, which prevents longshot from running in the medaka workflow and instead uses medaka annotate to add info to the VCF
adds a prefilter, run before the medaka VCFs are filtered, that bins variants with <2 supporting reads and variant quality <20, rather than writing them to the FAIL VCF where they would contribute to the consensus mask
because of the prefilter, the longshot and medaka filters in vcf_filter have been combined into a single filter, as they perform the same function
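For reviewers, the prefilter described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `Variant` class and the function names are hypothetical, and it assumes a variant is binned when either threshold is missed (swap `or` for `and` if both conditions are required).

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds taken from the PR description.
MIN_SUPPORTING_READS = 2
MIN_VARIANT_QUALITY = 20.0


@dataclass
class Variant:
    """Hypothetical stand-in for a VCF record; not the pipeline's own type."""
    pos: int
    ref: str
    alt: str
    qual: float
    supporting_reads: int


def is_binned(variant: Variant) -> bool:
    """True if the variant should be discarded before the main filter.

    Assumption: missing either threshold is enough to bin the variant.
    """
    return (
        variant.supporting_reads < MIN_SUPPORTING_READS
        or variant.qual < MIN_VARIANT_QUALITY
    )


def prefilter(variants: Iterable[Variant]) -> List[Variant]:
    """Keep only variants that survive the prefilter.

    Binned variants are dropped outright, so they never reach the
    FAIL VCF and therefore never contribute to the consensus mask.
    """
    return [v for v in variants if not is_binned(v)]
```

Because binned variants never reach the pass/fail step, a single downstream filter is enough, which is the reason the two filters could be merged.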
Primer schemes:
adds a --strict flag to minion to enable both primer scheme checking (for ARTIC format conventions) and filtering of the merged VCF to remove variants that are present only once in amplicon overlap regions or that fall in primer binding sites
adds a --scheme-version flag to minion so the user can specify the primer scheme version. The flag can be omitted, and a version given in the original style (schemename/V1) is still used in preference to the new flag. The default remains 1, and the latest available scheme can be requested with 0
if the scheme is not found using the original logic (combining scheme name, scheme dir and scheme version), minion will attempt to download the scheme from the ARTIC repos (only nipah, ebola and scov2 are available at the moment)
the minion command can now be simplified a bit by dropping --scheme-directory and letting minion download schemes automatically
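The version precedence and download fallback described above can be sketched in Python as below. This is a hedged illustration, not the implementation in this PR: the function names are hypothetical, and the `<scheme_dir>/<name>/V<version>` directory layout is an assumption.

```python
import re
from pathlib import Path
from typing import Optional, Sequence, Tuple

DEFAULT_VERSION = 1  # default stays at 1
LATEST = 0           # 0 requests the latest available scheme


def split_scheme(scheme: str, flag_version: Optional[int] = None) -> Tuple[str, int]:
    """Return (scheme name, requested version).

    A version embedded in the scheme string (schemename/V1 style) wins
    over the value passed via the new flag; with neither, use the default.
    """
    match = re.fullmatch(r"(.+)/V(\d+)", scheme)
    if match:
        return match.group(1), int(match.group(2))
    if flag_version is not None:
        return scheme, flag_version
    return scheme, DEFAULT_VERSION


def resolve_version(version: int, available: Sequence[int]) -> int:
    """Map 0 to the highest available version; leave other values untouched."""
    if version == LATEST:
        if not available:
            raise ValueError("no scheme versions available")
        return max(available)
    return version


def needs_download(scheme_dir: Path, name: str, version: int) -> bool:
    """True if the scheme is missing locally (assumed layout: dir/name/V<n>)."""
    return not (scheme_dir / name / f"V{version}").is_dir()
```

For example, `split_scheme("scov2/V3", flag_version=2)` yields version 3, because the embedded version takes precedence over the flag.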
General:
re-arranged the variant calling code block in minion, removing the need to split the BAMs by readgroup as medaka performs BAM filtering in place - resulting in fewer BAM copies
added some code comments throughout minion to document pipeline steps
moved some file opening closer to where they are being used
Notes
the new --no-longshot flag results in more Ns in the consensus: we do not achieve the same filtering that longshot provided, so a few more variants end up in our FAIL VCF and are applied to the mask
a bug in medaka means that when medaka annotate is used instead of longshot (via the new flag), INDELs will be silently dropped as medaka incorrectly reports no read support
the only uses of the primertrimmed BAMs are for longshot and for artic_make_depth_mask. Do we want to keep these where they are in future, or remove them?
the new functionality is currently provided via artic-tools and installed into the env via conda