Most important updates:
Updated site-level filter thresholds to reflect what we learned from simulated data. They are optimized only for SNPs, and we should emphasize this on CeNDR (18df3a).
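For anyone reading the filter step later, here is a rough Python sketch of what a site-level hard filter does: each threshold is checked against an INFO annotation, and the site collects the names of the checks it fails. The annotation names and cutoffs are placeholders, not the thresholds set in 18df3a, and skipping a missing annotation is an assumption. Since the real thresholds were tuned on SNPs, applying them to indels is untested.

```python
from __future__ import annotations

# Placeholder thresholds for illustration only: NOT the values set in 18df3a.
# ("min", x) fails the site if value < x; ("max", x) fails it if value > x.
SITE_THRESHOLDS = {
    "QD": ("min", 2.0),
    "FS": ("max", 60.0),
    "SOR": ("max", 3.0),
    "MQ": ("min", 40.0),
}


def parse_info(info: str) -> dict[str, str]:
    """Parse a VCF INFO column such as 'QD=12.1;FS=0.5;DB' into a dict."""
    if info == ".":
        return {}
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flags like 'DB' map to ''
    return fields


def site_filter(record: str, thresholds=SITE_THRESHOLDS) -> list[str]:
    """Return the names of the annotations a VCF data line fails."""
    info = parse_info(record.rstrip("\n").split("\t")[7])
    failed = []
    for name, (kind, cutoff) in thresholds.items():
        raw = info.get(name, "")
        if raw in ("", "."):
            continue  # assumption: a missing annotation does not fail the site
        value = float(raw)
        if (kind == "min" and value < cutoff) or (kind == "max" and value > cutoff):
            failed.append(name)
    return failed


def filter_value(record: str) -> str:
    """FILTER column value: PASS, or the failed annotation names joined by ';'."""
    failed = site_filter(record)
    return ";".join(failed) if failed else "PASS"
```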
Updated thresholds for het polarization and updated the header description accordingly (ea072b, 56605f, 225aa8). Also changed the het polarization code to output all sites instead of only SNPs (639425).
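To make the idea concrete, here is a minimal Python sketch of het polarization for a single biallelic diploid genotype. It assumes the decision is based on the gap between the phred-scaled likelihoods (PL) of the two homozygous genotypes, with a hypothetical cutoff `HET_PL_DIFF`; the actual rule and thresholds in ea072b/56605f/225aa8 may differ. It never checks whether the site is a SNP, so it applies to every site, in line with outputting all sites.

```python
# Hypothetical cutoff in phred units: NOT the threshold set in this change.
HET_PL_DIFF = 20


def polarize_gt(gt: str, pl: str, threshold: int = HET_PL_DIFF) -> str:
    """Polarize a biallelic diploid het call toward the more likely homozygote.

    For a biallelic site PL is ordered 0/0, 0/1, 1/1, and a lower PL means a
    more likely genotype. Non-het calls and sites without usable PL are
    returned unchanged. There is no SNP check, so indels are handled the same.
    """
    if gt not in ("0/1", "0|1"):
        return gt
    values = pl.split(",")
    if len(values) != 3 or "." in values:
        return gt  # missing or multiallelic PL: leave the call alone
    pl_ref, pl_alt = int(values[0]), int(values[2])
    diff = pl_ref - pl_alt  # > 0 means hom-ALT is more likely than hom-REF
    if diff >= threshold:
        return "1/1"
    if diff <= -threshold:
        return "0/0"
    return gt  # evidence is ambiguous: keep the het call
```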
Removed the ad_dp step in main.nf, since most of the AD/DP information is already captured by the het polarization step (18df3a; see https://gatk.broadinstitute.org/hc/en-us/articles/360035532112-Coverage-Read-depth-metrics). I wasn't sure whether to keep ad_dp.nim in the Docker container, so I removed it and then added it back (7312b2). It is now removed (eb1db93, debdf83, 8071171).
The ad_dp filter used to add a FORMAT/FT tag at every site. Now that ad_dp is removed, GATK only creates FORMAT/FT at sites where at least one sample fails the filter, so sites where all samples pass have no FT field. I modified the `--set-GT` to `./.` step accordingly (a8b4d02). I was wondering why the hard filter removed 80% of all SNPs! lol
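A minimal Python sketch of the corrected logic on one VCF data line, assuming GT is the first FORMAT field (the VCF spec puts it first when present). The point is that a sample with no FT value, or an FT of `.`, counts as passing; a check like `FT != "PASS"` would treat those samples as failing and set their genotypes to `./.`. This only illustrates the idea and is not the command the pipeline runs.

```python
from typing import Optional


def sample_fails(ft: Optional[str]) -> bool:
    """A sample fails only if it carries an FT value other than PASS or missing."""
    return ft not in (None, "", ".", "PASS")


def set_failing_gt_missing(record: str) -> str:
    """Set GT to './.' for samples that fail a genotype-level filter.

    Sites without FT in FORMAT are returned unchanged: GATK only writes FT when
    at least one sample fails, so no FT means every sample passed.
    """
    cols = record.rstrip("\n").split("\t")
    fmt = cols[8].split(":")
    if "FT" not in fmt or fmt[0] != "GT":
        return "\t".join(cols)  # nothing to filter, or no leading GT field
    ft_idx = fmt.index("FT")
    for i in range(9, len(cols)):
        fields = cols[i].split(":")
        # Trailing FORMAT fields may be dropped in VCF, so FT can be absent.
        ft = fields[ft_idx] if ft_idx < len(fields) else None
        if sample_fails(ft):
            fields[0] = "./."
            cols[i] = ":".join(fields)
    return "\t".join(cols)
```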
A related issue: in the single-strain TSV, the FORMAT/FT column had a mix of `PASS` and `.` for sites that pass the filter, so I changed all `.` to `PASS` (074072b).
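A small Python sketch of that normalization, assuming the single-strain TSV has a header row and a column named `FT` (a hypothetical name; adjust to the real one). Rewriting `.` to `PASS` is only safe because a missing FT now means every sample at that site passed.

```python
def normalize_ft(tsv_text: str, ft_column: str = "FT") -> str:
    """Rewrite '.' to 'PASS' in the FT column of a tab-separated table.

    ft_column is a hypothetical name; the first line must be the header.
    """
    lines = tsv_text.rstrip("\n").split("\n")
    ft_idx = lines[0].split("\t").index(ft_column)
    out = [lines[0]]
    for line in lines[1:]:
        cols = line.split("\t")
        if cols[ft_idx] == ".":
            cols[ft_idx] = "PASS"  # no FT at the site means the sample passed
        out.append("\t".join(cols))
    return "\n".join(out) + "\n"
```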
For the single-strain TSV: removed sites with genotype `./.` (7fd2205), then reverted this change (89715f5) to keep the same sites in every strain TSV. Also fixed the header, where `\t` was not properly interpreted (3870faf).
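A rough Python sketch of both points, using a hypothetical column list: rows whose GT is `./.` are kept so every strain's TSV covers the same sites, and the header is joined with a real tab character. Writing a literal backslash followed by `t` (for example via a shell `echo` that does not interpret escapes) is one common way to end up with an uninterpreted `\t`; whether that was the exact cause here is an assumption.

```python
from __future__ import annotations

# Hypothetical column set for a single-strain TSV.
COLUMNS = ["CHROM", "POS", "REF", "ALT", "GT", "FT"]


def strain_tsv(rows: list[list[str]], columns: list[str] = COLUMNS) -> str:
    """Render one strain's TSV, keeping every site including './.' calls."""
    # Join with a real tab character. Writing "\\t" (backslash + t) instead is
    # how a header ends up with an uninterpreted \t.
    lines = ["\t".join(columns)]
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(row)}")
        lines.append("\t".join(row))  # './.' rows are kept, not skipped
    return "\n".join(lines) + "\n"
```

Because no rows are dropped, two strains built from the same site list produce files with identical CHROM/POS columns.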