Open biolancer opened 1 year ago
Sounds like a good plan for the beginning. I have some recommendations:
nextflow.config.
It should be possible to enable or disable each filter criterion through configuration parameters to allow adjustments if necessary. For example, some labs use a lower AF threshold of >=3 % or >=1 %.
1 & 2) Good point, the cutoffs should indeed be set in the config file. I could also make the upper boundaries a changeable variable to allow more specific tuning of the eligible mutations. The 90 % AF cutoff rests on the assumption that, when a total or partial loss of WT alleles results in a minor AF > 50 %, the final AF should still not exceed 90 %, because that would require a sample purity above 95 %, which I would strongly dispute for all bulk sequencing methods.
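To make the purity argument concrete, here is a minimal sketch of the usual mixture model for bulk samples, in which the observed AF is the number of mutant copies divided by all copies contributed by tumor and normal cells. The function names and the copy-number scenarios are my own assumptions for illustration; the thread does not state which model was used. Under hemizygous loss of the WT allele (one mutant copy left in the tumor, diploid normal cells), an AF of 90 % needs a purity of roughly 95 %, which is consistent with the figure quoted above.

```python
def expected_af(purity: float, mutant_copies: int, tumor_copies: int,
                normal_copies: int = 2) -> float:
    """Expected allele fraction of a somatic variant in a bulk sample.

    Assumed mixture model: tumor cells carry `mutant_copies` of
    `tumor_copies` total copies, normal cells carry only WT copies.
    """
    mutant = purity * mutant_copies
    total = purity * tumor_copies + (1.0 - purity) * normal_copies
    return mutant / total


def purity_for_af(af: float, mutant_copies: int, tumor_copies: int,
                  normal_copies: int = 2) -> float:
    """Purity needed to observe `af`, obtained by solving expected_af for purity."""
    return af * normal_copies / (mutant_copies - af * (tumor_copies - normal_copies))


# Hemizygous loss of the WT allele: 1 mutant copy, 1 copy in total.
print(round(purity_for_af(0.90, mutant_copies=1, tumor_copies=1), 3))
# Copy-neutral LOH: 2 mutant copies, 2 copies in total.
print(round(purity_for_af(0.90, mutant_copies=2, tumor_copies=2), 3))
```

With copy-neutral LOH the same AF only needs a purity equal to the AF itself, so the 95 % figure depends on which loss scenario one assumes.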
3) Yes, sounds good. I liked the TMB presentation a lot and wanted to recreate something comparable.
4) I am not really sure, to be honest, and wouldn't know how to check whether that would be the case. Coverage-, BED- and AF-based filtering routines are regularly implemented in TMB calculations, so is it even possible to claim it is the same algorithm, or would it be something different if I left out a filtering step? At no point will I be reusing proprietary code or anything prewritten. I would only follow the same filtering routines during data wrangling (we would be changing the procedure either way, since we set variables instead of fixed values), but I am open to suggestions for changes to the routine.
2) If the 90 % threshold also depends on the sample purity, maybe this should be documented. Also, if this is a parameter one wants to adjust for each sample, it could depend on the sample purity as an optional parameter in the samplesheet. However, I think a fixed value for each sample would be a more reasonable default, because the purity is always an estimate and low purities in samples (which frequently occur) may filter too many variants.
4) I think you are right, the filtering criteria such as AF, coverage, counts etc. should not be a problem. However, I am not sure about the procedure for checking ratios of filtered/unfiltered variants, as this is not commonly used. This is a QC step, and as suggested we add our own QC to the database.
Afterwards, it bins mutations with comparable allele frequency across each genome and generates a ratio of filtered to unfiltered mutations for each bin. If the ratio for a bin favors filtering and has at least 5 mutations marked for filtering, other eligible mutations in said bin will also be filtered.
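A rough sketch of how this bin-based step could look, purely for discussion. The bin width of 5 % AF and the reading of "favors filtering" as "more flagged than unflagged mutations in the bin" are assumptions on my side, as is the function name; only the minimum of 5 flagged mutations comes from the description above.

```python
from collections import defaultdict


def propagate_bin_filter(variants, bin_width=0.05, min_flagged=5):
    """Extend the filter to whole AF bins that are dominated by flagged variants.

    variants: list of (af, flagged) tuples for one sample.
    Returns a list of booleans (True = filtered), in input order.
    """
    def bin_of(af):
        # Small epsilon guards against float artefacts at bin edges.
        return int(af / bin_width + 1e-9)

    counts = defaultdict(lambda: [0, 0])  # bin -> [flagged, unflagged]
    for af, flagged in variants:
        counts[bin_of(af)][0 if flagged else 1] += 1

    # Assumed rule: the bin "favors filtering" if flagged > unflagged.
    bad_bins = {
        b for b, (n_flagged, n_unflagged) in counts.items()
        if n_flagged >= min_flagged and n_flagged > n_unflagged
    }
    return [flagged or bin_of(af) in bad_bins for af, flagged in variants]


sample = (
    [(0.41, True), (0.42, True), (0.43, True), (0.44, True), (0.41, True)]
    + [(0.42, False), (0.43, False)]          # same bin, gets filtered too
    + [(0.11, True)] * 4 + [(0.12, False)]    # only 4 flagged, bin is kept
)
print(propagate_bin_filter(sample))
```

In this toy sample the two unflagged variants around 42 % AF end up filtered, while the unflagged variant at 12 % AF is kept because its bin has only four flagged mutations.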
Alright. I will set up the module with both an upper and a lower bound as "hard-filter" boundaries until we add "tumor purity" as a potential metadata column in the samplesheet. I will leave out the QC-based filter for now, which also keeps the module compatible with later tumor-normal-pair input.
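For illustration, the hard-filter bounds with per-criterion switches could be sketched like this. The class and field names are hypothetical and not the actual pipeline parameters, the 5 % lower default is only a placeholder, and whether a variant exactly at a boundary passes is an assumption that would still need to be decided.

```python
from dataclasses import dataclass


@dataclass
class AfFilterConfig:
    # Hypothetical parameter names; in the pipeline these would come from the config file.
    enable_lower: bool = True
    enable_upper: bool = True
    min_af: float = 0.05  # placeholder default, some labs use 0.03 or 0.01
    max_af: float = 0.90  # upper bound discussed in this thread


def passes_af_filter(af: float, cfg: AfFilterConfig) -> bool:
    """Return True if the variant's AF lies inside the enabled hard-filter bounds."""
    if cfg.enable_lower and af < cfg.min_af:
        return False
    if cfg.enable_upper and af > cfg.max_af:
        return False
    return True


default_cfg = AfFilterConfig()
relaxed_cfg = AfFilterConfig(min_af=0.03, enable_upper=False)
for af in (0.03, 0.50, 0.95):
    print(af, passes_af_filter(af, default_cfg), passes_af_filter(af, relaxed_cfg))
```

Keeping the switches separate from the thresholds means a lab can disable the upper bound entirely without having to set it to an artificial value such as 1.0.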
Since the TMB calculation requires a BED file as input, a structural integrity check for the BED file will be implemented. The check will verify compatibility with bcftools filter and the TMB calculation routine.
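As a starting point for discussion, a structural check could look roughly like the sketch below. It only covers the basic BED layout (at least three tab-separated columns, non-negative integer coordinates, end greater than start) and collects messages per line. The function name and message wording are made up, and it does not test compatibility with bcftools itself.

```python
def check_bed_lines(lines):
    """Return a list of error messages for structurally invalid BED lines."""
    errors = []
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip blank lines, comments and optional header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            errors.append(
                f"line {n}: expected at least 3 tab-separated columns, found {len(fields)}"
            )
            continue
        chrom, start, end = fields[:3]
        if not chrom:
            errors.append(f"line {n}: empty chromosome name")
        if not (start.isdecimal() and end.isdecimal()):
            errors.append(f"line {n}: start and end must be non-negative integers")
            continue
        if int(end) <= int(start):
            errors.append(f"line {n}: end ({end}) must be greater than start ({start})")
    return errors


good = ["chr1\t100\t200\n", "chr2\t0\t50\tgeneA\n"]
bad = ["chr1 100 200\n", "chr1\t200\t100\n", "chr1\tabc\t10\n"]
print(check_bed_lines(good))
for message in check_bed_lines(bad):
    print(message)
```

Returning a list of messages instead of failing on the first problem makes it easy to report all invalid entries at once.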
We implemented an initial module for TMB calculation. We had several ideas for features to this module. I will collect them here and leave this issue open for further development. Features:
As a further enhancement and to increase readability, the reporting of invalid entries in the BED file should follow the same nomenclature and reporting structure as the VCF check.
I have a proposal for a TMB calculation module. It assumes tumor-only sequencing, requires only a VCF and a BED file as input, and works as follows:
The final TMB score would then be Eligible Variants / Effective panel size (in mutations per Mbp). The whole procedure follows the current implementation of the TSO500 RunManager app for TMB calculation and sounds reasonable to me.
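The final division can be sketched as follows. Here the effective panel size is simply taken as the total length of the BED intervals after merging overlaps, which is an assumption on my part: the proposal may define "effective" more strictly, for example by coverage. Function names are illustrative only.

```python
def effective_panel_size_bp(intervals):
    """Total covered length in bp of (chrom, start, end) intervals, overlaps merged."""
    total = 0
    current_chrom, current_start, current_end = None, 0, 0
    for chrom, start, end in sorted(intervals):
        if chrom == current_chrom and start <= current_end:
            # Overlapping or adjacent interval: extend the current block.
            current_end = max(current_end, end)
        else:
            total += current_end - current_start
            current_chrom, current_start, current_end = chrom, start, end
    return total + (current_end - current_start)


def tmb_score(eligible_variants: int, panel_size_bp: int) -> float:
    """TMB in mutations per Mbp: eligible variants / effective panel size."""
    if panel_size_bp <= 0:
        raise ValueError("panel size must be positive")
    return eligible_variants / (panel_size_bp / 1_000_000)


panel = [("chr1", 0, 500_000), ("chr1", 400_000, 1_000_000), ("chr2", 0, 500_000)]
size = effective_panel_size_bp(panel)
print(size, tmb_score(15, size))
```

Merging overlaps first avoids counting the same bases twice, which would otherwise deflate the score for panels with overlapping target regions.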
Originally posted by @biolancer in https://github.com/cio-abcd/variantinterpretation/issues/5#issuecomment-1466003720