Closed vsmalladi closed 2 years ago
The kmer consistency check was never implemented in the WDL workflow, so none of the modimer outputs are used.
For reference: https://github.com/PacificBiosciences/pb-human-wgs-workflow-snakemake/blob/main/rules/sample_kmer_consistency.smk
This is a pairwise comparison of the modimers.tsv files from each movie, after subtracting the reference modimers.tsv. Essentially:
movie1 modimers - reference modimers -> non-reference movie1 modimers
movie2 modimers - reference modimers -> non-reference movie2 modimers
count(subtract(non-reference movie1 modimers, non-reference movie2 modimers)) + count(subtract(non-reference movie2 modimers, non-reference movie1 modimers)) -> count of unique modimers from pairwise comparison
We report a metric representing the proportion of total non-reference modimers that are unique to one movie. If this proportion is above some threshold, it's likely that the two movies (SMRT Cells) were loaded with different samples.
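
A rough sketch of the comparison described above, in Python, in case it helps when porting to WDL. This is not the Snakemake rule's actual code. The function names are made up for illustration, modimers are treated as plain strings already parsed out of each modimers.tsv, and the denominator is assumed to be the union of both movies' non-reference modimers (the description only says "total", so this may differ from the original metric).

```python
from typing import Iterable, Set


def non_reference(movie: Iterable[str], reference: Iterable[str]) -> Set[str]:
    """Modimers seen in a movie but not in the reference (hypothetical helper)."""
    return set(movie) - set(reference)


def unique_modimer_fraction(
    movie1: Iterable[str],
    movie2: Iterable[str],
    reference: Iterable[str],
) -> float:
    """Fraction of non-reference modimers found in only one of the two movies.

    Assumption: "total non-reference modimers" is taken to be the union of
    the two non-reference sets.
    """
    reference = set(reference)  # materialize once so it can be reused
    nr1 = non_reference(movie1, reference)
    nr2 = non_reference(movie2, reference)
    # count(subtract(nr1, nr2)) + count(subtract(nr2, nr1))
    unique = len(nr1 - nr2) + len(nr2 - nr1)
    total = len(nr1 | nr2)
    # No non-reference modimers at all: nothing to disagree about.
    return unique / total if total else 0.0
```

A pair of movies would then be flagged when `unique_modimer_fraction(...)` exceeds whatever threshold is chosen. The threshold itself is not specified here.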