ethanumn / mpn-aml-pairtree


Detect issues in data #5

Open ethanumn opened 3 years ago

ethanumn commented 3 years ago

TODO

ethanumn commented 3 years ago

Garbage detection notes:

ethanumn commented 3 years ago

LOH detection notes:

ethanumn commented 3 years ago

Potential things to discuss in README.md writeup for (fix_bad_var_prob, removegarbage) scripts:

ethanumn commented 3 years ago

Added capability to modify ssm files based on VAF.

See commit b24d7710f7f576a449bb30b458e1e379d4118743.

ethanumn commented 3 years ago

Ran LOH and Garbage detection using four different SSM files:

--- Testing for LOH on test.ssm ---

num_bad_ssms=29 bad_ssms=['s1', 's4', 's12', 's21', 's30', 's32', 's39', 's42', 's43', 's44', 's52', 's56', 's58', 's59', 's67', 's79', 's87', 's89', 's91', 's93', 's100', 's101', 's115', 's117', 's123', 's129', 's132', 's140', 's146'] bad_samp_prop=0.020 bad_ssm_prop=0.196

--- Testing for LOH on test.rm.vaf.ssm ---

num_bad_ssms=0 bad_ssms=[] bad_samp_prop=0.000 bad_ssm_prop=0.000

--- Testing for LOH on test.scaled.vaf.ssm ---

num_bad_ssms=27 bad_ssms=['s4', 's12', 's21', 's30', 's32', 's39', 's42', 's43', 's44', 's52', 's56', 's58', 's59', 's67', 's79', 's87', 's91', 's93', 's100', 's101', 's115', 's117', 's123', 's129', 's132', 's140', 's146'] bad_samp_prop=0.018 bad_ssm_prop=0.182

--- Testing for LOH on test.org.vaf.ssm ---

num_bad_ssms=0 bad_ssms=[] bad_samp_prop=0.000 bad_ssm_prop=0.000

Garbage detection with --max-garb-prob 0.25 (results were similar with --max-garb-prob 0.5):

test.scaled.vaf.params.json

total of 99 garbage

garbage: ["s1", "s2", "s4", "s5", "s7", "s8", "s12", "s14", "s17", "s18", "s19", "s20", "s21", "s22", "s23", "s24", "s25", "s26", "s27", "s30", "s31", "s32", "s33", "s36", "s38", "s39", "s41", "s43", "s44", "s45", "s46", "s48", "s49", "s50", "s51", "s52", "s53", "s55", "s56", "s58", "s59", "s60", "s61", "s62", "s63", "s64", "s67", "s69", "s70", "s71", "s73", "s75", "s76", "s77", "s78", "s79", "s80", "s82", "s83", "s86", "s87", "s89", "s91", "s93", "s97", "s100", "s101", "s105", "s106", "s108", "s109", "s110", "s111", "s114", "s115", "s116", "s117", "s118", "s119", "s120", "s121", "s123", "s124", "s125", "s126", "s127", "s128", "s129", "s131", "s132", "s133", "s136", "s139", "s140", "s141", "s143", "s144", "s146", "s147"]

test.rmvaf.params.json

total of 70 garbage

garbage: ["s1", "s2", "s4", "s5", "s10", "s11", "s13", "s14", "s15", "s16", "s17", "s18", "s19", "s20", "s21", "s25", "s28", "s30", "s31", "s32", "s33", "s34", "s36", "s37", "s38", "s39", "s40", "s42", "s44", "s45", "s46", "s47", "s48", "s51", "s52", "s54", "s56", "s57", "s58", "s60", "s61", "s64", "s66", "s68", "s69", "s75", "s76", "s78", "s79", "s80", "s83", "s84", "s85", "s86", "s87", "s88", "s89", "s90", "s91", "s92", "s93", "s95", "s96", "s98", "s99", "s102", "s103", "s105", "s106", "s108"]

test.params.json

total of 107 garbage

garbage: ["s1", "s2", "s3", "s4", "s5", "s7", "s8", "s12", "s14", "s15", "s17", "s18", "s19", "s20", "s21", "s22", "s23", "s24", "s25", "s26", "s27", "s30", "s32", "s33", "s36", "s38", "s39", "s40", "s41", "s43", "s44", "s45", "s46", "s48", "s49", "s50", "s51", "s52", "s53", "s55", "s56", "s58", "s59", "s60", "s61", "s62", "s63", "s64", "s65", "s67", "s69", "s70", "s71", "s73", "s75", "s76", "s77", "s78", "s79", "s80", "s82", "s83", "s86", "s87", "s88", "s89", "s91", "s93", "s94", "s96", "s97", "s100", "s101", "s105", "s106", "s108", "s109", "s110", "s111", "s114", "s115", "s116", "s117", "s118", "s119", "s120", "s121", "s122", "s123", "s124", "s125", "s126", "s127", "s128", "s129", "s131", "s132", "s133", "s135", "s136", "s139", "s140", "s141", "s143", "s144", "s146", "s147"]

test.org.vaf.params.json

total of 97 garbage

garbage: ["s1", "s2", "s4", "s5", "s10", "s11", "s13", "s14", "s15", "s16", "s17", "s18", "s19", "s20", "s21", "s25", "s28", "s29", "s30", "s31", "s32", "s33", "s34", "s36", "s37", "s38", "s39", "s40", "s42", "s44", "s45", "s46", "s47", "s48", "s51", "s52", "s54", "s56", "s57", "s58", "s60", "s61", "s64", "s66", "s68", "s69", "s75", "s76", "s78", "s79", "s80", "s83", "s84", "s85", "s86", "s87", "s88", "s89", "s90", "s91", "s92", "s93", "s95", "s96", "s98", "s99", "s102", "s103", "s105", "s106", "s107", "s108", "s110", "s111", "s113", "s114", "s115", "s116", "s117", "s118", "s119", "s120", "s121", "s122", "s128", "s130", "s131", "s132", "s134", "s136", "s138", "s140", "s143", "s144", "s145", "s146", "s147"]

ethanumn commented 3 years ago

Changed the .ssm so that the total_read count is capped at a maximum of 400:

-- Testing for LOH on mats08.scaled.400.ssm --

num_bad_ssms=23 bad_ssms=['s12', 's21', 's30', 's32', 's39', 's42', 's43', 's44', 's56', 's58', 's59', 's67', 's87', 's91', 's93', 's101', 's115', 's117', 's123', 's129', 's132', 's140', 's146'] bad_samp_prop=0.016 bad_ssm_prop=0.155
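For readers following along, here is a minimal sketch of what capping the total read count could look like. It assumes the cap rescales both read counts so the VAF is roughly preserved, which is one plausible reading of the "scaled" file names; the actual script in this repo may do it differently. The function name `cap_total_reads` and its arguments are hypothetical and operate on plain per-sample lists rather than on a parsed .ssm file.

```python
def cap_total_reads(var_reads, total_reads, max_total=400):
    """Rescale per-sample read counts so no sample exceeds max_total reads.

    Hypothetical helper: assumes variant reads are scaled by the same
    factor as total reads, so the VAF stays (approximately) unchanged.
    """
    new_var, new_total = [], []
    for v, t in zip(var_reads, total_reads):
        if t > max_total:
            # Multiply before dividing to limit floating-point error.
            v = min(round(v * max_total / t), max_total)
            t = max_total
        new_var.append(v)
        new_total.append(t)
    return new_var, new_total


# One variant across two samples: only the first sample is over the cap.
scaled_var, scaled_total = cap_total_reads([100, 30], [1000, 200], max_total=400)
print(scaled_var, scaled_total)  # [40, 30] [400, 200]
```

Samples already at or below the cap are left untouched, so only high-depth samples lose precision.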

-- Removing Garbage w/ LOH variants labeled as garbage --

using --max-garb-prob 0.5

Total number of garbage mutations: 47

Garbage: ['s1', 's2', 's4', 's5', 's14', 's17', 's18', 's19', 's22', 's24', 's26', 's38', 's45', 's49', 's50', 's51', 's52', 's53', 's55', 's62', 's63', 's70', 's71', 's76', 's77', 's78', 's79', 's80', 's82', 's86', 's89', 's97', 's100', 's106', 's110', 's114', 's116', 's118', 's119', 's121', 's126', 's127', 's131', 's136', 's139', 's143', 's144']


using --max-garb-prob 0.1

Total number of garbage mutations: 47

Garbage: ['s1', 's2', 's4', 's5', 's14', 's17', 's18', 's19', 's22', 's24', 's26', 's38', 's45', 's49', 's50', 's51', 's52', 's53', 's55', 's62', 's63', 's70', 's71', 's76', 's77', 's78', 's79', 's80', 's82', 's86', 's89', 's97', 's100', 's106', 's110', 's114', 's116', 's118', 's119', 's121', 's126', 's127', 's131', 's136', 's139', 's143', 's144']


using --max-garb-prob 0.01

Total number of garbage mutations: 47

Garbage: ['s1', 's2', 's4', 's5', 's14', 's17', 's18', 's19', 's22', 's24', 's26', 's38', 's45', 's49', 's50', 's51', 's52', 's53', 's55', 's62', 's63', 's70', 's71', 's76', 's77', 's78', 's79', 's80', 's82', 's86', 's89', 's97', 's100', 's106', 's110', 's114', 's116', 's118', 's119', 's121', 's126', 's127', 's131', 's136', 's139', 's143', 's144']


using --max-garb-prob 0.001

Total number of garbage mutations: 47

Garbage: ['s1', 's2', 's4', 's5', 's14', 's17', 's18', 's19', 's22', 's24', 's26', 's38', 's45', 's49', 's50', 's51', 's52', 's53', 's55', 's62', 's63', 's70', 's71', 's76', 's77', 's78', 's79', 's80', 's82', 's86', 's89', 's97', 's100', 's106', 's110', 's114', 's116', 's118', 's119', 's121', 's126', 's127', 's131', 's136', 's139', 's143', 's144']

Switched to testing with a scale of 250:

-- Testing for LOH using mats08.scaled.250.ssm --

num_bad_ssms=21 bad_ssms=['s12', 's21', 's30', 's32', 's39', 's42', 's43', 's44', 's56', 's58', 's67', 's87', 's93', 's101', 's115', 's117', 's123', 's129', 's132', 's140', 's146'] bad_samp_prop=0.015 bad_ssm_prop=0.142


using --max-garb-prob 0.5

Total number of garbage mutations: 20

Garbage: ['s5', 's14', 's18', 's22', 's45', 's59', 's63', 's76', 's77', 's79', 's80', 's86', 's89', 's91', 's100', 's110', 's119', 's121', 's126', 's131']


using --max-garb-prob 0.1

Total number of garbage mutations: 20

Garbage: ['s5', 's14', 's18', 's22', 's45', 's59', 's63', 's76', 's77', 's79', 's80', 's86', 's89', 's91', 's100', 's110', 's119', 's121', 's126', 's131']


using --max-garb-prob 0.01

Total number of garbage mutations: 21

Garbage: ['s2', 's5', 's14', 's18', 's22', 's45', 's59', 's63', 's76', 's77', 's79', 's80', 's86', 's89', 's91', 's100', 's110', 's119', 's121', 's126', 's131']
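To make the effect of the threshold easier to see, the lists above can be compared as sets. This is just a convenience sketch using the scale-250 output pasted above for --max-garb-prob 0.1 and 0.01; the variable and function names are made up for illustration.

```python
garb_p0_1 = ['s5', 's14', 's18', 's22', 's45', 's59', 's63', 's76', 's77', 's79',
             's80', 's86', 's89', 's91', 's100', 's110', 's119', 's121', 's126', 's131']
garb_p0_01 = ['s2', 's5', 's14', 's18', 's22', 's45', 's59', 's63', 's76', 's77', 's79',
              's80', 's86', 's89', 's91', 's100', 's110', 's119', 's121', 's126', 's131']


def ssm_sorted(ids):
    """Sort ids like 's12' numerically rather than lexicographically."""
    return sorted(ids, key=lambda s: int(s[1:]))


added = ssm_sorted(set(garb_p0_01) - set(garb_p0_1))
removed = ssm_sorted(set(garb_p0_1) - set(garb_p0_01))
print(added)    # ['s2']
print(removed)  # []
```

So at scale 250, lowering --max-garb-prob from 0.1 to 0.01 only adds s2 to the garbage set.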