jbloom closed this issue 1 year ago.
Only LibB-230502-human-1 was removed, in commit 22e442442072475bd6d5c7d555deedc7c9a29e7d; the remaining selections seemed worth keeping. However, I am leaving this issue open for now because I want to revisit removing other potential outliers in the future.
I revisited this and kept only the 8 best-looking selections in commit 6f8f4663046c7312a5b6813bb31a428accd16b5e.
You have already left out the really bad selections, such as LibB-220728-293T-1, which is the correct thing to do. However, some of the least correlated selections that remain could also be worth leaving out. This would include LibB-230502-human-1. In the averaging, you can see that it has a relatively low correlation with the other selections: cell [5] of https://dms-vep.github.io/LASV_Josiah_GP_DMS/notebooks/avg_func_effects_human_293T_entry.html
Its functional-score distributions for wildtype and stop-codon variants also look more like two bumps than those of the other selections: https://dms-vep.github.io/LASV_Josiah_GP_DMS/notebooks/analyze_func_scores.html
It also has worse training-set accuracy in the global epistasis fitting than many of the other selections: https://dms-vep.github.io/LASV_Josiah_GP_DMS/notebooks/func_effects_global_epistasis_LibB-230502-human-1.html
You have done a decent job of removing the worst selections, but you might make another careful pass and consider removing more marginal ones, such as the one above and perhaps a few others.