perhaps remove some of the poorer selections from functional score averaging

jbloom commented 1 year ago

You have already left out the really bad ones like LibB-220728-293T-1, which is the correct thing to do.

However, some of the least-correlated remaining selections could also be worth leaving out. This would include LibB-230502-human-1.

In the averaging, you can see it has a relatively lower correlation with the other selections: cell [5] of https://dms-vep.github.io/LASV_Josiah_GP_DMS/notebooks/avg_func_effects_human_293T_entry.html

It also shows more of a two-bump pattern for wildtype and stop in the functional score distributions: https://dms-vep.github.io/LASV_Josiah_GP_DMS/notebooks/analyze_func_scores.html

It also has worse training-set accuracy than many of the others in the global epistasis fitting: https://dms-vep.github.io/LASV_Josiah_GP_DMS/notebooks/func_effects_global_epistasis_LibB-230502-human-1.html

You have done a decent job of removing the worst ones, but you might take another careful pass and consider removing more marginal ones, like the one above and perhaps a few others.
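
As a rough illustration of the correlation check described above, here is a minimal sketch that computes each selection's mean Pearson correlation with all other selections and flags those below a cutoff. This is not the pipeline's actual code: the function names, the `selection_A` to `selection_D` labels, the toy effect values, and the 0.5 cutoff are all hypothetical and chosen only for demonstration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_correlation_with_others(effects):
    """Map each selection name to its mean Pearson r against every other selection."""
    per_selection = {name: [] for name in effects}
    for a, b in combinations(effects, 2):
        r = pearson(effects[a], effects[b])
        per_selection[a].append(r)
        per_selection[b].append(r)
    return {name: sum(rs) / len(rs) for name, rs in per_selection.items()}


def flag_marginal(effects, min_mean_r):
    """Return (sorted) names of selections whose mean r falls below min_mean_r."""
    means = mean_correlation_with_others(effects)
    return sorted(name for name, r in means.items() if r < min_mean_r)


# Hypothetical toy data: one list of functional effects per selection,
# with the same mutations in the same order in every list.
effects = {
    "selection_A": [0.1, -2.0, -0.5, -3.1, 0.0],
    "selection_B": [0.0, -1.8, -0.6, -3.0, 0.2],
    "selection_C": [0.2, -2.2, -0.4, -2.9, -0.1],
    "selection_D": [-1.0, -0.5, -2.0, -1.2, 0.3],
}

# The 0.5 cutoff is arbitrary; in practice the choice is a judgment call.
print(flag_marginal(effects, min_mean_r=0.5))
```

With these toy values only `selection_D` is flagged. A single number like this is just a starting point; the functional score distributions and global epistasis fits mentioned above would still need to be inspected by eye.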

Caleb-Carr commented 1 year ago

Only LibB-230502-human-1 was removed, in commit 22e442442072475bd6d5c7d555deedc7c9a29e7d, as the remaining selections seemed worth keeping. However, I am leaving this issue open for now because I want to revisit removing other potential outliers in the future.

Caleb-Carr commented 1 year ago

Revisited this and kept only the 8 best-looking selections in commit 6f8f4663046c7312a5b6813bb31a428accd16b5e.

dms-vep / LASV_Josiah_GP_DMS

perhaps remove some of the poorer selections from functional score averaging #5