Validation of the validation DeepKS vs Atlas

vladpetyuk commented 1 year ago

Pick the best-behaving kinase out of a large family, say HIPK2.
Pick its corresponding sites from PhosphoSitePlus (PSP).
- How many sites are there?
Add an equal number of decoys (by assigning sites from a kinase in the same group, but beyond the 95% similarity threshold).
Compute scores for the true/decoy motifs using DeepKS.
Check the separation. If there is none, STOP.
Get the scores or percentiles using Atlas.
Check the correlation.
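
For reference, the procedure above could be sketched roughly as follows. This is only an illustration under stated assumptions: the scores are plain lists of floats that have already been produced by DeepKS and Atlas (no DeepKS or Atlas API is called here), "separation" is measured as ROC AUC, and the `min_auc` cutoff of 0.7 is a placeholder, not a value anyone has agreed on. `sklearn.metrics.roc_auc_score` and `scipy.stats.spearmanr` would do the same job; the pure-Python versions just keep the sketch dependency-free.

```python
from statistics import mean


def auc(target_scores, decoy_scores):
    """Probability that a random target outscores a random decoy (ties count 0.5)."""
    wins = 0.0
    for t in target_scores:
        for d in decoy_scores:
            if t > d:
                wins += 1.0
            elif t == d:
                wins += 0.5
    return wins / (len(target_scores) * len(decoy_scores))


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / (var_x * var_y) ** 0.5


def validate(deepks_target, deepks_decoy, atlas_target, atlas_decoy, min_auc=0.7):
    """Gate on DeepKS separation first; only then compare against Atlas."""
    separation = auc(deepks_target, deepks_decoy)
    if separation < min_auc:  # assumed cutoff: "If there is none, STOP"
        return {"auc": separation, "spearman": None}
    rho = spearman(deepks_target + deepks_decoy, atlas_target + atlas_decoy)
    return {"auc": separation, "spearman": rho}
```

An AUC near 0.5 means no separation, and an AUC well below 0.5 means the decoys are scoring higher than the targets.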

Ben-Drucker commented 1 year ago

No separation (if anything, the opposite). See attached CSV: HIPK2 mini analysis.csv

Suggestions for me to try next:

1. Difficult kinase (will try other kinases)
   a. Tried another kinase (ABL1) from the TK group (this group had the best performance previously)
   b. [NEW] See the ROC curve comparison (with accuracy and cutoff overlays corresponding to each point on the ROC curve): ABL1_HIPK2.pdf
   c. [NEW] ABL1 performed better, but not as well as TK overall (plus, the difference may not be statistically significant). Similarly, HIPK2 performed much worse than CMGC overall.
2. Difficult sites (will try other random sampling)
3. Family had a high AUC but not the highest (will try kinases from other families)
   a. [NEW] See bullet 1
4. Some bug making the trained weights nonsensical (?)
5. 0 means "target" and 1 means "decoy" (unlikely issue)
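
The last suggestion is cheap to rule out. A minimal sketch, assuming it refers to a possible mix-up in which label value is treated as the positive class: compute the AUC once with 1 as the positive label and once with 0. The two values always sum to 1, so a result well below 0.5 under one convention and well above 0.5 under the other points to inverted labels rather than a model with no signal. The function names and the toy data are hypothetical.

```python
def auc(scores, labels, positive=1):
    """ROC AUC as P(positive outscores negative), with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def diagnose_label_flip(scores, labels):
    """Compare both label conventions for binary 0/1 labels."""
    as_given = auc(scores, labels, positive=1)
    flipped = auc(scores, labels, positive=0)
    return {
        "auc_positive_is_1": as_given,
        "auc_positive_is_0": flipped,
        "looks_inverted": as_given < 0.5 < flipped,
    }


# Toy data only: the items labeled 1 get the lowest scores.
report = diagnose_label_flip([0.1, 0.2, 0.8, 0.9], [1, 1, 0, 0])
```

An AUC only slightly below 0.5 would not be convincing evidence of a flip; that is just as consistent with no separation at all.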

Ben-Drucker commented 1 year ago

Next steps (4/3/2023)

1. [X] Make sure I can reproduce this set of ROC curves (with at most slight differences) with the current model weights. The weights differ from before because all of the data (~22k targets + ~22k decoys, both from PSP) is now used for training, instead of ~13k targets and ~13k decoys. Before, we trained on fewer inputs because we needed to hold some out for the validation and test sets.
- Reproduction successful: ROC_10_2023-04-03@152223.5@-04`00.pdf
2. [X] If step 1 fails, STOP and investigate [Did not fail]. Part of that can be making an ROC plot broken down by kinase (in, say, the TK group). This will allow us to identify the under-performing kinases. Even if step 1 succeeds, this will also help reveal discrepancies between step 1's ROC curves and the previous comment's ROC curves.
- Step 1 did not fail
- [NEW] TK ROC curves
- [NEW] CMGC ROC curves
- [NEW] You may have to download and zoom in to see the (intentionally) tiny labels for the non-emphasized kinases.
- [NEW] These plots show that, for both groups, the randomly chosen kinase (HIPK2/ABL1) didn't perform quite as well as the majority of the other kinases. This lends credence to the theory that we "got unlucky" with the randomly chosen "handful," which caused the under-performance. But there is still a performance gap of about 20% ROC. This may have to do with the sites. The next step involves expanding the "handful."
  3a. [NEW] Ensure the scores are actually the same between the handful and the pseudo-test set.
  3b. If step 2 produces drastically better results than the existing ROC curves for ABL1 and HIPK2, run the original issue's procedure with all targets and decoys associated with ABL1 and HIPK2.
  1. If we still find a discrepancy between step 2's ROC curves and step 3's ROC curves, investigate differences in the actual code being run, because there is no reason steps 2 and 3 should produce different results. This may reveal a fundamental problem. [NEW] There may also be a difference in how the group classifier is being used.
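
Two of the checks in this list (the per-kinase breakdown and the score comparison in 3a) could look roughly like the sketch below. It assumes the scores have already been exported as plain Python data; the record layout `(kinase, score, is_target)`, the `(kinase, site)` keys, and the `1e-6` tolerance are all hypothetical and not taken from the DeepKS code.

```python
import math
from collections import defaultdict


def auc(target_scores, decoy_scores):
    """ROC AUC as P(target outscores decoy), with ties counted as 0.5."""
    wins = sum((t > d) + 0.5 * (t == d) for t in target_scores for d in decoy_scores)
    return wins / (len(target_scores) * len(decoy_scores))


def auc_by_kinase(records):
    """records: iterable of (kinase, score, is_target). Returns AUCs, worst first."""
    grouped = defaultdict(lambda: ([], []))  # kinase -> (target scores, decoy scores)
    for kinase, score, is_target in records:
        grouped[kinase][0 if is_target else 1].append(score)
    result = {}
    for kinase, (targets, decoys) in grouped.items():
        if targets and decoys:  # AUC is undefined without both classes
            result[kinase] = auc(targets, decoys)
    return dict(sorted(result.items(), key=lambda kv: kv[1]))


def score_mismatches(handful, pseudo_test, tol=1e-6):
    """Both arguments map (kinase, site) -> score.

    Returns the shared keys whose scores differ by more than tol.
    """
    shared = handful.keys() & pseudo_test.keys()
    return sorted(
        k for k in shared
        if not math.isclose(handful[k], pseudo_test[k], rel_tol=0.0, abs_tol=tol)
    )
```

If `score_mismatches` returns anything for sites that appear in both sets, the two code paths are not scoring identical inputs the same way, which would be the place to start looking before comparing ROC curves.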

PNNL-Comp-Mass-Spec / DeepKS

Validation of the validation DeepKS vs Atlas #28