Open vladpetyuk opened 1 year ago
No separation (if anything, the opposite). See the attached CSV: HIPK2 mini analysis.csv

Suggestions for me to try next:
1. Difficult kinase (will try other kinases).
   a. Tried another kinase (ABL1) from the TK group (this group previously had the best performance).
   b. [NEW] See the ROC curve comparison, with accuracy and cutoff overlays for each point on the ROC curve: ABL1_HIPK2.pdf
   c. [NEW] ABL1 performed better than HIPK2, but not as well as the TK group overall (and the difference may not be statistically significant). Similarly, HIPK2 performed much worse than the CMGC group overall.
2. Difficult sites (will try other random sampling).
3. The family had a high AUC, but not the highest (will try kinases from other families).
   a. [NEW] See bullet 1.
4. Some bug making the trained weights nonsensical (?)
5. 0 means "target" and 1 means "decoy" (unlikely issue).
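A minimal pure-Python sketch of the ROC-with-overlays idea (cutoff and accuracy recorded at every ROC point) together with a check of the label convention. It assumes a higher score means "more likely positive"; the function names `roc_with_accuracy` and `auc` and the toy data are hypothetical, not DeepKS code. Swapping which label is treated as positive turns an AUC of x into 1 - x, which is one way an "opposite" result can appear.

```python
from typing import List, Tuple

# Each ROC point: (cutoff, false-positive rate, true-positive rate, accuracy)
Point = Tuple[float, float, float, float]


def roc_with_accuracy(scores: List[float], labels: List[int], positive: int = 0) -> List[Point]:
    """Compute one ROC point per distinct score cutoff, with accuracy at that cutoff.

    A site is predicted positive when its score >= cutoff. `positive` is the
    label value treated as the positive class (assumed here: 0 = target, 1 = decoy).
    """
    n = len(labels)
    pos = sum(1 for y in labels if y == positive)
    neg = n - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    # Start above the highest score: nothing is predicted positive.
    points: List[Point] = [(float("inf"), 0.0, 0.0, neg / n)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == positive)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y != positive)
        tn = neg - fp
        points.append((cutoff, fp / neg, tp / pos, (tp + tn) / n))
    return points


def auc(points: List[Point]) -> float:
    """Trapezoidal area under the ROC curve (points ordered by increasing FPR)."""
    area = 0.0
    for (_, x0, y0, _), (_, x1, y1, _) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    # Toy data (not from the issue): 0 = target, 1 = decoy.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 0, 1, 0]
    as_targets = roc_with_accuracy(scores, labels, positive=0)
    as_decoys = roc_with_accuracy(scores, labels, positive=1)
    print(auc(as_targets))  # 0.25
    print(auc(as_decoys))   # 0.75
    for cutoff, fpr, tpr, acc in as_targets:
        print(f"cutoff={cutoff:<5} fpr={fpr:.2f} tpr={tpr:.2f} acc={acc:.3f}")
```

If an AUC sits well below 0.5, recomputing it with the other class as positive is a cheap way to tell a flipped label or score direction (item 5) apart from a real lack of separation.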
Next steps (4/3/2023)
3a. [NEW] Ensure the scores are actually the same between the handful and the pseudo-test set.
3b. If step 2 produces drastically better results than the existing ROC curves for ABL1 and HIPK2, run the original issue's procedure with all targets and decoys associated with ABL1 and HIPK2.
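For a check like 3a, one option is to key each score by site and compare the two runs with a floating-point tolerance rather than exact equality. This is a sketch under assumptions: the function name `compare_scores`, the `(kinase, site)` keys, and the tolerance values are hypothetical, and it assumes both runs can be exported as a dict of site to score.

```python
import math
from typing import Dict, Hashable, List, Tuple


def compare_scores(
    handful: Dict[Hashable, float],
    pseudo_test: Dict[Hashable, float],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-9,
) -> Tuple[List[Hashable], List[Tuple[Hashable, float, float]]]:
    """Compare per-site scores from two scoring runs.

    Returns (missing, mismatched): keys from `handful` that are absent from
    `pseudo_test`, and (key, handful_score, pseudo_test_score) triples whose
    scores differ beyond the given tolerances.
    """
    missing = [k for k in handful if k not in pseudo_test]
    mismatched = [
        (k, handful[k], pseudo_test[k])
        for k in handful
        if k in pseudo_test
        and not math.isclose(handful[k], pseudo_test[k], rel_tol=rel_tol, abs_tol=abs_tol)
    ]
    return missing, mismatched


if __name__ == "__main__":
    # Hypothetical site keys and scores, for illustration only.
    handful = {("HIPK2", "site_1"): 0.81, ("HIPK2", "site_2"): 0.12, ("ABL1", "site_3"): 0.55}
    pseudo_test = {("HIPK2", "site_1"): 0.81, ("HIPK2", "site_2"): 0.47}
    missing, mismatched = compare_scores(handful, pseudo_test)
    print(missing)     # [('ABL1', 'site_3')]
    print(mismatched)  # [(('HIPK2', 'site_2'), 0.12, 0.47)]
```

Reporting missing keys separately from mismatched scores helps distinguish a sampling or join problem from a genuine scoring difference between the two runs.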