Sum02dean / STRINGSCORE

Plot ROC curves for train/test/valid score #5

Closed Sum02dean closed 2 years ago

Sum02dean commented 2 years ago

Expect to see increasing off-diagonal profile for the score correctness (benchmark plot).

Sum02dean commented 2 years ago

Will need to take the intersection of the benchmark data with the train data and with the test data in order to plot the benchmark statistics.
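A minimal sketch of that intersection step, not the code in `xgboost_model.py`. The field names `protein1` and `protein2`, the helper names, and the toy rows are all hypothetical, and treating a pair as unordered is an assumption that may not match how the benchmark file is laid out.

```python
def pair_key(a, b):
    """Order-independent key, so (A, B) and (B, A) count as the same pair (assumption)."""
    return frozenset((a, b))


def intersect_with_benchmark(split_rows, benchmark_rows):
    """Return the rows of split_rows whose protein pair also occurs in benchmark_rows."""
    bench_keys = {pair_key(r["protein1"], r["protein2"]) for r in benchmark_rows}
    return [r for r in split_rows if pair_key(r["protein1"], r["protein2"]) in bench_keys]


# Toy data, made up for illustration only.
train = [
    {"protein1": "a", "protein2": "x", "score": 0.9},
    {"protein1": "b", "protein2": "y", "score": 0.2},
    {"protein1": "z", "protein2": "c", "score": 0.5},
]
benchmark = [
    {"protein1": "a", "protein2": "x"},
    {"protein1": "c", "protein2": "z"},  # same pair as (z, c) in train, swapped
    {"protein1": "d", "protein2": "w"},
]

train_bench = intersect_with_benchmark(train, benchmark)
```

The same call would be repeated for the test split. If pair order matters in the real data, `pair_key` could return a plain tuple instead.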

https://github.com/Sum02dean/STRINGSCORE/blob/d7d6794f4002716e0747f504113648c07590e076/src/xgboost_model.py#L285-L287

Sum02dean commented 2 years ago

Plotting for human (train/test/valid) balanced, no noise, no COG split:

image

Sum02dean commented 2 years ago

Plotting on E. coli, balanced, no noise, no COG split:

image

Sum02dean commented 2 years ago

Plotting on yeast - balanced, no noise, no COG split:

image

damianszk commented 2 years ago

Why are we looking at the balanced, no noise, no COG split? It's the least interesting, as we know we have issues there. The unbalanced, noise, COG split is the cleanest one for seeing the performance.

But anyway... the question here is why these curves are so different. So we have training (logistic?), testing (diagonal), and validation (totally off). What's wrong with the validation...
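One way to put numbers on "diagonal" versus "totally off" is to compute the ROC points and the AUC per split. The sketch below is self-contained and runs on made-up labels and scores, not on the project's data; the function names and the three toy splits are hypothetical. A diagonal curve corresponds to an AUC near 0.5, and an AUC well below 0.5 could point to inverted labels or scores, which might be worth ruling out.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists from sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Handle tied scores together so ties give a diagonal segment.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Toy splits, made up for illustration: (labels, scores).
splits = {
    "train": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),  # perfectly separated
    "test": ([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]),   # uninformative scores
    "valid": ([1, 1, 0, 0], [0.1, 0.2, 0.8, 0.9]),  # scores inverted
}

aucs = {name: auc(*roc_points(y, s)) for name, (y, s) in splits.items()}
for name, value in aucs.items():
    print(f"{name}: AUC = {value:.2f}")
```

The `fpr` and `tpr` lists can be passed straight to a plotting call to draw all three curves on one axis.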

Sum02dean commented 2 years ago

In order to identify issues in a systematic manner, I am performing the same analysis with: