Open dhimmel opened 8 years ago
Here are my thoughts:
sklearn.linear_model.SGDClassifier with a grid search to find the optimal l1_ratio and alpha. See 2.TCGA-MLexample.ipynb for an example.
probability, score, class under a predictions key. The frontend should handle cases where probability is absent.
@gwaygenomics, @yl565, @stephenshank: do you agree?
Can you clarify what you mean by number 3?
Or do we want to report performance measures that span thresholds?
Like AUROC?
By "span thresholds" I'm referring to any measure computed from predicted probabilities/scores, such as AUROC or AUPRC. By "single classification threshold", I'm referring to any measure computed from predicted classes, such as precision, recall, accuracy, or F1 score.
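A toy illustration of the distinction, written in plain Python so each number can be checked by hand. The labels, probabilities, and function names are made up for this example; in practice sklearn.metrics provides roc_auc_score, precision_score, and recall_score for the same purpose.

```python
def auroc(y_true, scores):
    """Chance that a random positive is scored above a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(y_true, scores, threshold):
    """Precision and recall after converting scores to classes at one threshold."""
    y_pred = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1]
probability = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

# Spans thresholds: one value computed from the ranking of the probabilities.
print(auroc(y_true, probability))

# Single classification threshold: the values change when the threshold moves.
print(precision_recall(y_true, probability, threshold=0.5))
print(precision_recall(y_true, probability, threshold=0.3))
```

AUROC stays the same whichever cutoff is later chosen, while precision and recall trade off against each other as the threshold moves from 0.5 to 0.3.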
got it. Then yes, this all looks good to me
+1
Sounds good!
We're nearing the point where we'll need to implement a machine learning module to execute user queries. We're looking to create a minimum viable product. We can expand functionality later, but for now let's focus on the simplest and most succinct implementation. There are several decisions to make:
So let's work out these choices, with a focus on simplicity.