We want a script that does all model validation steps for us given one or more tfrecord datasets. Steps include the calculation of
the predicted cell_table
F1, precision, recall and specificity scores for thresholds from 0.01 to 0.99
scores split by marker as a table
Plotting of
scores vs. threshold as 2x2 facetplot
scores split by marker as heatmap
N worst predictions with input, gt, pred next to each other
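The threshold sweep listed above could look roughly like the following. This is a minimal sketch and not the existing code in metrics.py: the function name `scores_at_thresholds`, the flat binary ground-truth/probability inputs, and the step size of 0.01 are all assumptions made for illustration.

```python
import numpy as np


def scores_at_thresholds(y_true, y_prob, thresholds=None):
    """Hypothetical helper: F1, precision, recall and specificity per threshold.

    y_true: binary ground-truth labels, y_prob: predicted probabilities.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    if thresholds is None:
        # Assumed grid: 0.01, 0.02, ..., 0.99
        thresholds = np.linspace(0.01, 0.99, 99)
    thresholds = np.asarray(thresholds, dtype=float)

    # Rows are thresholds, columns are samples.
    pred = y_prob[None, :] >= thresholds[:, None]
    tp = (pred & y_true).sum(axis=1)
    fp = (pred & ~y_true).sum(axis=1)
    fn = (~pred & y_true).sum(axis=1)
    tn = (~pred & ~y_true).sum(axis=1)

    def safe_div(num, den):
        # Return 0 where the denominator is 0 instead of NaN.
        num = num.astype(float)
        return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

    return {
        "threshold": thresholds,
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }
```

The returned dictionary has one array per score, so it can be turned into a table or passed to the 2x2 facet plot directly.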
Relevant background
This feature would make it easy to automatically calculate performance metrics for different experiments and datasets. Some of the required functions are already available in metrics.py.
Design overview
The following steps need to be performed in the script:
Load model and validation data
Load external validation data (i.e. proof read samples)
For each dataset:
Predict samples, post-process to cell_table and save as CSV
Calculate scores (F1, precision, recall, specificity) based on cell_table and create the facet plot
Calculate scores for each marker individually and plot heatmap
Plot N worst tiles with input, gt, pred next to each other
Store all results in a sub-folder of the experiment folder named the same as the tfrecord dataset file without suffix
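Two of the steps above are small enough to sketch directly: deriving the result sub-folder from the tfrecord file name, and picking the N worst tiles. Both helpers below are hypothetical (`result_dir` and `n_worst_indices` are not existing functions), and the sketch assumes that a per-tile score such as F1 is available where lower means worse.

```python
from pathlib import Path

import numpy as np


def result_dir(experiment_dir, tfrecord_path):
    """Hypothetical helper: sub-folder named like the tfrecord file without suffix."""
    return Path(experiment_dir) / Path(tfrecord_path).stem


def n_worst_indices(per_tile_scores, n):
    """Hypothetical helper: indices of the n lowest-scoring tiles, worst first.

    Assumes higher scores are better (e.g. per-tile F1).
    """
    scores = np.asarray(per_tile_scores, dtype=float)
    return np.argsort(scores)[:n].tolist()
```

The returned indices can then be used to select the input, ground-truth and prediction tiles to plot side by side.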
Code mockup
Model and dataset loading and metrics calculation are already implemented in metrics.py. Plotting functionality is partly implemented in plot_utils.py. Missing pieces need to be identified and added.
Required inputs
Parameters (params.toml), model weights (best_model.pkl) and tfrecord files
Output files
Folder with cell_table.csv, plots and metrics stored as CSVs.
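As an illustration of the per-marker metrics CSV, the sketch below computes one row of scores per marker from a cell table with pandas. The column names `marker`, `gt` and `pred`, the fixed threshold of 0.5, and the function name `per_marker_scores` are assumptions; the real cell_table layout may differ.

```python
import pandas as pd


def per_marker_scores(cell_table, threshold=0.5,
                      marker_col="marker", gt_col="gt", pred_col="pred"):
    """Hypothetical helper: scores split by marker as a table.

    Assumes one row per cell and marker, with a binary ground-truth column
    and a predicted-probability column (column names are assumptions).
    """
    def ratio(num, den):
        # Return 0.0 when the denominator is 0 instead of raising.
        return num / den if den > 0 else 0.0

    rows = []
    for marker, grp in cell_table.groupby(marker_col):
        gt = grp[gt_col].astype(bool).to_numpy()
        pred = (grp[pred_col] >= threshold).to_numpy()
        tp = int((pred & gt).sum())
        fp = int((pred & ~gt).sum())
        fn = int((~pred & gt).sum())
        tn = int((~pred & ~gt).sum())
        rows.append({
            "marker": marker,
            "f1": ratio(2 * tp, 2 * tp + fp + fn),
            "precision": ratio(tp, tp + fp),
            "recall": ratio(tp, tp + fn),
            "specificity": ratio(tn, tn + fp),
        })
    return pd.DataFrame(rows).set_index("marker")
```

Calling `per_marker_scores(cell_table).to_csv(path)` would write the table into the result folder, and the same DataFrame can feed the heatmap.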
Timeline
Give a rough estimate for how long you think the project will take. In general, it's better to be too conservative rather than too optimistic.
[x] A couple days
[ ] A week
[ ] Multiple weeks. For large projects, make sure to agree on a plan that isn't just a single monster PR at the end.
Estimated date when a fully implemented version will be ready for review:
Estimated date when the finalized project will be merged in: 01/23