Closed — LittlePea13 closed this pull request 4 years ago
Thanks for the terrific PR!
I've quickly glanced through the files and seen that you compute a confusion matrix to get the accuracy. The original plan was to provide a comparison using Bland-Altman analysis, showing a small bias and an agreement interval of 2%. That last point is particularly important because 2% is the threshold the MDs gave us, and the Bland-Altman method is better suited to comparing two imperfect measuring instruments, whereas other methods usually assume that one of the methods is a perfect ground truth. If you want a better understanding of Bland-Altman, and why we chose it in the first place, please check out that paper; it has helped me tremendously :). You can still use and compare the confusion matrices, but for the MDs I believe we should return the agreement interval.
Anyway, the Bland-Altman analysis is implemented in that folder. It takes as inputs the batches of results from the oximeter and from the estimation method. My plan in the coming days is to create a class where you can input those results and that will perform the calculation and return the agreement interval. That way, anyone can create an object of that class in their method and progressively add the labels and estimates to it. At the end, the evaluation is run and we get the agreement interval. What do you think?
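To make that concrete, here is a minimal sketch of what such a class could look like (the class name, method names, and the 1.96-sigma limits of agreement are my own illustration, not code from the repo):

```python
import numpy as np

class BlandAltman:
    """Accumulates oximeter readings and model estimates batch by batch,
    then computes the Bland-Altman bias and 95% limits of agreement."""

    def __init__(self):
        self.labels = []
        self.estimates = []

    def add_batch(self, labels, estimates):
        # Flatten and store one batch of paired measurements.
        self.labels.extend(np.asarray(labels).ravel())
        self.estimates.extend(np.asarray(estimates).ravel())

    def evaluate(self):
        diffs = np.asarray(self.estimates) - np.asarray(self.labels)
        bias = diffs.mean()                    # systematic offset
        half_width = 1.96 * diffs.std(ddof=1)  # 95% limits of agreement
        return bias, (bias - half_width, bias + half_width)
```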
That sounds fine to me! Just to clear up any confusion, I am using Mean Absolute Error (MAE), as the task is regression and not classification. Sorry if the code is confusing; that is because I adapted it from classification code. MAE was used for reporting SpO2 error in the Nemcova paper.
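For reference, MAE is just the average absolute difference between predictions and targets; a minimal version would be the following (the repo's actual get_mae may differ in signature):

```python
import numpy as np

def get_mae(preds, targets):
    # Mean Absolute Error: mean of |prediction - target|
    return np.mean(np.abs(np.asarray(preds) - np.asarray(targets)))
```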
I will check the references you linked when I have time, but for now: if the Bland-Altman metric can be computed just from the target labels and the predicted outputs, then it is only necessary to swap the function passed as get_acc in the experiments dict for one that returns the Bland-Altman metric. Right now it is set to get_mae, so MAE is reported at each epoch for train and test.
```python
experiments = {
    'experiment_path': 'roll',
    'data_path': 'roll/roll',
    'model_root': 'model',
    'models': get_models(),
    'norm': False,
    'get_acc': get_mae,
    'resume': False,
    'num_epoch': 1000
}
```
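For example, a drop-in replacement could reuse the limits-of-agreement computation from the sketch above (the function name is hypothetical, and I'm assuming get_acc functions take predictions and targets and return a scalar):

```python
import numpy as np

def get_bland_altman(preds, targets):
    # Half-width of the 95% limits of agreement around the bias;
    # the bias itself would be worth reporting alongside it.
    diffs = np.asarray(preds) - np.asarray(targets)
    return 1.96 * diffs.std(ddof=1)
```

and the config would then use 'get_acc': get_bland_altman.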
Another thing to explore is whether the loss should be changed to one better suited to our problem.
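If the training loop is PyTorch-based (an assumption on my part; I haven't verified it), the obvious candidates for a regression loss would look something like this, shown on toy values:

```python
import torch
import torch.nn as nn

preds = torch.tensor([97.2, 95.8, 98.1])    # toy SpO2 predictions
targets = torch.tensor([98.0, 96.0, 97.0])  # toy oximeter labels

print(nn.MSELoss()(preds, targets))       # penalizes large errors quadratically
print(nn.L1Loss()(preds, targets))        # directly optimizes the reported MAE
print(nn.SmoothL1Loss()(preds, targets))  # L1/MSE compromise, robust to outliers
```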
I will add computing the Bland-Altman metric to the TODO list. Maybe we can report both MAE and Bland-Altman in order to compare results with Nemcova too.
Also added the TODO list to the README.
BTW, the error may also be a bad indicator right now, since the data from Nemcova is all within the same range of O2 levels, with little variation. I assume that when we get a bigger range, for instance from sick patients with low levels, the performance may decrease.
Sounds good. I'm all in on the data collection app and administrative tasks at the moment, but when that's done I'll have a look into creating the class for the Bland-Altman method and adding it to your code. If you want to play with it, feel free; the actual Bland-Altman method is already there, so you could use it already :)
I think MAE and Bland-Altman is indeed the way to go! Although we could also run the Bland-Altman analysis on Nemcova's method (once the bugs are solved).
Do you think it would be tidier to have each NN model in a different file?
Good catch and yes I do think so :)
I'll do that and then merge.
Currently on the Nemcova dataset (the test files are the ones that start with data_): lowest MAE for SpO2 on the test set, where the paper reported 1.1%:
Please be aware these are the best-epoch results, from before the model starts over-fitting. Since we do not have a validation set (the data is too scarce), this is an overly optimistic result. There is also not enough training data to call this anywhere near conclusive, but we have the models available to train once we get more data.
The code is recycled from another GitHub repo (referenced in the file), so it probably needs a clean-up. Not sure whether to hold off on merging to master and wait for other contributions, or merge it already so everyone has more direct access to the files.
Suggested TODO:

- Add Bland-Altman analysis to the metrics (currently only MAE, Mean Absolute Error).
- Switch to Tensorboard reporting instead of printing (see the sketch after this list).
- Normalize the input to the network.
- Avoid saving a new model file each time there is an improvement: overwrite the previous one, or add an option to toggle saving.
- General clean-up of the code; there are probably many ways it can be improved.
- Train on HR.
- Use sample_data as the validation set.
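On the Tensorboard item: assuming the project is PyTorch-based (not verified), the per-epoch MAE printing could become something like the following, with dummy values standing in for the real training loop:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir='runs/roll')  # hypothetical log directory

for epoch in range(3):  # stand-in for the real epoch loop
    train_mae, test_mae = 2.0 / (epoch + 1), 2.5 / (epoch + 1)  # dummy values
    writer.add_scalar('MAE/train', train_mae, epoch)
    writer.add_scalar('MAE/test', test_mae, epoch)

writer.close()  # then view with: tensorboard --logdir runs
```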
I didn't have time for all those tasks, and will be quite busy this week, but I hope the code helps get others on track.