cutright / DVH-Analytics

A DICOM Database Application for Radiation Oncology

Request: Machine Learning without going through Regression module #107

Status: Closed (cutright closed this issue 3 years ago)

cutright commented 3 years ago

I've begun developing a new Machine Learning workflow, so you don't have to generate a multi-variable regression first. I also plan to add sklearn classifiers (so you can use your outcome data!).

This isn't functional yet, just some basic GUI elements created so far.

cutright commented 3 years ago
[Screenshot taken 2020-11-10, 1:58:52 PM]

Created a new window for ML setup, instead of a series of dialogs. It appears to work, with the exception of Classifiers. It is much easier to set up, and less prone to the data type issues that the Linear Regression has (e.g., I can just select all variables without error... at least for my test data; I haven't tested None or empty data yet).

Should be easy to add some more ML algorithms from sklearn now.

cutright commented 3 years ago

Turns out classification does work as is, but you have to select a dependent variable that sklearn does not deem "continuous". The residual plots are less meaningful for classification, as is MSE, so I may remove those. Random forest has different criterion options for classification than for regression; the other 3 algorithms appear to have the same options.

Still thinking about the best way to get categorical data in. Currently, I edit the data from Data -> Show Stats Data in the menu bar, right-click a column to add a new column, then paste my data from MS Excel, making sure to line it up by MRN and UID (if needed). You can copy from DVHA to Excel too. Annoyingly, you can't copy from macOS Numbers to DVHA (the reverse works).

cutright commented 3 years ago

Multilayer Perceptron added

cutright commented 3 years ago

Although it is a little annoying, to get categorical data into these new features you have to go to the menu bar -> Data -> Show Stats Data, right-click a column, click Add Column, then enter your data there. Note that you can press Ctrl+A to select all and Ctrl+C to copy, then paste into Excel. Do your magic, then copy back into DVHA. Categorical data must be represented with integers. Unfortunately, for now you'll have to keep track of your categorical map on your own.
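Since the label-to-integer map has to be kept outside DVHA, here is a minimal sketch of one way to build and reuse such a map before pasting the integer column in. The function names and the `toxicity` labels are made up for illustration; none of this is part of DVHA.

```python
def build_category_map(labels):
    """Assign each distinct label an integer code, in sorted label order."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


def encode(labels, category_map):
    """Translate labels into the integer codes to paste into the new column."""
    return [category_map[label] for label in labels]


def decode(codes, category_map):
    """Translate integer codes back into the original labels."""
    inverse = {code: label for label, code in category_map.items()}
    return [inverse[code] for code in codes]


# Hypothetical outcome column, one entry per row (line up by MRN/UID yourself).
toxicity = ["none", "mild", "severe", "mild", "none"]

category_map = build_category_map(toxicity)
codes = encode(toxicity, category_map)

print(category_map)  # keep a copy of this map somewhere safe
print(codes)
```

Sorting the labels before numbering them keeps the map reproducible if you rebuild it later from the same set of labels.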

I realize there are many, many analytical graphs you may want... in the meantime, I recommend exporting the data after the modeling is complete and generating your own charts/graphs.

I removed the residual plots for classification algorithms and replaced the MSE with accuracy, which is just the number of correct predictions divided by the number of observations.
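For reference, that accuracy definition can be written out in a few lines. This is a plain-Python sketch of the formula with made-up class codes, not the code DVHA uses.

```python
def accuracy(y_true, y_pred):
    """Fraction of observations whose predicted class equals the observed class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    correct = sum(1 for observed, predicted in zip(y_true, y_pred) if observed == predicted)
    return correct / len(y_true)


# Made-up integer class codes for five observations.
observed = [0, 1, 1, 0, 2]
predicted = [0, 1, 0, 0, 2]

score = accuracy(observed, predicted)  # 4 correct out of 5
print(score)
```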

Also, remember that you can save your model. The resulting file is just a Python pickle file containing a dictionary:

{
   'y_variable': self.plot.y_variable,
   'regression': self.reg,
   'model': self.model,
   'tool_tips': self.tool_tips,
   'x_variables': self.plot.x_variables,
   'title': self.title,
   'input_parameters': self.input_parameters,
   'data_split': self.data_split_parameters,
   'version': DefaultOptions().VERSION
}

Here, regression is a misnomer... it's the scikit-learn class object, which could be a regressor or a classifier. I'll probably change this terminology by v1.0... maybe predictor makes more sense.
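If you want to look inside a saved model outside of DVHA, a sketch along these lines should work. The helper names, the file name, and every value in the stand-in dictionary are hypothetical. A real saved file would also need scikit-learn (and possibly DVHA itself) importable in order to unpickle, which is an assumption on my part.

```python
import pickle
import tempfile
from pathlib import Path


def load_saved_model(path):
    """Unpickle a saved model file and return the dictionary it holds."""
    # Only unpickle files you trust: pickle can run arbitrary code on load.
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize(saved):
    """Collect a few descriptive entries from the saved dictionary."""
    return {
        "title": saved.get("title"),
        "y_variable": saved.get("y_variable"),
        "n_x_variables": len(saved.get("x_variables", [])),
        "version": saved.get("version"),
    }


# Stand-in dictionary using the keys listed above; all values are made up.
demo = {
    "y_variable": "hypothetical outcome",
    "regression": "placeholder for the scikit-learn object",
    "model": "placeholder",
    "tool_tips": {},
    "x_variables": ["hypothetical feature A", "hypothetical feature B"],
    "title": "Demo model",
    "input_parameters": {},
    "data_split": {},
    "version": "0.0.0",
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "example.mlr"  # hypothetical file name
    demo_path.write_bytes(pickle.dumps(demo))
    loaded = load_saved_model(demo_path)

summary = summarize(loaded)
print(summary)
```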

cutright commented 3 years ago

Starting in v0.8.9, the machine learning model will have a .ml extension instead of .mlr. It will still be a pickle file, but with this format:

{
   'y_variable': self.plot.y_variable,
   'model': self.model,
   'sklearn_predictor': self.sklearn_predictor,
   'tool_tips': self.tool_tips,
   'x_variables': self.plot.x_variables,
   'title': self.title,
   'input_parameters': self.input_parameters,
   'data_split': self.data_split_parameters,
   'version': DefaultOptions().VERSION
}

Machine Learning views now have a "Features" button that will launch a simple window listing all of the features in the current model.

I still need to resolve a bug when loading a model from the menu bar (Data -> Load Model -> Machine Learning); then this feature request should be complete.

Note that what used to be called regression is now model (since it could also apply to classifiers), and what used to be model is now sklearn_predictor, which is the sklearn module / class object.
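To make the renaming concrete for anyone reading these dictionaries in their own scripts, here is a sketch that maps the old key names onto the new ones. The function name is hypothetical, and detecting the old layout by the presence of a `regression` key is my assumption. This only illustrates the renaming described above and is not how DVHA itself loads files.

```python
def to_new_key_names(saved):
    """Return a copy of a saved-model dictionary that uses the newer key names.

    Old layout: 'regression' and 'model'.
    New layout: 'model' (formerly 'regression') and
                'sklearn_predictor' (formerly 'model').
    """
    if "regression" not in saved:
        # Assumed to be in the new layout already; return an unchanged copy.
        return dict(saved)

    # Copy everything except the two keys whose names changed.
    renamed = {k: v for k, v in saved.items() if k not in ("regression", "model")}
    renamed["model"] = saved["regression"]
    renamed["sklearn_predictor"] = saved["model"]
    return renamed


# Stand-in values only; real files hold scikit-learn objects.
old_style = {"regression": "was regression", "model": "was model", "title": "Demo"}
new_style = to_new_key_names(old_style)
print(new_style)
```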

cutright commented 3 years ago

Should be good to go now.

Note that saved machine learning models now have a .ml extension. The old .mlr files are still supported, but you'll have to manually change their extension to .ml.
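If you have many old files, the extension change can be scripted. This is a minimal sketch using only the standard library. The function name is made up, and the folder you pass in is wherever you keep your saved models, which I'm not assuming here. The demo below runs against a temporary folder with empty placeholder files.

```python
import tempfile
from pathlib import Path


def rename_mlr_to_ml(folder):
    """Rename every *.mlr file directly inside folder to *.ml; return the new paths."""
    renamed = []
    for old_path in sorted(Path(folder).glob("*.mlr")):
        new_path = old_path.with_suffix(".ml")
        if new_path.exists():
            # Don't overwrite a model that already uses the new extension.
            continue
        old_path.rename(new_path)
        renamed.append(new_path)
    return renamed


# Demo with empty placeholder files in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("lung.mlr", "prostate.mlr", "breast.ml"):
        (Path(tmp) / name).touch()
    renamed_names = [p.name for p in rename_mlr_to_ml(tmp)]
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(renamed_names)
print(remaining)
```

Back up your models before running something like this on real files.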