PPPLDeepLearning / plasma-python

PPPL deep learning disruption prediction package
http://tigress-web.princeton.edu/~alexeys/docs-web/html/

Add option to directly sample the disruptive subset during shot list/set splitting #44

Open felker opened 4 years ago

felker commented 4 years ago

Currently, if the testing and training ({train} U {validate}) sets are drawn from the same source shot list, the ratio conf['model']['train_frac'] is used to randomly divide the source shots without regard to the shot classes. The same happens when the train and validate sets are split using conf['model']['validation_frac'].

So, while the division of the overall shot counts will match the desired fractions to within 1/N (where N is the total number of shots), the division of the disruptive and non-disruptive shots among the sets may not be as close to that fraction. This is only a problem when the number of disruptive (or non-disruptive) samples is low and/or the training and testing sets are drawn from different raw lists. As the number of samples -> infinity, the per-class ratios of course converge to the configured fractions, e.g. N{validate, disrupt}/N{training, disrupt} -> conf['model']['validation_frac'].
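To make the effect concrete, here is a minimal sketch of a class-blind split. It is not the package's actual splitting code: `class_blind_split`, `disruptive_share`, and the 200-shot toy list with 10 disruptive shots are all hypothetical and only illustrate the behavior described above.

```python
import random


def class_blind_split(shots, frac, seed=0):
    """Shuffle all shots together and cut at `frac`, ignoring class labels.

    `shots` is a list of (shot_id, is_disruptive) pairs (hypothetical layout).
    """
    rng = random.Random(seed)
    shuffled = list(shots)
    rng.shuffle(shuffled)
    n_first = int(round(frac * len(shuffled)))
    return shuffled[:n_first], shuffled[n_first:]


def disruptive_share(first, second):
    """Fraction of all disruptive shots that ended up in `first`."""
    n_first = sum(1 for _, is_disruptive in first if is_disruptive)
    n_second = sum(1 for _, is_disruptive in second if is_disruptive)
    return n_first / (n_first + n_second)


# Toy data (assumption): 200 shots, of which only 10 are disruptive.
shots = [(i, i < 10) for i in range(200)]

for seed in range(5):
    validate, train = class_blind_split(shots, 0.2, seed)
    # The overall fraction is always 0.2; the disruptive share varies with the seed.
    print(seed, len(validate) / len(shots), disruptive_share(validate, train))
```

The overall fraction is fixed by construction, while the share of disruptive shots landing in the validate set depends on the shuffle.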

There is no real reason not to explicitly divide the disruptive and non-disruptive classes separately when splitting the shot sets, so I think we should at least add it as an option, if not make it the default behavior.
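A rough sketch of what the proposed option could look like, assuming each shot is available as a (shot_id, is_disruptive) pair. The function name `stratified_split` and the data layout are hypothetical, not existing plasma-python API:

```python
import random


def stratified_split(shots, frac, seed=0):
    """Split so that each class is divided by `frac` separately.

    `shots` is a list of (shot_id, is_disruptive) pairs (hypothetical layout).
    Returns (first, second), where `first` holds about `frac` of each class.
    """
    rng = random.Random(seed)
    first, second = [], []
    for label in (True, False):
        # Shuffle and cut each class on its own.
        group = [s for s in shots if s[1] == label]
        rng.shuffle(group)
        n_first = int(round(frac * len(group)))
        first.extend(group[:n_first])
        second.extend(group[n_first:])
    # Reshuffle so the classes are not grouped together in the output.
    rng.shuffle(first)
    rng.shuffle(second)
    return first, second


# Toy data (assumption): 200 shots, of which only 10 are disruptive.
shots = [(i, i < 10) for i in range(200)]
validate, train = stratified_split(shots, 0.2)
n_val_disruptive = sum(1 for _, d in validate if d)
n_train_disruptive = sum(1 for _, d in train if d)
```

Because rounding is applied per class, the overall set sizes can differ from the class-blind split by up to about one shot per class, which seems like an acceptable trade for matching the fraction within each class.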