Closed gngdb closed 9 years ago
Sounds like a good idea, so long as there is a separation of the locale specific settings (data path) from the run specific parameters.
This is almost done; I just need test.py to parse settings the same way. That might actually be a problem, but I'll see what happens.
I think we need a separation of "distorted-replication-like" Augmentations from "image-alignment-manipulation" Augmentations.
I would call the former "Augmentation", and it would include rotations, flipping, etc. These are optional and inflate the dataset. test.py does not need to apply these, but it might find it useful to apply non-destructive manipulations (flipping and 180-degree rotation) to the data and take the average of the results.
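The averaging over non-destructive manipulations could be sketched like this (the `predict` function is a placeholder for whatever the model exposes):

```python
import numpy as np

def predict_with_tta(predict, image):
    """Average predictions over non-destructive transforms.

    A sketch: the transforms (identity, horizontal flip, vertical
    flip, 180-degree rotation) don't distort pixel values, so the
    averaged prediction should be at least as stable as any single
    one. `predict` maps an image to a probability vector.
    """
    variants = [
        image,
        np.fliplr(image),
        np.flipud(image),
        np.rot90(image, 2),
    ]
    return np.mean([predict(v) for v in variants], axis=0)
```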
The latter I would call "Preprocessing", and the options would be resize or "non-scaling augmentation". These are mandatory. test.py must apply these functions before sending the data to be processed by the feature generation function.
And ideally the Preprocessing functions get combined together with a preprocessing wrapper function and saved as a method of the pickled model object. But this may not be practical for Pylearn2 based stuff. It is the solution I intend to use for sklearn, since it requires the preprocessing be precisely the same for test as it was for train.
Here, I am assuming Preprocessing is deterministic.
I will ensure that shape-fixing is deterministic, and to have non-centred shape-fixed images they must be augmented with padding.
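Deterministic shape-fixing might look something like this sketch (centred padding/cropping, with any odd pixel always going to the same side, so repeated calls agree exactly):

```python
import numpy as np

def fix_shape(image, target_shape):
    """Deterministically pad or crop an image to target_shape.

    A sketch: padding is split evenly around the centre, with any
    odd leftover pixel always going to the bottom/right, so the
    same input always yields the same output. Off-centre placements
    would come from a separate, explicit padding augmentation.
    """
    out = image
    for axis, target in enumerate(target_shape):
        size = out.shape[axis]
        if size < target:
            before = (target - size) // 2
            after = target - size - before
            pad = [(0, 0)] * out.ndim
            pad[axis] = (before, after)
            out = np.pad(out, pad, mode="constant")
        elif size > target:
            start = (size - target) // 2
            out = np.take(out, range(start, start + target), axis=axis)
    return out
```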
Getting lots of predictions for differently padded versions of the image and then taking the average is left as an exercise for the tester, and applying this is a separate issue. However, which augmentations to apply to the test dataset should be a parameter in the settings file.
So there should be three parameters in run_settings.json:
The TestAugmentation may or may not have the same crop/pad parameters as TrainAugmentation. The TrainAugmentation rotate parameters should most likely span a greater range than the TestAugmentation.
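Put together, run_settings.json might look something like this (the key names and option values here are illustrative guesses, not a final schema):

```json
{
    "Preprocessing": {"resize": [48, 48]},
    "Train Augmentation": {"rotate": [0, 90, 180, 270], "flip": true, "crop": 4},
    "Test Augmentation": {"rotate": [0, 180], "flip": true, "crop": 4}
}
```

Note the train rotations span a wider range than the test ones, as suggested above.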
Sounds good. Remember, because they're dictionary keys you can just put spaces in there: "Train Augmentation" and "Test Augmentation". Although I've been inconsistent about applying this.
Pending an issue for test.py to be able to deal with merging multiple submissions from different Pylearn2 iterations.
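Merging could be as simple as averaging the class probabilities from each iteration, something like this sketch (weighted or geometric means are obvious variations):

```python
import numpy as np

def merge_submissions(probability_arrays):
    """Merge predictions from different Pylearn2 iterations.

    A sketch: given per-model arrays of class probabilities with
    matching row order, take the elementwise mean and renormalise
    each row so the probabilities still sum to one.
    """
    merged = np.mean(probability_arrays, axis=0)
    return merged / merged.sum(axis=1, keepdims=True)
```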
Are we going to parse different settings files to define each run, so that a run can be repeated whenever? At the moment, it just loads a single settings file. Some settings (data locations, classes, etc.) will need to be global though.
Should maybe parse run JSON files as well, and pass them on the command line to train. Those could contain details about which model to train, etc. (essentially pointing at some code in tools with some options).
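A sketch of what that command-line handling could look like; the argument names, the `settings.json` default for the global file, and the merge-by-update behaviour are all assumptions, not decided interfaces:

```python
import argparse
import json

def parse_args(argv=None):
    """Parse the command line for a training run.

    A sketch: the run-specific file is a positional argument, so a
    run is repeatable as `python train.py run_settings.json`; the
    global settings file name is an assumed default.
    """
    parser = argparse.ArgumentParser(
        description="Train a model from a run settings file.")
    parser.add_argument("run_settings",
                        help="path to the run-specific JSON file")
    parser.add_argument("--settings", default="settings.json",
                        help="global settings (data locations, classes, etc.)")
    return parser.parse_args(argv)

def load_run(args):
    # Global settings first, then run-specific values override them.
    with open(args.settings) as f:
        settings = json.load(f)
    with open(args.run_settings) as f:
        settings.update(json.load(f))
    return settings
```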
Anyone else think this is the right way to do it?