Since we are loading all the training images into RAM at the start anyway, it will be much easier to simply load the augmented images at the start without applying the normalisation, then measure mean and std (storing the values for later use on test data), and normalise.
We then need two functions: one to calculate the statistics and one to apply the normalisation to each image.
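Roughly something like this, as a minimal sketch (the array names and shapes are placeholders, not the actual pipeline variables):

```python
import numpy as np

# Placeholder stand-ins for the augmented images loaded into RAM;
# shapes and names are illustrative, not the project's actual variables.
train_images = np.random.rand(1000, 48, 48).astype(np.float32)
test_images = np.random.rand(200, 48, 48).astype(np.float32)

# Measure the statistics once, over the whole unnormalised training set.
mean = train_images.mean()
std = train_images.std()

# Normalise the training set, keeping mean/std around so exactly the
# same transform can be applied to the test set when it is loaded.
train_images = (train_images - mean) / std
test_images = (test_images - mean) / std
```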
Yeah, it's going to be slightly annoying to make that part of the train pipeline at the moment, because most processing is applied when the image is loaded; running over all the images again just to apply the normalisation is costly. You'd want to check whether the mean and stdev are already known before loading and, if they are, apply the normalisation at load time. If they're not, you have to load the images, calculate the mean and stdev, save them somewhere (in the run settings JSON?) and then apply them to all the images.
The easy way to do that is just to have a script that calculates them; if the processing is called before they have been calculated, it either fails out or calls the script.
Anyway, this amounts to pretty much the same as what you're saying, with the difference that if the mean and stdev are known at load time we just apply the normalisation to each image as all the other processing is applied. Otherwise we have to change the way we're processing images, which would be annoying.
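A rough sketch of that load-time check (the run settings keys and the error handling here are made up for illustration):

```python
import json

def get_normalisation_stats(run_settings_path):
    """Return (mean, std) from the run settings, or fail if they are missing."""
    with open(run_settings_path) as f:
        settings = json.load(f)

    if "normalise_mean" not in settings or "normalise_std" not in settings:
        # Fail out here (or call the stats-calculation script instead).
        raise RuntimeError("Normalisation statistics not found in {}; "
                           "run the stats script first.".format(run_settings_path))

    return settings["normalise_mean"], settings["normalise_std"]
```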
OK, Scott has convinced me that we should just do the normalisation after loading with numpy ufuncs.
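For example, something like this (placeholder array, but the in-place ufunc calls are the point: they avoid copying the whole set again):

```python
import numpy as np

images = np.random.rand(1000, 48, 48).astype(np.float32)  # placeholder stack

mean, std = images.mean(), images.std()

# In-place ufunc calls: no extra copy of the full image array is made.
np.subtract(images, mean, out=images)
np.divide(images, std, out=images)
```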
ConvNets generally expect a z-scored style input. There are two ways we could do this: per-pixel statistics, or a single overall mean and standard deviation. Whatever we do, though, we'll have to make sure we apply the same transform to the test set when it's loaded. Unfortunately, these statistics will also be different for every kind of pre-processing, so they will have to be calculated and stored in the run settings JSON at load time. One way to do this would be to have a script that takes a run settings file, loads all the images as those settings would process them, calculates the statistics and writes them back into that run settings file. The processing function at load time could then use these statistics to initialise a scaler.
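A sketch of what that script could look like (the JSON key names and the `load_images` argument are invented placeholders, not anything that exists in the repo yet):

```python
import json

def write_stats_to_run_settings(run_settings_path, load_images):
    """Load the images as the run settings would process them, calculate the
    statistics and write them back into the same run settings file.

    `load_images` stands in for whatever function applies the settings'
    pre-processing and returns an (n_images, height, width) float array.
    """
    with open(run_settings_path) as f:
        settings = json.load(f)

    images = load_images(settings)

    # Overall statistics; use images.mean(axis=0).tolist() and
    # images.std(axis=0).tolist() instead if per-pixel scaling is wanted.
    settings["normalise_mean"] = float(images.mean())
    settings["normalise_std"] = float(images.std())

    with open(run_settings_path, "w") as f:
        json.dump(settings, f, indent=4)
```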
So for every pixel we would then subtract the mean and divide by the standard deviation, computed either over all pixels at that position (per-pixel) or over all pixels in the dataset (overall).
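Concretely, the two options only differ in which axis the statistics are taken over, e.g.:

```python
import numpy as np

images = np.random.rand(1000, 48, 48)  # placeholder stack of images

# Per-pixel: one mean/std per pixel position, shapes (48, 48).
per_pixel = (images - images.mean(axis=0)) / images.std(axis=0)

# Overall: a single mean/std over every pixel of every image.
overall = (images - images.mean()) / images.std()
```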
Then we need a function in image processing that can apply the scaling to an image it's handed, and we need to be able to build processing functions that do this through the augmentation function builder.
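Something like the following closure could be slotted into the chain of per-image processing functions (the names here are illustrative, not the existing augmentation builder API):

```python
import numpy as np

def make_scaler(mean, std):
    """Return a function that z-scores a single image with fixed statistics."""
    def scale(image):
        return (np.asarray(image, dtype=np.float32) - mean) / std
    return scale

# The augmentation function builder could then compose this with whatever
# other per-image processing it already builds, e.g. (hypothetical names):
# processing = compose(resize, flip, make_scaler(mean, std))
```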