Closed by MLRadfys 4 years ago
Hey Michael,
the augmentation is based on probability.
This means the data set acts as a variant database. In each iteration, an image is pulled and then augmented. Each active data augmentation method is then applied or not applied with a specific probability, which is defined in the class variable config_p_per_sample.
By default, the probability is 15%.
The degree or options of each augmentation method are also picked randomly, but within a fixed range, e.g. config_scaling_range = (0.85, 1.25).
Therefore, if I activate the scaling, rotation and mirroring methods, we get the following data augmentation:
Original Image:
-> Probability to perform Mirroring on the image is 15%
-> Probability to perform Scaling on the image is 15%
-> Probability to perform Rotation on the image is 15%
-> Probability to perform any other data augmentation method is 0%
These are independent events, so multiple methods can and will be applied to the same image.
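To make the logic above concrete, here is a minimal standalone sketch in Python. It is not the actual MIScnn implementation: only the names config_p_per_sample and config_scaling_range and their values come from this thread, the helper functions are hypothetical, and drawing the scaling factor uniformly from the range is an assumption.

```python
import random

# Values taken from the thread. The names mirror the MIScnn class variables
# mentioned above, but this is an illustrative sketch, not MIScnn code.
config_p_per_sample = 0.15
config_scaling_range = (0.85, 1.25)


def pick_augmentations(active_methods, p=config_p_per_sample, rng=random):
    """Decide independently, per active method, whether it is applied to one image."""
    return [method for method in active_methods if rng.random() < p]


def pick_scaling_factor(scaling_range=config_scaling_range, rng=random):
    """Draw a scaling factor from the fixed range (assumption: uniformly)."""
    low, high = scaling_range
    return rng.uniform(low, high)


def p_at_least_one(n_methods, p=config_p_per_sample):
    """Probability that an image receives at least one augmentation."""
    return 1.0 - (1.0 - p) ** n_methods
```

With three active methods at 15% each, p_at_least_one(3) evaluates to 1 - 0.85^3, so roughly 39% of the pulled images receive at least one augmentation per iteration and the rest pass through unchanged.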
Sometimes you want a very high probability for one method, e.g. mirroring, and a very low one for another, e.g. scaling. That would require a separate probability variable for each method.
Currently, MIScnn only supports a single probability option, which is the same for all methods. But I have this feature on my personal Trello agenda.
Hope that I was able to answer your questions regarding the data augmentation.
Cheers, Dominik
Hi Dominik,
awesome, that was exactly what I was looking for. Thanks for the clarification!
Cheers,
Michael
Hi again,
I have a question about the data augmentation module. Are all images augmented, or is the augmentation based on probability (for example, 50% of the original data are augmented)?
Thanks in advance,
kind regards,
Michael