Questions regarding the data_augmentation module

MLRadfys commented 4 years ago

Hi again,

I have a question about the data augmentation module. Are all images augmented, or is the augmentation based on probability (for example, 50% of the original data being augmented)?

Thanks in advance,

kind regards,

Michael

muellerdo commented 4 years ago

Hey Michael,

the augmentation is based on probability.

This means the data set acts as a variant database. In each iteration, an image gets pulled and then augmented. Each active data augmentation method is then applied or not applied with a specific probability, which is defined in the class variable config_p_per_sample.

By default, the probability is 15%.

The degree or options of these augmentation methods are also picked randomly, but within a fixed range, e.g. config_scaling_range = (0.85, 1.25).
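To illustrate the fixed-range idea, here is a minimal sketch of drawing a scale factor. It assumes the factor is sampled uniformly from the range, which this thread does not confirm; `sample_scale` is a hypothetical helper, not a MIScnn function.

```python
import random

# Range taken from the example above; the uniform draw is an assumption.
config_scaling_range = (0.85, 1.25)

def sample_scale(rng, scaling_range=config_scaling_range):
    """Draw one scale factor from inside the configured range (hypothetical helper)."""
    low, high = scaling_range
    return rng.uniform(low, high)

rng = random.Random(42)
factors = [sample_scale(rng) for _ in range(5)]
print(factors)  # five values, each between 0.85 and 1.25
```

Every image that gets scaled would receive its own factor, so the strength of the augmentation varies from sample to sample while staying within the bounds.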

Therefore, if I activate the scaling, rotation and mirroring methods, we get the following data augmentation:

Original Image:
-> Probability to perform Mirroring on the image is 15%
-> Probability to perform Scaling on the image is 15%
-> Probability to perform Rotation on the image is 15%
-> Probability to perform any other data augmentation method is 0%

These are independent events, so multiple methods can and will be applied to the same image.
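The independent draws can be sketched in a few lines. This is only a simulation of the behaviour described here, not MIScnn's actual code; `pick_methods` and `prob_at_least_one` are hypothetical names, and only the 15% default comes from this thread.

```python
import random

P_PER_SAMPLE = 0.15  # same value as the default of config_p_per_sample
ACTIVE_METHODS = ("mirroring", "scaling", "rotation")

def pick_methods(rng, methods=ACTIVE_METHODS, p=P_PER_SAMPLE):
    """Return the methods that fire for one image; each method is drawn independently."""
    return [m for m in methods if rng.random() < p]

def prob_at_least_one(n_methods, p=P_PER_SAMPLE):
    """Chance that an image is changed by at least one of n independent methods."""
    return 1.0 - (1.0 - p) ** n_methods

# Compare a simulation over many images with the closed-form value.
rng = random.Random(0)
n_images = 100_000
augmented = sum(1 for _ in range(n_images) if pick_methods(rng))
print(f"simulated: {augmented / n_images:.3f}")
print(f"analytic:  {prob_at_least_one(len(ACTIVE_METHODS)):.3f}")
```

With three active methods at 15% each, the analytic value is 1 - 0.85^3, roughly 0.386. So about 61% of the pulled images pass through unchanged in a given iteration, and all three methods hit the same image with probability 0.15^3, roughly 0.3%.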

Sometimes it makes sense to want a very high probability for e.g. mirroring and a very low one for scaling, which would require a separate probability variable for each method.

Currently, MIScnn only supports a single probability option, which is the same for all methods, but I have this feature on my personal Trello agenda.
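For illustration only, a per-method setup could look roughly like the sketch below. Nothing here exists in MIScnn: the dictionary, its values and the function name are all hypothetical and only show the idea of one probability per method.

```python
import random

# Hypothetical per-method probabilities; MIScnn currently has one shared value.
P_PER_METHOD = {"mirroring": 0.90, "scaling": 0.05, "rotation": 0.15}

def pick_methods_per_method(rng, p_per_method=P_PER_METHOD):
    """Independently decide for each method, using that method's own probability."""
    return [name for name, p in p_per_method.items() if rng.random() < p]

rng = random.Random(0)
for _ in range(3):
    print(pick_methods_per_method(rng))
```

With these example values, mirroring would fire for most images while scaling would stay rare, which is the kind of control a single shared probability cannot give.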

Hope that I was able to answer your questions regarding the data augmentation.

Cheers, Dominik

MLRadfys commented 4 years ago

Hi Dominik,

Awesome, that was exactly what I was looking for. Thanks for the clarification!

Cheers,

Michael

frankkramer-lab / MIScnn

Questions regarding the data_augmentation module #23