Closed (roma-glushko closed this issue 3 years ago)
Do you observe the same behavior when not using any augmentations?

PS: usually you don't want to apply augmentations at the validation stage.
PPS: PyTorch is better
Sun, 23 May 2021 at 12:30, Roman Glushko @.***>:
🐛 Bug
I could not get my training to work in a reproducible way when albumentations is added to the data pipeline. I followed this thread (#93: https://github.com/albumentations-team/albumentations/issues/93) and fixed all possible seeds, so overall my snippet that should have enabled reproducible experiments looks like this:
import os
import random

import numpy as np
import tensorflow as tf


def set_random_seed(seed: int = 42):
    """
    Globally fix all possible sources of randomness to keep experiment reproducible
    """
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
Unfortunately, this doesn't help me get reproducible results. I have executed the training process 6 times and got different results every time. You can also see it in W&B:
- https://wandb.ai/roma-glushko/rock-paper-scissors/runs/2bdgnbwx (best_val_acc: 0.7104, best_epoch: 3)
- https://wandb.ai/roma-glushko/rock-paper-scissors/runs/2qo9pbls (best_val_acc: 0.7875, best_epoch: 8)
- https://wandb.ai/roma-glushko/rock-paper-scissors/runs/uf6cknge (best_val_acc: 0.6771, best_epoch: 8)
- https://wandb.ai/roma-glushko/rock-paper-scissors/runs/tem3umbx (best_val_acc: 0.7729, best_epoch: 6)
- https://wandb.ai/roma-glushko/rock-paper-scissors/runs/czsjm7px (best_val_acc: 0.7208, best_epochs: 0 and 8)
- https://wandb.ai/roma-glushko/rock-paper-scissors/runs/29dif98z (best_val_acc: 0.8, best_epoch: 9)
[image: Screenshot 2021-05-23 at 12 29 29] https://user-images.githubusercontent.com/9402690/119255115-98690100-bbc2-11eb-90c9-6c591dbfe629.png
Also, I tried to set random.seed() right before passing my batch into a.Compose() pipeline. That did not really help.
However, when I comment albumentations out of my data pipeline or replace it with some pure TF augmentations, my training becomes reproducible.
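For reference, a sketch of one way such seeded "pure TF" augmentations could look, using tf.image stateless ops (available in TF 2.4+); this is not the project's actual code, just an illustration of that path:

import tensorflow as tf

def tf_augment(image, label, seed=(42, 0)):
    # stateless ops take an explicit seed, so repeated runs apply identical augmentations
    image = tf.image.stateless_random_flip_left_right(image, seed=seed)
    image = tf.image.stateless_random_flip_up_down(image, seed=seed)
    return image, label

# dataset = dataset.map(tf_augment)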
Any clues what's wrong here?

To Reproduce
Steps to reproduce the behavior:
- Clone the project state at the 0.1.0-bugrep tag:
git clone --depth 1 --branch 0.1.0-bugrep https://github.com/roma-glushko/rock-paper-scissor
- Pull dataset:
cd data
kaggle datasets download --unzip frtgnn/rock-paper-scissor
- Install project deps:
poetry install
- Uncomment any of the reported augmentations in the config file (they are all commented out in the repo): https://github.com/roma-glushko/rock-paper-scissor/blob/master/configs/basic_config.py
- Run training a couple of times and you get results that differ by a lot:
python train.py
Expected behavior
In order to do experiments that analyze the impact of different ideas and changes, I would like my training process to be reproducible.

Environment
- Albumentations version (e.g., 0.1.8): 0.5.2
- Python version (e.g., 3.7): 3.8.6
- OS (e.g., Linux): Ubuntu 20.10
- How you installed albumentations (conda, pip, source): poetry (pip-like)
- tensorflow-gpu: 2.5.0 (for the sake of compatibility with RTX3070 (ampere arch.))
Additional context
This issue is reproduced in a project that is also mentioned in #905 (https://github.com/albumentations-team/albumentations/issues/905). The data pipeline is the same for both issues.
@BloodAxe thank you for the reply!
> Do you observe the same behavior when not using any augmentations?
No, when I disable augmentations, the pipeline becomes deterministic in my experiments. It overfits and does it in the same way every time I rerun it (the per-epoch stats look the same).
> PS: usually you don't want to apply augmentations at the validation stage.
I will try to disable augmentation for the validation step and let you know how it goes.
> PPS: PyTorch is better
I know, I know 😌 In this particular project I use TF because of TF.js: I want to deploy my model as a serverless web app.
@BloodAxe, back to your suggestions. Here are 6 training runs with validation augmentation disabled (6 runs are shown):
I don't see much difference from the previous runs with validation augmentation enabled.
Here is how training looks when I disable albumentations completely (7 runs are shown):
The last plot is what I consider a reproducible pipeline: all metrics are the same or very close at every epoch.
@roma-glushko may I ask for another trial? What if you fix a seed inside the apply_augmentation function? That is to ensure tf.numpy_function does not introduce any unexpected issues with the pseudorandom generator. This test will apply exactly the same set of augmentations, and the results should be identical.
@BloodAxe I saw this usage in the TF-related examples, so I tried that even before creating the ticket. However, I have just double-checked it and still see the same non-deterministic picture in W&B:
Just for the record, the function was modified this way:
def augment_image(inputs, labels, augmentation_pipeline: a.Compose, seed: int = 42):
    def apply_augmentation(images):
        random.seed(seed)  # fixing seed
        aug_data = augmentation_pipeline(image=images.astype('uint8'))
        return aug_data['image']

    inputs = tf.numpy_function(func=apply_augmentation, inp=[inputs], Tout=tf.uint8)
    return inputs, labels
Try also to set numpy.random.seed(seed).
@Dipet it has already been set in the entry point from the very beginning, as I mentioned in the ticket, so I have tried adding the line to the augment_image() function as well (which I had not checked before):
def augment_image(inputs, labels, augmentation_pipeline: a.Compose, seed: int = 42):
    def apply_augmentation(images):
        random.seed(seed)
        np.random.seed(seed)
        aug_data = augmentation_pipeline(image=images.astype('uint8'))
        return aug_data['image']

    inputs = tf.numpy_function(func=apply_augmentation, inp=[inputs], Tout=tf.uint8)
    return inputs, labels
Unfortunately, 5 additional runs show that the picture has not changed much:
Very strange. There are only 2 things in the library that control randomness. Could you describe which transforms you use?
@Dipet sure, all tests were performed with the following augmentation pipeline configuration:
args['train_augmentation'] = a.Compose([
    a.VerticalFlip(),
    a.HorizontalFlip(),
    a.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.1, brightness_by_max=False),
    a.CoarseDropout(max_holes=20, max_height=8, max_width=8, min_holes=10, min_height=8, min_width=8),
    a.GaussNoise(p=1.0, var_limit=(10.0, 50.0)),
])
args['validation_augmentation'] = a.Compose([])
I kept the validation step augmentation-free, as @BloodAxe suggested above.
Hmm. All of a sudden, this issue is starting to look more interesting than it did at the beginning.
As another check, you could use ReplayCompose and serialize all applied arguments. After that you could rerun and check whether all the arguments are the same or not.
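For reference, the record/replay mechanism being suggested works roughly like this (a standalone sketch, not the project's code):

import albumentations as a
import numpy as np

pipeline = a.ReplayCompose([a.HorizontalFlip(p=0.5), a.VerticalFlip(p=0.5)])

image = np.zeros((300, 300, 3), dtype=np.uint8)
first = pipeline(image=image)  # applies the transforms and records what was applied
saved = first['replay']        # nested dict: which transforms ran and with which parameters

# re-apply exactly the same transforms and parameters, e.g. in another run or to another image
second = a.ReplayCompose.replay(saved, image=image)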
@Dipet I have noticed an interesting thing. After I enabled ReplayCompose, I started to see less variance in the training loss/accuracy, but the validation metrics still vary by a lot:
In addition, there were new warnings related to GaussNoise and CoarseDropout augmentations:
Epoch 1/10
UserWarning: albumentations.augmentations.transforms.GaussNoise could work incorrectly in ReplayMode for other input data because its' params depend on targets.
warn(
46/63 [====================>.........] - ETA: 1s - loss: 1.2924 - accuracy: 0.3268
UserWarning: albumentations.augmentations.transforms.CoarseDropout could work incorrectly in ReplayMode for other input data because its' params depend on targets.
63/63 [==============================] - 9s 86ms/step - loss: 1.2859 - accuracy: 0.3304 - val_loss: 1.1152 - val_accuracy: 0.3417
...
So the only change I have made is:
args['train_augmentation'] = a.ReplayCompose([  # ReplayCompose() replaced Compose() method
    a.VerticalFlip(),
    a.HorizontalFlip(),
    a.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.1, brightness_by_max=False),
    a.CoarseDropout(max_holes=20, max_height=8, max_width=8, min_holes=10, min_height=8, min_width=8),
    a.GaussNoise(p=1.0, var_limit=(10.0, 50.0)),
])
The warnings are OK. I was talking about saving the applied arguments. Something like this:
applied_transforms = []


def augment_image(inputs, labels, augmentation_pipeline: a.Compose, seed: int = 42):
    def apply_augmentation(images):
        random.seed(seed)
        np.random.seed(seed)
        aug_data = augmentation_pipeline(image=images.astype('uint8'))
        applied_transforms.append(aug_data['replay'])  # record what was actually applied
        return aug_data['image']

    inputs = tf.numpy_function(func=apply_augmentation, inp=[inputs], Tout=tf.uint8)
    return inputs, labels

# train
....

# save after train
with open('data.pickle', 'wb') as f:
    pickle.dump(applied_transforms, f)
And after that we could compare applied arguments and transforms.
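A possible sketch of that comparison step, assuming two such pickle files were produced by two independent runs (the file names below are hypothetical):

import pickle

with open('run_1.pickle', 'rb') as f:
    run_1 = pickle.load(f)
with open('run_2.pickle', 'rb') as f:
    run_2 = pickle.load(f)

print('same number of augmented batches:', len(run_1) == len(run_2))
for i, (first, second) in enumerate(zip(run_1, run_2)):
    # replay dicts can contain numpy arrays (e.g. GaussNoise params), so compare
    # their serialized form instead of using == on the nested structures directly
    if pickle.dumps(first) != pickle.dumps(second):
        print(f'batch {i}: applied transforms or parameters differ')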
@Dipet yeah, I was just in the process of collecting that information. However, I decided to store each augmentation run in a separate .pkl file:
def augment_image(inputs, labels, augmentation_pipeline: a.Compose, seed: int = 42):
    def apply_augmentation(images):
        random.seed(seed)
        np.random.seed(seed)
        aug_data = augmentation_pipeline(image=images.astype('uint8'))

        with open(f'logs/debug/replay-{datetime.datetime.now().timestamp()}.pkl', 'wb') as outfile:
            pickle.dump(aug_data['replay'], outfile)

        return aug_data['image']

    inputs = tf.numpy_function(func=apply_augmentation, inp=[inputs], Tout=tf.uint8)
    return inputs, labels
I hope you are okay with that.
Here is a zip archive with a few files generated by snippet above:
https://drive.google.com/file/d/1lH-YuY4abcVYk12cCwXJm5PAAUd1kXS5/view?usp=sharing
@Dipet have you had a chance to open the replay "black box" of the albumentations runs I shared with you? 😄
Oh, sorry. It looks like some of the files are corrupted (they have 0 size). And if we are talking about reproducibility, it would be great to have 2 groups of files from 2 independent runs. It also looks like you had problems with trying to process batches inside the albumentations pipeline. Have you tried to reproduce the results after fixing this issue?
@Dipet glad you wrote back 🙌
I think the fix from #911 greatly mitigated the variance of the metrics. Here is what I can see now:
Currently, the losses and accuracies vary by roughly ±0.01. Is this something we expect to see?
Looks good. I think the current differences are associated with the instability of algorithms and hardware.
@Dipet I believe so. At least, I have no augmentations on the validation step, so it seems to have nothing to do with albumentations. In any case, thank you for the support! Appreciate your help ❤️