learningyan opened this issue 3 years ago
Hey, @learningyan
AutoAlbument doesn't support DistributedDataParallel for now, but it is on my roadmap, and I plan to add it in the next few months.
As for the loss, I am currently creating a benchmark for AutoAlbument on multiple datasets for classification and segmentation. When this benchmark is finished, I can share more intuition behind loss values and their meaning, but for now, here is my experience with losses based on running AutoAlbument on multiple datasets:
`a_loss` is the loss for the policy network (the Generator in GAN terms), the network that applies augmentations to input images. `d_loss` is the loss for the Discriminator, the network that tries to guess whether an input image is augmented or non-augmented. `loss` is a task-specific loss (`CrossEntropyLoss` for classification, `BCEWithLogitsLoss` for semantic segmentation) that acts as a regularizer and prevents the policy network from applying augmentations that would make an object of class A look like an object of class B.

`a_loss` and `d_loss` could increase or decrease, and that's OK. The only problematic scenario is when `a_loss` always increases and never decreases after each batch, while `d_loss` always decreases and never increases. That means the Discriminator is only getting better and better at each step, and the Policy Network cannot produce augmented images that fool it.

@creafz thanks a lot for this writeup. I'd love to hear more intuition behind the losses and how to assess the quality of AutoAlbument, e.g. to understand at least initially whether the training was successful. In some of my initial experiments `d_loss`
is pretty stable, although the value range is massive (e.g. between e-8 and e+8). Meanwhile, `a_loss` always decreases or always increases, going into e+9 / e-9 values a few epochs into training.
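The adversarial setup described above (policy network as Generator, a Discriminator, and a task loss acting as a regularizer) can be sketched in PyTorch. The tiny networks, shapes, and variable names below are illustrative assumptions for a single training step, not AutoAlbument's actual code:

```python
import torch
import torch.nn as nn

# Stand-in networks (assumed: 8x8 RGB inputs, 10 classes).
policy = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 3 * 8 * 8))        # "Generator"
discriminator = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 1))         # augmented vs. original
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))           # task model

bce = nn.BCEWithLogitsLoss()
ce = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 8, 8)
labels = torch.randint(0, 10, (4,))

# The policy network produces "augmented" images.
augmented = policy(images).view_as(images)

# d_loss: the Discriminator learns to label augmented images 1, originals 0.
d_loss = bce(discriminator(augmented.detach()), torch.ones(4, 1)) \
       + bce(discriminator(images), torch.zeros(4, 1))

# a_loss: the policy tries to fool the Discriminator into labeling
# its augmented images as originals (0).
a_loss = bce(discriminator(augmented), torch.zeros(4, 1))

# Task loss as regularizer: augmented images must still be
# classifiable as their original labels.
task_loss = ce(classifier(augmented), labels)
```

In a real run each of these losses would drive its own optimizer step; the sketch only shows how the three loss values reported in the logs relate to the three networks.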
Hey, @jwitos
Now I am finishing AutoAlbument experiments with datasets such as CIFAR10, ImageNet, and Pascal VOC. I am planning to add a description of those experiments and loss values to the documentation.
Briefly speaking, I think the only representative metric for the quality of AutoAlbument training is "Average Parameter change" (that is, how much the augmentation parameters changed at the end of an epoch compared to its beginning). This metric should decrease and then plateau at some value. However, it is heavily dependent on the size of the dataset, and on a small dataset it can be very noisy.
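As a rough illustration of what such a metric could compute, here is a hypothetical helper (not AutoAlbument's actual implementation) that compares flattened augmentation-parameter values captured at the start and end of an epoch:

```python
def average_parameter_change(params_start, params_end):
    """Mean absolute difference between augmentation parameter values
    at the start and at the end of an epoch (illustrative sketch)."""
    if len(params_start) != len(params_end):
        raise ValueError("parameter snapshots must have the same length")
    total = sum(abs(e - s) for s, e in zip(params_start, params_end))
    return total / len(params_start)
```

For example, `average_parameter_change([0.5, 0.2, 0.9], [0.45, 0.35, 0.9])` averages the per-parameter shifts 0.05, 0.15, and 0.0; as the search converges, these shifts shrink and the metric plateaus.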
Here are, for example, TensorBoard logs for one of my CIFAR10 experiments - https://tensorboard.dev/experiment/hpqoQQEATAy9XhpDbvKSKA/#scalars&_smoothingWeight=0. "Average Parameter change" is decreasing at the end of the training, while `a_loss` and `d_loss` are increasing.
@jwitos I have added TensorBoard logs for AutoAlbument configs from the `examples` directory. Hope that helps - https://albumentations.ai/docs/autoalbument/metrics/
@creafz: If I want to extend the base code for multi-GPU processing, where should I start? Also, could you re-upload the TensorBoard logs for the CIFAR10, ImageNet, and Pascal VOC experiments, since the TensorBoard.dev service has been shut down? Many thanks.
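As a generic starting point (not AutoAlbument-specific), multi-GPU training in PyTorch usually means initializing a process group and wrapping each trainable model in `DistributedDataParallel`; in the adversarial setup here, both the policy network and the discriminator would need wrapping. The helper name and defaults below are illustrative assumptions:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_for_ddp(model, rank=0, world_size=1, backend="gloo"):
    """Initialize the default process group and wrap a model in DDP.

    With backend="nccl" and one process per GPU (launched e.g. via
    torchrun), this becomes a real multi-GPU setup; the gloo/CPU path
    below is only for single-process experimentation.
    """
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend, rank=rank, world_size=world_size)
    if backend == "nccl":
        torch.cuda.set_device(rank)
        return DDP(model.to(rank), device_ids=[rank])
    return DDP(model)
```

In practice you would launch one process per GPU (e.g. `torchrun --nproc_per_node=N train.py`), give each process a `DistributedSampler` for its data, and keep separate optimizers per wrapped model.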
Hi! Does this codebase support DistributedDataParallel now? Besides, when I try to search on my dataset, the loss keeps increasing. I use the same config format as the provided examples; what's wrong?