VDIGPKU / DADA

[ECCV 2020] DADA: Differentiable Automatic Data Augmentation
MIT License

About DifferentiableAugment #15

Open PushparajaMurugan opened 3 years ago

PushparajaMurugan commented 3 years ago

Hi,

I don't understand the DifferentiableAugment class in the implementation. What does it do? Does it just subtract the magnitude from the images and then add it back? Why did you adopt this approach? Is there a specific reason for it?

    class DifferentiableAugment(nn.Module):
        def __init__(self, sub_policy):
            super(DifferentiableAugment, self).__init__()
            self.sub_policy = sub_policy

        def forward(self, origin_images, probability_b, magnitude):
            images = origin_images
            adds = 0
            for i in range(len(self.sub_policy)):
                if probability_b[i].item() != 0.0:
                    images = images - magnitude[i]
                    adds = adds + magnitude[i]
            images = images.detach() + adds
            return images
latstars commented 3 years ago

This implementation uses the straight-through gradient estimator, which is also used by GDAS.
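To illustrate the trick, here is a minimal, self-contained sketch of the straight-through idea (illustrative only, not the repository code): the forward value equals the non-differentiable augmented image, while the backward pass still sends a gradient to `magnitude` through the non-detached term.

    import torch

    # Toy example of the straight-through estimator (illustrative, not the DADA code).
    # Forward: `output` takes the value of the non-differentiable augmented image.
    # Backward: gradients w.r.t. `magnitude` flow through the non-detached term.
    magnitude = torch.tensor(0.3, requires_grad=True)
    origin = torch.rand(1, 3, 4, 4)

    # Stand-in for a non-differentiable augmentation (e.g. applied via PIL).
    augmented = (origin + magnitude).detach()

    # Straight-through: value == augmented, but d(output)/d(magnitude) == 1 per element.
    output = augmented + magnitude - magnitude.detach()

    output.sum().backward()
    print(magnitude.grad)  # tensor(48.) == number of image elements (1*3*4*4)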

PushparajaMurugan commented 3 years ago

@latstars Thank you for your answer, I understand now. I have another question. I tried to search for policies on a larger ImageNet subset with 500 classes and 30K+ images, and the resulting policies look very strange. However, when I use fewer classes and images, I get policies similar to the ones in your genotype.py. Maybe this is not the right way to search for a policy, but I would like to understand why my policies look like this. If you have any ideas about what causes this kind of policy, please share them with me.

The final policies are: [image: DADA_policy]

latstars commented 3 years ago

Hi, Raja. Since you use a larger dataset, there are more iterations than with the small dataset. More iterations can lead to over-optimization, because there is no L2 regularization (weight decay) on the augmentation parameters. I suggest searching the policy with fewer epochs or a smaller learning rate for the augmentation parameters.
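For example, the augmentation parameters could get their own optimizer with a smaller learning rate and, if desired, explicit weight decay. This is a hypothetical sketch; `augment_parameters` and the values shown are illustrative, not taken from this repository.

    import torch

    # Hypothetical sketch: optimize the augmentation parameters
    # (probabilities/magnitudes) with a smaller learning rate and explicit
    # weight decay. `augment_parameters` is a placeholder list.
    augment_parameters = [torch.zeros(25, requires_grad=True)]

    augment_optimizer = torch.optim.Adam(
        augment_parameters,
        lr=1e-4,            # smaller learning rate for the augmentation parameters
        weight_decay=1e-3,  # acts as the missing L2 regularization
    )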

PushparajaMurugan commented 3 years ago

@latstars Hi, thank you for your answer. This is impressive: when I reduced the learning rate, the policies look good. To validate this, I conducted several experiments varying the learning rate from 0.5 down to 0.0001. Learning rates of 0.5, 0.2, 0.1, 0.002, and 0.005 did not work well, but 0.001, 0.0001, and 0.0002 are fine ("working fine" means the policies look like the policies from your experiments, judged by visual inspection only). Now I am facing another question: which policy is good, and how can I determine that a policy is good? Any ideas you can give?

The following result is an example of experiments with a learning rate of 0.0001.

[image: DADA_Policy_Crt]