PushparajaMurugan opened this issue 3 years ago
This implementation utilizes the straight-through gradient estimator, which is also utilized by GDAS.
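The straight-through trick mentioned above can be sketched with PyTorch's `detach`; `straight_through_round` is a hypothetical name for illustration, but the detach-based pattern is the standard idiom: the forward pass uses the non-differentiable value, while the backward pass treats the operation as the identity.

```python
import torch

def straight_through_round(x):
    # Forward value: round(x). Backward: the (round(x) - x) term is detached,
    # so the gradient flows through the remaining `x` as if round were identity.
    return x + (torch.round(x) - x).detach()

x = torch.tensor([0.3, 0.7], requires_grad=True)
y = straight_through_round(x)       # forward sees the rounded values
y.sum().backward()                  # backward sees an identity gradient
```

Here `y` equals `[0., 1.]`, yet `x.grad` is all ones, which is exactly what lets a discrete choice stay inside a gradient-based search.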
@latstars Thank you for your answer; I understand. I have another question. I tried to search for policies on a larger subset of ImageNet with 500 classes and 30K+ images, and the resulting policies look strange. However, when I use a smaller number of classes and images, I get policies similar to those in your genotype.py. Maybe this is not the right way to search for a policy, but I would like to understand why my policies come out like this. If you have any ideas about what causes this kind of policy, please share them with me.
The final policies are,
Hi, Raja. Since you use a larger dataset, you run many more iterations than on the small dataset. More iterations can lead to over-optimization, because there is no L2 regularization (weight decay) on the augmentation parameters. I suggest searching for the policy with fewer epochs or a smaller learning rate for the augmentation parameters.
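The suggestion above (a smaller learning rate, or adding weight decay for the augmentation parameters) is a one-line change to the optimizer; `aug_params` below is a hypothetical stand-in for the policy's probability and magnitude parameters, and the values are illustrative only:

```python
import torch

# Hypothetical stand-in for the policy's probability/magnitude parameters.
aug_params = torch.full((8,), 0.5, requires_grad=True)

# A smaller learning rate plus weight decay (L2 regularization) helps keep
# long schedules on a large dataset from over-optimizing these parameters.
optimizer = torch.optim.Adam([aug_params], lr=1e-4, weight_decay=1e-3)

# One illustrative step with a dummy loss.
loss = aug_params.sum()
loss.backward()
optimizer.step()
```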
@latstars Hi. Thank you for your answer; this is impressive. When I reduced the learning rate, the policies look good. To validate this, I ran several experiments varying the learning rate from 0.5 down to 0.0001, where 0.5, 0.2, 0.1, 0.002, and 0.005 do not work well, but 0.001, 0.0002, and 0.0001 are fine ("working fine" means the policies look like the ones from your experiments, judged by visual inspection only). Now I face another question: which policy is good? How can I determine whether the policies are good? Any ideas you can share?
The following result is an example of experiments with a learning rate of 0.0001.
Hi,
I don't understand the DifferentiableAugment class in the implementation. What does it do? Does it just subtract and add the magnitude to the images? Why did you implement it this way? Is there a specific reason for it?
```python
class DifferentiableAugment(nn.Module):
    def __init__(self, sub_policy):
        super(DifferentiableAugment, self).__init__()
        self.sub_policy = sub_policy
```
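The subtract-and-add pattern the question refers to is the straight-through estimator in miniature. A minimal self-contained sketch, assuming a hypothetical helper `straight_through_augment` (the name and the generic `op` argument are mine, not from the repository): the augmentation is applied without building a graph, and the magnitude is subtracted then re-added so that the backward pass treats the whole (possibly non-differentiable) op as `images + magnitude`.

```python
import torch

def straight_through_augment(images, magnitude, op):
    # Forward value is op(images, magnitude), computed with no graph.
    # Subtracting the detached magnitude and adding it back means the
    # backward pass sees d(out)/d(magnitude) = 1, so the magnitude
    # parameter receives a gradient even through a non-differentiable op.
    with torch.no_grad():
        augmented = op(images, magnitude)   # forward value only
    return augmented - magnitude.detach() + magnitude

mag = torch.tensor(0.5, requires_grad=True)
imgs = torch.ones(2, 2)
out = straight_through_augment(imgs, mag,
                               lambda im, m: torch.clamp(im + m, 0.0, 1.0))
out.sum().backward()   # mag.grad is 4.0: one unit of gradient per pixel
```

This matches the "just subtract and add magnitude" observation: the value of the output is unchanged by the subtract/add pair, but it re-attaches the magnitude to the autograd graph so it can be optimized.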