GeorgeCazenavette / mtt-distillation

Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"
https://georgecazenavette.github.io/mtt-distillation/

About hyperparameter: learning rate for updating condensed samples #3

Closed Alan-Qin closed 2 years ago

Alan-Qin commented 2 years ago

Hello, George! First of all, I must say that this is very nice work.

I have some doubts about the hyperparameter lr_img used for updating the condensed samples. The paper does not mention how lr_img was chosen. Also, I ran the 10-images-per-class experiment on CIFAR-10 from Table 6 and only obtained 58.50% accuracy. Should I modify other hyperparameters?

GeorgeCazenavette commented 2 years ago

Glad you like the work!

I double checked, and we used lr_img=1000 and lr_lr=1e-4 for that experiment.

I think the issue is likely the initialization. As of now, the default initialization in the code base is "noise" instead of "real." We ran all of our experiments with "real" initialization. I'll go ahead and make that change now.
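For reference, the relevant flags for that run would be roughly the following (buffer/data path and the remaining arguments omitted; double-check the flag names against your version of distill.py):

```bash
python distill.py --dataset=CIFAR10 --ipc=10 --pix_init=real --lr_img=1000 --lr_lr=1e-04
```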

Thanks for pointing out that the learning rates are missing from the paper. We'll include them in an updated arxiv version.

Also, if you're using wandb and you post your raw config, I can double check that all the options matched ours.

Let me know if you have any other questions!

GeorgeCazenavette commented 2 years ago

The good news is that you don't have to re-train new experts in order to run the distillation experiment again :)

Alan-Qin commented 2 years ago

Thanks, George! The drop in accuracy was indeed due to the condensed-image initialization.

Thanks, again!

GeorgeCazenavette commented 2 years ago

Just curious, did you try running the experiment again with the new setting?

Alan-Qin commented 2 years ago

Yes! The performance is good now!

Thanks, George!

When we choose 'noise' (random) initialization, we may set lr_lr to 1e-5. Then we get more consistent and stable results.
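That is, roughly:

```bash
python distill.py --dataset=CIFAR10 --ipc=10 --pix_init=noise --lr_img=1000 --lr_lr=1e-05
```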

^_^