GeorgeCazenavette / mtt-distillation

Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"
https://georgecazenavette.github.io/mtt-distillation/

Quick Question regarding class index mapping for the custom imagenet subsets #41

Closed meghbhalerao closed 1 year ago

meghbhalerao commented 1 year ago

Hello and thank you for the paper and also thank you for open sourcing the code.

I have a question in this line - https://github.com/GeorgeCazenavette/mtt-distillation/blob/main/utils.py#L326

Why is the class index mapping done only when mode != 'train'? Should it not be done always, irrespective of whether the mode is train or test?

Please do let me know if I am missing anything and thank you for your time!

Megh

GeorgeCazenavette commented 1 year ago

Hello :)

Firstly, this is only really an issue when you're using a subset of a dataset that has predefined classes (like our imagenet subsets), such that the original labels [0..999] need to be re-mapped to reflect the subset [0..9].

If I remember correctly, this was because the training data have the correct labels assigned to them already.

The validation loader was just made directly from a filtered version of the original validation dataset, so the labels need to be remapped.

This was just a quick hack to make it work, and I never got around to cleaning it up. The remapping could likely just be done once when initially creating the validation loader.
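For reference, that cleanup could be sketched roughly like this: filter the validation samples down to the subset and remap each kept label once, at construction time, instead of remapping inside the eval loop. This is a minimal stdlib sketch, not the repo's actual code; the function names and the toy sample list are hypothetical, and the list of original class ids stands in for whatever the config defines for a given subset.

```python
# Hypothetical sketch: build a mapping from original ImageNet class ids
# to contiguous subset indices, and apply it once while filtering the
# validation set.

def build_class_map(subset_classes):
    """Map original label ids (e.g. [0, 217, 482]) to [0..len-1]."""
    return {orig: new for new, orig in enumerate(subset_classes)}

def filter_and_remap(samples, subset_classes):
    """Keep only (image, label) pairs whose label is in the subset,
    remapping each kept label to its contiguous subset index."""
    class_map = build_class_map(subset_classes)
    return [(img, class_map[lbl]) for img, lbl in samples if lbl in class_map]

# Toy example with placeholder "images" (None) and original labels:
samples = [(None, 0), (None, 5), (None, 217), (None, 482)]
remapped = filter_and_remap(samples, subset_classes=[0, 217, 482])
print([lbl for _, lbl in remapped])  # [0, 1, 2]
```

With this done up front, the eval loop can consume labels directly with no per-batch remapping.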

Hope this helps!

meghbhalerao commented 1 year ago

Thank you, I think this makes sense! I was initially a little confused since this part of the code - https://github.com/GeorgeCazenavette/mtt-distillation/blob/main/utils.py#L104 - creates the subsets of imagenet using the Subset class, which filters based on the original dataset (and so keeps the original labels). But I understand your point now.

GeorgeCazenavette commented 1 year ago

You're right, but the training set gets re-mapped here (and likewise in buffer.py): https://github.com/GeorgeCazenavette/mtt-distillation/blob/c365f4257117ccc0abf163e09f692be94634ed18/distill.py#L88
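The per-batch remap linked above can be illustrated with a dense lookup table indexed by original class id, which is the same idea as tensor fancy-indexing. This is a hedged stdlib sketch, not the repo's code: in the actual PyTorch code the table would be a tensor so a whole batch of labels can be translated in one indexing operation (something like `lookup_tensor[batch_labels]`).

```python
# Hypothetical sketch of on-the-fly label remapping during training:
# a dense table maps every original ImageNet id to its subset index
# (or -1 if the class is not in the subset).

def make_lookup(subset_classes, num_orig_classes=1000):
    """Dense lookup table: lookup[orig_id] -> subset index, or -1."""
    lookup = [-1] * num_orig_classes
    for new, orig in enumerate(subset_classes):
        lookup[orig] = new
    return lookup

lookup = make_lookup([0, 217, 482])
batch = [482, 0, 217, 217]           # original labels from the loader
print([lookup[l] for l in batch])    # [2, 0, 1, 1]
```

Doing it this way per batch works, but as the thread notes, remapping once at dataset-construction time would make the train and validation paths consistent.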

It's a bit of a mess 🥲

The whole codebase could definitely use a big refactor.

meghbhalerao commented 1 year ago

Ah, okay! That clarifies it! Thank you!