decile-team / cords

Reduce end-to-end training time from days to hours (or hours to minutes), and energy requirements/costs by an order of magnitude, using coresets and data selection.
https://cords.readthedocs.io/en/latest/
MIT License

Inquiry about performance of gradmatch #81

Closed pipilurj closed 1 year ago

pipilurj commented 1 year ago

Hello, I ran some experiments with GradMatch and RandomOnline, and found that the two actually reach similar performance after 300 epochs (around 93). Is there something important to note for reproducing the results? Thanks for your help!

krishnatejakk commented 1 year ago

@pipilurj This is because RandomOnline can be a very strong baseline on some datasets, and it is hard to beat in terms of efficiency, since the subset selection steps of GradMatch and CRAIG are computationally expensive. We have explained this issue in more detail and proposed a new subset selection approach that performs even better in the following pre-print: “MILO: Model-Agnostic Subset Selection Framework for Efficient Model Training and Tuning”.
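For context, the RandomOnline baseline simply re-samples a fresh random subset of training indices at each selection step, so over many epochs the model still sees most of the data while training on only a fraction of it per epoch, and the selection itself costs essentially nothing compared to the gradient-matching optimization in GradMatch/CRAIG. A minimal sketch of that idea (the function name and parameters here are illustrative, not the CORDS API):

```python
import random

def random_online_subset(num_samples: int, fraction: float, seed: int) -> list[int]:
    """Draw a fresh random subset of training indices.

    An online random baseline re-samples the subset at every selection
    step; no gradients or per-sample statistics are needed, which is why
    it is so cheap relative to GradMatch/CRAIG selection.
    """
    k = max(1, int(num_samples * fraction))
    rng = random.Random(seed)
    return rng.sample(range(num_samples), k)

# Re-select a 10% subset of a 50k-sample dataset once per "epoch";
# varying the seed gives a different subset each time.
subsets = [random_online_subset(50_000, 0.1, seed=epoch) for epoch in range(3)]
```

Because the subsets change every epoch, the union of selected indices grows quickly, which helps explain why this baseline can match more sophisticated selectors once training runs long enough (e.g. 300 epochs).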

I am closing this issue now but do let me know if you have more questions that need to be addressed.