decile-team / cords

Reduce end-to-end training time from days to hours (or hours to minutes), and energy requirements/costs by an order of magnitude, using coresets and data selection.
https://cords.readthedocs.io/en/latest/
MIT License
316 stars, 53 forks

Models and Examples for Tabular Data #82

Open simonamaggio opened 1 year ago

simonamaggio commented 1 year ago

Hi, I'm interested in using cords for tabular data. I noticed that this is listed as planned work on this page, but it has not been implemented yet. Are there any plans for this? How could I contribute? Thanks!

krishnatejakk commented 1 year ago

@simonamaggio Thanks for letting us know about your interest in adding more examples on tabular data. We are integrating more subset selection strategies into the CORDS repository, which we feel would be very effective for tabular data. Let me get back to you after we complete the integration of these new strategies.

My current plan, at least, is to:
a) Integrate multiple representative tabular datasets into the CORDS repository.
b) Set up benchmark notebooks comparing the effectiveness of the existing subset selection strategies for efficient training on tabular datasets.
c) Perform a benchmark analysis on tabular datasets and make the results available to others.
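
To make item b) a bit more concrete, here is a rough sketch of the shape such a benchmark could take: train once on the full tabular data and once on a selected subset, then compare row count, fit time, and test accuracy. Everything in it is an assumption for illustration. It does not use any CORDS API; the synthetic dataset, the uniform random selection (a stand-in for a real subset selection strategy), the nearest-centroid model, and all function names (`make_tabular`, `random_subset`, `benchmark`, and so on) are hypothetical placeholders.

```python
import random
import time


def make_tabular(n_rows, n_features, seed=0):
    """Synthetic two-class tabular data (assumption: class 1 is shifted by +1.5 per feature)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_rows):
        label = rng.randint(0, 1)
        X.append([rng.gauss(1.5 * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


def random_subset(n_rows, fraction, seed=0):
    """Baseline selection: sample row indices uniformly without replacement."""
    rng = random.Random(seed)
    k = max(1, int(n_rows * fraction))
    return rng.sample(range(n_rows), k)


def fit_centroids(X, y):
    """Nearest-centroid stand-in model: the per-class mean of every feature."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )


def accuracy(centroids, X, y):
    """Fraction of rows whose predicted class matches the true label."""
    return sum(predict(centroids, r) == t for r, t in zip(X, y)) / len(y)


def benchmark(fraction, seed=0):
    """Compare training on all rows against training on a random subset."""
    X_train, y_train = make_tabular(2000, 8, seed=seed)
    X_test, y_test = make_tabular(500, 8, seed=seed + 1)

    idx = random_subset(len(X_train), fraction, seed=seed)
    X_sub = [X_train[i] for i in idx]
    y_sub = [y_train[i] for i in idx]

    results = {}
    for name, (X, y) in {"full": (X_train, y_train), "subset": (X_sub, y_sub)}.items():
        start = time.perf_counter()
        model = fit_centroids(X, y)
        fit_seconds = time.perf_counter() - start
        results[name] = {
            "rows": len(y),
            "fit_seconds": fit_seconds,
            "test_accuracy": accuracy(model, X_test, y_test),
        }
    return results


if __name__ == "__main__":
    for name, stats in benchmark(fraction=0.1).items():
        print(name, stats)
```

In a real notebook, `random_subset` would be swapped for each selection strategy under comparison and `make_tabular` for the actual tabular datasets, while keeping the same full-versus-subset reporting.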