amiratag / DataShapley

Data Shapley: Equitable Valuation of Data for Machine Learning
MIT License
255 stars 66 forks

It's super slow. #6

Closed Bee-zest closed 4 years ago

Bee-zest commented 4 years ago

Great idea. However, I tried it on a dataset with 10k rows and 80 columns on my 32-core machine, and it has been running for 5 days.

tabularML commented 4 years ago

Hi Bee, I need to know more about your problem. Which model are you using, and from which library? Also, how many iterations do you plan to run the algorithm for? With 10k data points, somewhere around 10,000 iterations should be enough. Finally, if you have a 32-core machine, you are better off running 32 parallel jobs rather than a single one.
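To illustrate the "parallel jobs" suggestion, here is a minimal sketch of splitting Monte Carlo permutation sampling across worker processes and averaging the results. It does not use the DataShapley repository's own API. The names `POINT_WORTH`, `utility`, `shapley_job`, and `combine` are hypothetical, and the toy additive `utility` stands in for "train the model on this subset and score it", which is the expensive step in practice.

```python
import random
from concurrent.futures import ProcessPoolExecutor

# Assumption: toy per-point worths. With an additive utility the exact
# Shapley value of point i is POINT_WORTH[i], which makes the sketch checkable.
POINT_WORTH = [1.0, 2.0, 3.0, 4.0]


def utility(subset):
    """Hypothetical stand-in for training a model on `subset` and scoring it."""
    return sum(POINT_WORTH[i] for i in subset)


def shapley_job(args):
    """One independent job: sample permutations and sum marginal contributions."""
    seed, n_points, n_perms = args
    rng = random.Random(seed)  # a distinct seed per job gives distinct permutations
    totals = [0.0] * n_points
    for _ in range(n_perms):
        perm = list(range(n_points))
        rng.shuffle(perm)
        prefix, prev = [], utility([])
        for i in perm:
            prefix.append(i)
            cur = utility(prefix)
            totals[i] += cur - prev  # marginal contribution of point i
            prev = cur
    return totals, n_perms


def combine(results):
    """Merge per-job sums into one average over all sampled permutations."""
    n_perms = sum(n for _, n in results)
    n_points = len(results[0][0])
    return [sum(t[i] for t, _ in results) / n_perms for i in range(n_points)]


if __name__ == "__main__":
    n_jobs = 4  # on a 32-core machine this could be 32
    jobs = [(seed, len(POINT_WORTH), 25) for seed in range(n_jobs)]
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(shapley_job, jobs))
    print(combine(results))
```

Because each job samples its own permutations independently, the total number of iterations is split evenly across cores, so the wall-clock time drops roughly in proportion to the number of jobs as long as the model training itself is not already using every core.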