frederick0329 / TracIn

Implementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)
Apache License 2.0

Efficient subset selection #7

Closed diff7 closed 3 years ago

diff7 commented 3 years ago

Hi, thank you for the great work.

I was wondering: have you tried using influence scores to select the most influential data points in order to reduce the training set?

Thank you!

frederick0329 commented 3 years ago

No. We did try removing the ~5% of training examples with the highest self-influence (potentially mislabeled), hoping to filter out mislabeled data. It didn't hurt performance much (we did not clean the eval set). However, we were looking for an improvement.
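
For readers who want to try the filtering step described above, here is a minimal sketch. It is not code from this repository. It assumes you have already computed per-example loss gradients at a few saved checkpoints, and it scores each training example by the checkpoint-based self-influence from the paper: the sum over checkpoints of the learning rate times the squared gradient norm. The function names `self_influence` and `drop_top_fraction`, the array layout, and the 5% default are all illustrative assumptions.

```python
import numpy as np


def self_influence(per_example_grads, learning_rates):
    """Checkpoint-based self-influence: sum_i lr_i * ||grad_i(z)||^2.

    per_example_grads: list with one array per checkpoint, each of shape
        (n_examples, n_params). Assumed to be precomputed by the caller.
    learning_rates: one learning rate per checkpoint.
    """
    scores = np.zeros(per_example_grads[0].shape[0])
    for grads, lr in zip(per_example_grads, learning_rates):
        # Squared L2 norm of each example's gradient at this checkpoint.
        scores += lr * np.sum(grads * grads, axis=1)
    return scores


def drop_top_fraction(scores, fraction=0.05):
    """Return sorted indices of examples kept after dropping the top scorers."""
    n_drop = int(np.floor(fraction * len(scores)))
    order = np.argsort(scores)  # ascending: highest self-influence comes last
    keep = order[: len(scores) - n_drop]
    return np.sort(keep)


# Toy data (hypothetical): 20 examples, 3 parameters, 2 checkpoints.
# Example 7 has an unusually large gradient, as a mislabeled point might.
toy_grads = np.ones((20, 3))
toy_grads[7] = 10.0
toy_scores = self_influence([toy_grads, toy_grads], [0.1, 0.1])
kept_indices = drop_top_fraction(toy_scores, fraction=0.05)
```

In this toy setup `kept_indices` contains every index except 7. On real data, a high self-influence score only flags an example as potentially mislabeled, so inspecting the dropped examples before retraining is advisable.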

diff7 commented 3 years ago

Hi, got it!

Thank you for your reply!