amiratag / DataShapley

Data Shapley: Equitable Valuation of Data for Machine Learning
MIT License
255 stars, 66 forks

dividing by len(sources[idx]) #19

Open carloalbertobono opened 1 week ago

carloalbertobono commented 1 week ago

Hello,

I am really fascinated by the concept and am trying to apply it to emergency scenarios (social media).

I can't understand one detail of the implementation (line 377 in DShap.py): https://github.com/amiratag/DataShapley/blob/303d91d988a149948fb357ac82dc72af1bc7430d/DShap.py#L377

How is dividing by the length of an index beneficial? Was it meant to support multiple indexes, with the implementation changing afterwards?
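To make the question concrete, here is a minimal sketch of one possible reading. It assumes that `sources` maps each source ID to an array of row indices, so that `len(sources[idx])` is the number of data points in that source. That assumption is not confirmed anywhere in this thread. The function name `split_contributions` and the numbers are hypothetical and are not taken from DShap.py.

```python
import numpy as np


def split_contributions(sources, score_gain, n_points):
    """Spread each source's score gain evenly over the rows it owns.

    sources: dict mapping a source ID to an array of row indices (assumed layout).
    score_gain: dict mapping a source ID to its marginal score gain (made-up values).
    n_points: total number of data points.
    """
    marginal_contribs = np.zeros(n_points)
    for idx, rows in sources.items():
        # Give every row in the source the full gain...
        marginal_contribs[rows] = score_gain[idx]
        # ...then divide by the group size so the rows share it equally.
        marginal_contribs[rows] /= len(rows)
    return marginal_contribs


# Hypothetical example: source 0 owns three rows, source 1 owns one row.
sources = {0: np.array([0, 1, 2]), 1: np.array([3])}
score_gain = {0: 0.75, 1: 0.5}
contribs = split_contributions(sources, score_gain, n_points=4)
```

Under this reading, the rows of a multi-point source sum back to that source's gain, and when every source holds exactly one row the divisor is 1, so the division has no effect.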

Sorry if I am missing something obvious.

Thanks for the work!