amiratag / DataShapley

Data Shapley: Equitable Valuation of Data for Machine Learning
MIT License

Running TMC-Shapley for CNN #15

Closed ffeiland closed 4 years ago

ffeiland commented 4 years ago

Hi there,

I am trying to run TMC-Shapley on the CheXpert dataset with a DenseNet-121. I only want to compute TMC-Shapley values for a small number of training points. The code seems to run without errors, but I have a problem with the following piece of code in the _one_iteration method:

X_batch = np.zeros((0,) + tuple(self.X.shape[1:]))
y_batch = np.zeros(0, int)
truncation_counter = 0
for n, idx in enumerate(idxs):
    old_score = new_score
    X_batch = np.concatenate((X_batch, self.X[sources[idx]]))
    y_batch = np.concatenate((y_batch, self.y[sources[idx]]))
    if (self.is_regression or len(set(y_batch)) == len(set(self.y_test))): ##FIXIT
        self.restart_model()
        self.model.fit(X_batch, y_batch)
        new_score = self.value(self.model, metric=self.metric)

If the if-condition is false, the TMC-Shapley calculation terminates, but every data point gets a marginal contribution of zero. I guess this is not the intended behavior. If the if-condition is true, the calculation does not terminate. My X data has the shape number x 3 x 224 x 224 and my y data has the shape number x 1. Thus, the only thing I changed from the original source code is that I replaced y_batch = np.zeros(0, int) with y_batch = np.zeros((0,) + tuple(self.y.shape[1:])).
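To illustrate why a false if-condition yields all-zero contributions, here is a minimal sketch of the loop with the model replaced by a stand-in score. The names covers_all_classes, toy_marginals and score_fn are hypothetical and not part of DataShapley, and the np.unique/np.ravel check is an assumed substitute for the original len(set(...)) comparison, so that labels of shape number x 1 are compared as scalars:

```python
import numpy as np

def covers_all_classes(y_batch, y_test):
    """True once y_batch holds as many distinct labels as y_test."""
    # ravel() flattens (n, 1) label arrays so np.unique counts scalar labels.
    return np.unique(np.ravel(y_batch)).size == np.unique(np.ravel(y_test)).size

def toy_marginals(y_train, y_test, order, score_fn, initial_score=0.0):
    """Mimic the marginal-contribution loop with a stand-in score (no real model)."""
    # Empty batch with the same trailing shape and dtype as the labels.
    y_batch = np.zeros((0,) + y_train.shape[1:], dtype=y_train.dtype)
    new_score = initial_score
    marginals = np.zeros(len(y_train))
    for idx in order:
        old_score = new_score
        y_batch = np.concatenate((y_batch, y_train[[idx]]))
        # The score is only refreshed when the condition holds; otherwise
        # new_score == old_score and the marginal contribution is zero.
        if covers_all_classes(y_batch, y_test):
            new_score = score_fn(y_batch)
        marginals[idx] = new_score - old_score
    return marginals

y_train = np.array([[0], [0], [1], [1]])
y_test = np.array([[0], [1]])
# Hypothetical score: fraction of the four training points seen so far.
marginals = toy_marginals(y_train, y_test, [0, 1, 2, 3], lambda yb: len(yb) / 4)
print(marginals)
```

In this toy run the first two points receive a contribution of zero because the batch does not yet contain both classes. If the condition never becomes true, every contribution stays at zero, which matches the behavior described above.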

Is there in general a problem with TMC-Shapley for CNNs or did I make a mistake?

Thanks for the help in advance.

Best regards, Fabian

ffeiland commented 4 years ago

OK, I am sorry, it works just fine, but only for 10 training points. I tried it with 100 training points, but that already seemed to be too many.
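A rough way to see why the cost grows so quickly: in the loop quoted above, the model is refit once for every point added, and the k-th refit trains on k points. The sketch below only counts this work for a single permutation, assuming the if-condition is always true and no truncation happens. The function names are hypothetical, and actual runtimes depend on the model and hardware:

```python
def fits_per_permutation(n_points):
    """Upper bound on model refits in one pass over a permutation."""
    # One refit per added point (assumes the condition always holds, no truncation).
    return n_points

def samples_seen_per_permutation(n_points):
    """Total training samples processed across all refits in one permutation."""
    # The k-th refit trains on k points: 1 + 2 + ... + n = n(n + 1) / 2.
    return n_points * (n_points + 1) // 2

for n in (10, 100):
    print(n, fits_per_permutation(n), samples_seen_per_permutation(n))
```

Under these assumptions, going from 10 to 100 points means ten times as many refits per permutation, and each refit of a DenseNet-121 is itself a full training run, which is consistent with 100 points already feeling like too many.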