Closed turmeric-blend closed 3 years ago
Hi @turmeric-blend, good question. In short, it might make a little bit of difference, but I would expect any improvement to be small. We're sampling the biases from the 'first' ~4K examples (note that the examples should be in random order, so it's ~4K randomly-sampled examples). I think in a lot of cases using more examples just won't make that much difference. The examples used for the biases just need to be representative (of the overall training set), and ~4K examples are going to be sufficiently representative in many cases. Of course, in some cases this may not be true, and it might be worth using a bigger sample.
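To illustrate the point with a sketch (the array shapes and the use of per-feature means as 'biases' here are just hypothetical, not taken from `softmax.py`): if the data is shuffled, the first ~4K rows are a random sample, and statistics computed from them land very close to the full-training-set values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 100K examples, 16 features.
X_train = rng.normal(loc=2.0, scale=3.0, size=(100_000, 16))

# Shuffle the rows so the 'first' examples are a random sample.
rng.shuffle(X_train)

# Estimate biases (here, per-feature means) from the first ~4K examples only.
sample_biases = X_train[:4_000].mean(axis=0)

# Compare against biases computed over the full training set.
full_biases = X_train.mean(axis=0)

# The largest per-feature discrepancy is small relative to the feature scale.
print(np.abs(sample_biases - full_biases).max())
```

With a feature standard deviation of 3, the standard error of a 4K-sample mean is about 3/sqrt(4000) ≈ 0.05, which is why enlarging the sample tends to buy very little.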
It may be worth trying if you think it might make a difference for your application. I note that this code has not been heavily 'tuned', and it may be possible to improve performance by making some changes, like you are suggesting.
Same deal with normalisation. Actually, I'd be very surprised if computing the mean or standard deviation over additional examples made any real difference. With ~4K examples, I think the sample mean and standard deviation are likely to be very, very close to the mean and standard deviation of the overall training set.
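A quick sanity check of that claim (the distribution and sizes are made up for illustration): even for a skewed distribution, the mean and standard deviation of a ~4K subsample of shuffled data sit very close to the full-dataset values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training set: 200K examples of a skewed (exponential) feature.
x = rng.exponential(scale=5.0, size=200_000)

# Data is assumed pre-shuffled, so the first 4K rows are a random sample.
sample = x[:4_000]

# Subsample statistics vs full-training-set statistics.
print(abs(sample.mean() - x.mean()))  # small relative to the mean of ~5
print(abs(sample.std() - x.std()))    # small relative to the std of ~5
```

So normalising the validation set with the first chunk's mean and std should behave almost identically to using statistics from all chunks.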
There's also another minor point: aggressively optimising validation performance may not make much difference anyway. If your dataset has plenty of data, it might be worth simply increasing the size of the validation set instead.
I see, what you said makes sense. Thank you for the quick response :)
hi, for `softmax.py`, if the data is split into multiple chunks, then `X_validation` is only transformed with the first chunk's `biases`. As the `biases` for different chunks are different, but the `transform` is only applied once, would transforming `X_validation` with each chunk's `biases` improve performance?

EDIT: similarly for the latter part (where `X_validation_transform` is only normalised with the mean and std values from the first chunk):