Closed turmeric-blend closed 3 years ago
Hi @turmeric-blend, good question. In short, it might make a little bit of difference, but I would expect any improvement to be small. We're sampling the biases from the 'first' ~4K examples (note that the examples should be in random order, so it's ~4K randomly-sampled examples). I think in a lot of cases using more examples just won't make that much difference. The examples used for the biases just need to be representative (of the overall training set), and ~4K examples are going to be sufficiently representative in many cases. Of course, in some cases this may not be true, and it might be worth using a bigger sample.
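To illustrate the point with a sketch (the array shapes and the use of per-feature means as 'biases' here are just hypothetical, not taken from `softmax.py`): if the data is shuffled, the first ~4K rows are a random sample, and statistics computed from them land very close to the full-training-set values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 100K examples, 16 features.
X_train = rng.normal(loc=2.0, scale=3.0, size=(100_000, 16))

# Shuffle the rows so the 'first' examples are a random sample.
rng.shuffle(X_train)

# Estimate biases (here, per-feature means) from the first ~4K examples only.
sample_biases = X_train[:4_000].mean(axis=0)

# Compare against biases computed over the full training set.
full_biases = X_train.mean(axis=0)

# The largest per-feature discrepancy is small relative to the feature scale.
print(np.abs(sample_biases - full_biases).max())
```

With a feature standard deviation of 3, the standard error of a 4K-sample mean is about 3/sqrt(4000) ≈ 0.05, which is why enlarging the sample tends to buy very little.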
It may be worth trying if you think it might make a difference for your application. I note that this code has not been heavily 'tuned', and it may be possible to improve performance by making some changes, like you are suggesting.
Same deal with normalisation. Actually, I'd be very surprised if computing the mean or standard deviation over additional examples made any real difference. With ~4K examples, I think the sample mean and standard deviation are likely to be very, very close to the mean and standard deviation of the overall training set.
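A quick sanity check of that claim (the distribution and sizes are made up for illustration): even for a skewed distribution, the mean and standard deviation of a ~4K subsample of shuffled data sit very close to the full-dataset values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training set: 200K examples of a skewed (exponential) feature.
x = rng.exponential(scale=5.0, size=200_000)

# Data is assumed pre-shuffled, so the first 4K rows are a random sample.
sample = x[:4_000]

# Subsample statistics vs full-training-set statistics.
print(abs(sample.mean() - x.mean()))  # small relative to the mean of ~5
print(abs(sample.std() - x.std()))    # small relative to the std of ~5
```

So normalising the validation set with the first chunk's mean and std should behave almost identically to using statistics from all chunks.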
There's also another minor point: aggressively optimising validation performance may not make much difference anyway. If your dataset has plenty of data, it might be worth simply increasing the size of the validation set instead.
I see, what you said makes sense. Thank you for the quick response :)
hi, for `softmax.py`, if the data is split into multiple chunks, then `X_validation` is only transformed with the first chunk's `biases`. As the `biases` for different chunks are different, but the `transform` is only applied once, would transforming `X_validation` with each chunk's `biases` improve performance?

EDIT: similarly for the latter part (where `X_validation_transform` is only normalised with the mean and std values from the first chunk):