Luciennnnnnn opened this issue 1 month ago

Hi, how should I sample from a batch OT coupling when the batch size used to compute the OT is larger than the training batch size? For example, if I use a batch size of 128 to compute the batch OT, should I sample 32 pairs from this coupling four times (for four training steps), or sample from it only once?
Hi!
Both are mathematically valid. I have personally opted for the latter as it is easier to implement, and is fine as long as the OT is very fast relative to the model. I suspect that the former might be slightly better in some circumstances, but I don't think it should matter too much. To me it is the same question as to whether you should train your neural network by randomly sampling examples at each step, or pre-shuffle an epoch of data and run through the entire thing. Both can give satisfactory results.
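For concreteness, here is a minimal sketch of the two options as I understand them, using the POT library (`ot.dist` / `ot.emd`) and NumPy rather than any particular torchcfm API; the helpers `compute_ot_plan` and `sample_pairs` are just illustrative names:

```python
import numpy as np
import ot  # POT: pip install pot


def compute_ot_plan(x0, x1):
    """Exact OT coupling between two point clouds with uniform weights."""
    a = np.full(x0.shape[0], 1.0 / x0.shape[0])
    b = np.full(x1.shape[0], 1.0 / x1.shape[0])
    M = ot.dist(x0, x1)        # squared Euclidean cost matrix
    return ot.emd(a, b, M)     # coupling pi; rows index x0, columns index x1


def sample_pairs(pi, batch_size, rng):
    """Draw (i, j) index pairs from the coupling pi, with replacement."""
    p = pi.ravel() / pi.sum()
    flat = rng.choice(pi.size, size=batch_size, replace=True, p=p)
    return np.unravel_index(flat, pi.shape)


rng = np.random.default_rng(0)
x0 = rng.normal(size=(128, 2))   # stand-in for a 128-sample noise batch
x1 = rng.normal(size=(128, 2))   # stand-in for a 128-sample data batch

# "Former": compute the coupling once for the 128-sample batch and reuse it
# for four 32-sample training steps.
pi = compute_ot_plan(x0, x1)
for _ in range(4):
    i, j = sample_pairs(pi, 32, rng)
    x0_batch, x1_batch = x0[i], x1[j]
    # ... one training step on (x0_batch, x1_batch) ...

# "Latter": load a fresh 128-sample batch for every training step, compute a
# new coupling, draw 32 pairs from it once, and discard the rest.
for _ in range(4):
    x0_new = rng.normal(size=(128, 2))   # stands in for loading new data
    x1_new = rng.normal(size=(128, 2))
    i, j = sample_pairs(compute_ot_plan(x0_new, x1_new), 32, rng)
    x0_batch, x1_batch = x0_new[i], x1_new[j]
    # ... one training step on (x0_batch, x1_batch) ...
```

The trade-off is visible in the sketch: the former amortizes one OT solve and one 128-sample load over four steps, while the latter solves the OT and loads 128 samples for every 32-sample step.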
Hi @atong01, thank you for your prompt response!
I believe the key consideration between these two options is data loading efficiency: the latter approach may significantly increase loading time, since each 32-sample training step requires loading 128 samples. I'm curious why you suspect the former might be slightly better? From my perspective, if we sample with replacement, both methods should yield similar results.
Yes, very possible. In this case I was assuming loading time was not a bottleneck; if it is, then the former should be preferred. Definitely similar results. I think you would only see differences on big datasets with a small number of epochs (where you may miss some datapoints by chance with sampling with replacement).
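To put a rough number on that last point (a back-of-the-envelope illustration, not from the thread itself): with N datapoints and N uniform draws with replacement, a given point is never drawn with probability (1 - 1/N)^N, which approaches e^-1, so roughly 37% of the data is missed in one nominal "epoch" of such sampling.

```python
# Back-of-the-envelope check of the "missed datapoints" effect under uniform
# sampling with replacement: P(a given point is never drawn in N draws).
import math

for N in (128, 10_000, 1_000_000):
    p_missed = (1 - 1 / N) ** N
    print(f"N={N}: missed fraction ~ {p_missed:.4f} (e^-1 = {math.exp(-1):.4f})")
```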