Modified batch_iter function

JuanSapriza commented 1 year ago

For a while I had the same problem described in Jakhongir's question in the forum: the implementation of batch_iter forces you to reshuffle if your choice of batch_size and num_batches is unlucky (e.g. batches that are too large, too few batches, or more batches than fit in the dataset).

To comply with the test from Project 1 (which has a single data point but requires 2 iterations) while still using the batch_iter function, you would need an inefficient nested for loop that shuffles the (single) data point twice.
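To make the issue concrete, here is a minimal sketch of a shuffle-then-slice iterator. It is an assumption about the general pattern being described, not the course's actual code, and the name slicing_batch_iter is hypothetical. With one data point and two requested batches, it can only yield one.

```python
import numpy as np

def slicing_batch_iter(y, tx, batch_size, num_batches=1, shuffle=True):
    """Hypothetical shuffle-then-slice iterator (illustration only)."""
    data_size = len(y)
    if shuffle:
        # Shuffling the data itself copies both arrays on every call.
        perm = np.random.permutation(data_size)
        y, tx = y[perm], tx[perm]
    for batch_num in range(num_batches):
        start = batch_num * batch_size
        end = min(start + batch_size, data_size)
        if start >= end:
            # The dataset is exhausted: no more batches without reshuffling.
            break
        yield y[start:end], tx[start:end]

# One data point, two requested iterations (as in the test described above).
y_single = np.array([1.0])
tx_single = np.array([[1.0, 2.0]])
batches = list(slicing_batch_iter(y_single, tx_single, batch_size=1, num_batches=2))
print(len(batches))  # fewer batches than requested
```

Getting the second iteration out of such a generator means calling it again inside an outer loop, which is the nested for mentioned above.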

I propose a different approach: shuffling indexes instead of the data points. It is compliant with the project tests and much more efficient than the previous one (under reasonable circumstances). Furthermore, it allows as many iterations as desired, regardless of the dataset size.

The implementation has certain caveats regarding the randomness inside the batch, but the obvious workaround is using batch_size = 1, which should still be (slightly) more efficient.
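A minimal sketch of the index-based idea follows. It assumes that each batch is a contiguous slice starting at a random offset, which is one way to read the caveat above. The name index_batch_iter and the details are illustrative assumptions, not the code in this pull request.

```python
import numpy as np

def index_batch_iter(y, tx, batch_size, num_batches=1, shuffle=True):
    """Hypothetical index-based iterator (illustration only)."""
    data_size = len(y)
    batch_size = min(batch_size, data_size)   # never ask for more than exists
    max_start = data_size - batch_size        # last valid start offset
    if shuffle:
        # Draw random start offsets; the data arrays are never copied or shuffled.
        starts = np.random.randint(0, max_start + 1, size=num_batches)
    else:
        # Walk through the data in order, wrapping around when it runs out.
        max_batches = data_size // batch_size
        starts = (np.arange(num_batches) % max_batches) * batch_size
    for start in starts:
        yield y[start:start + batch_size], tx[start:start + batch_size]

# One data point, two iterations: both batches are produced.
y_single = np.array([1.0])
tx_single = np.array([[1.0, 2.0]])
single_batches = list(index_batch_iter(y_single, tx_single, batch_size=1, num_batches=2))

# The caveat: points inside a batch are always neighbours in the dataset.
y_range = np.arange(10.0)
tx_range = np.arange(20.0).reshape(10, 2)
range_batches = list(index_batch_iter(y_range, tx_range, batch_size=3, num_batches=5))
contiguous = all(np.all(np.diff(by) == 1) for by, _ in range_batches)
```

In this sketch only the order of the batches is random, not their composition. With batch_size = 1 every batch is a single randomly chosen point, so the distinction disappears.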

Regards, thank you and keep up the awesome work!

martinjaggi commented 1 year ago

thanks a lot, we'll check it very soon

laraorlandic commented 1 year ago

I checked that the updated function still produces the desired behavior on lab 2 and the project 1 grading tests, and it all looks good! Great work, Juan!

epfml / ML_course

Modified batch_iter function #79