Closed drinkingkazu closed 6 years ago
Implemented https://github.com/DeepLearnPhysics/larcv2/commit/0fc81cbe5954c9ab19e14e0496a3c1acb0483caf. @twongjirad agreed w/ a change on skype :turkey:
To be clear, false is equivalent to 0 (no randomization), and true = 1 (complete randomization).
What I recommend to use is 2, which won't make complete random set of events, but it a) changes event sets and b) IO faster than complete random approach.
Based on skype discussion with @drinkingkazu, I think this is a good idea.
Thanks @coreyjadams 🍶
:+1: on emoji use
ThreadProcessor inherits (English inherits, not C++) "RandomAccess" parameter from ProcessDriver which randomize the access order of events in the input file. This is done in a very simple way: use std::shuffle for the vector of all event entry numbers.
Also, though it is not well known, RandomAccess is a boolean OR integers. When it is set to True, it uses unix microseconds (time) seed. If it is set to False or <=0 integers, it does not randomize. If it is set to positive integers, that is used as a random seed.
Issues
Using the same feature for ThreadProcessor concerns two issues.
ThreadProcessor is used for network training = process same events many times. Even though the order is randomized, the "randomized order" that is set @ process start is never randomized again. (i.e. not really "random").
Randomizing the access order slows down ROOT file IO compared to accessing in an order.
Proposal
Here's a proposal to mitigate above issues.
RandomAccess to take True/False or integers [0, 1, 2] (internally it will be enum). 0 = no randomization. 1 = read a random set of entries determined by drawing random numbers between 0 to (N-1) where N is number of total entries (random number generation per request of events = batch). 2 = generate one random number X per request, read event entries X to (X+M-1) where M is requested number of events (hence a set of events read-in is sequential).
Introduce RandomSeed parameter. 0 = time seed (default), and positive integers are used as a seed value.
@twongjirad and @coreyjadams (and anyone who cares!) please explicitly comment. I will implement this today if no opposition.