DeepLearnPhysics / larcv2

MIT License
13 stars 16 forks source link

Random access in ThreadProcessor #15

Closed drinkingkazu closed 6 years ago

drinkingkazu commented 6 years ago

ThreadProcessor inherits (English inherits, not C++) "RandomAccess" parameter from ProcessDriver which randomize the access order of events in the input file. This is done in a very simple way: use std::shuffle for the vector of all event entry numbers.

Also, though it is not well known, RandomAccess is a boolean OR integers. When it is set to True, it uses unix microseconds (time) seed. If it is set to False or <=0 integers, it does not randomize. If it is set to positive integers, that is used as a random seed.

Issues

Using the same feature for ThreadProcessor concerns two issues.

  1. ThreadProcessor is used for network training = process same events many times. Even though the order is randomized, the "randomized order" that is set @ process start is never randomized again. (i.e. not really "random").

  2. Randomizing the access order slows down ROOT file IO compared to accessing in an order.

Proposal

Here's a proposal to mitigate above issues.

@twongjirad and @coreyjadams (and anyone who cares!) please explicitly comment. I will implement this today if no opposition.

drinkingkazu commented 6 years ago

Implemented https://github.com/DeepLearnPhysics/larcv2/commit/0fc81cbe5954c9ab19e14e0496a3c1acb0483caf. @twongjirad agreed w/ a change on skype :turkey:

To be clear, false is equivalent to 0 (no randomization), and true = 1 (complete randomization).

What I recommend to use is 2, which won't make complete random set of events, but it a) changes event sets and b) IO faster than complete random approach.

coreyjadams commented 6 years ago

Based on skype discussion with @drinkingkazu, I think this is a good idea.

drinkingkazu commented 6 years ago

Thanks @coreyjadams 🍶

twongjirad commented 6 years ago

:+1: on emoji use