evolve() support batching

This is necessary for inputs larger than GPU memory, and to be able to run on multiple GPUs and on TPU. It involves splitting the tensor into smaller batches (e.g. one per GPU) and changing the selection code so that it merges the fitness values from all batches and then does the selection (this probably has to happen on the CPU to avoid blowing up memory).
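A rough sketch of the idea, using plain NumPy rather than evoflow's own tensors or API. `batched_select`, `fitness_fn`, `num_batches` and `num_survivors` are hypothetical names made up for illustration, and top-k selection is only a stand-in for whatever selection strategy is actually used:

```python
import numpy as np


def batched_select(population, fitness_fn, num_batches, num_survivors):
    """Hypothetical helper (not part of evoflow): score in batches, select on the host."""
    # Split along the population axis; array_split tolerates uneven batch sizes.
    batches = np.array_split(population, num_batches, axis=0)
    # Score each batch independently. In a real setup each call could run on
    # its own device, so only one batch needs to fit in device memory at a time.
    fitness_parts = [np.asarray(fitness_fn(batch)) for batch in batches]
    # Merge only the 1-D fitness values on the host; they are far smaller
    # than the population tensor itself.
    fitness = np.concatenate(fitness_parts, axis=0)
    # Indices of the highest fitness values, best first.
    order = np.argsort(-fitness, kind="stable")[:num_survivors]
    return population[order], fitness[order]


# Toy usage: 6 individuals with 2 genes each, fitness = sum of genes.
population = np.arange(12, dtype=np.float32).reshape(6, 2)
survivors, scores = batched_select(
    population, lambda batch: batch.sum(axis=1), num_batches=4, num_survivors=2
)
```

Here the whole population still lives in host memory and only the fitness computation is batched; whether the selected individuals should be gathered on the CPU or left on their devices is an open design question.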

google-research / evoflow

evolve() support batching #42