Changed the policy pool's sample_idxs and (current) policies format. I checked that these are not used in clean_pufferl, but please lmk if this breaks any existing work.
Fixed test for policy pool, deleted tests for policy store and ranker
Made policy selectors deterministic -- random_selector should be replaced
Added create_kernel helper function
I have not tested LSTM yet, but will probably test that soon.