Policy pool code/test changes

Changed the format of the policy pool's sample_idxs and (current) policies. I checked that these are not used in clean_pufferl, but please let me know if this breaks any existing work.
Fixed test for policy pool, deleted tests for policy store and ranker
Made the policy selectors deterministic (random_selector should be replaced)
Added create_kernel helper function
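
For reviewers, here is a rough sketch of what a deterministic selector could look like. It is not the code in this PR: `seeded_selector`, `policy_names`, `num_needed`, and `seed` are hypothetical names used only for illustration. It assumes that determinism is obtained by sorting the candidates and drawing from a privately seeded RNG, so the same inputs always give the same selection.

```python
import random


def seeded_selector(policy_names, num_needed, seed=0):
    """Hypothetical sketch, not PufferLib's API.

    Returns num_needed policy names, sampled with replacement, such that
    the result depends only on the set of names, num_needed, and seed.
    """
    if not policy_names:
        return []
    # Sort first so the caller's ordering cannot change the outcome.
    ordered = sorted(policy_names)
    # A private Random instance avoids touching the global RNG state.
    rng = random.Random(seed)
    return [rng.choice(ordered) for _ in range(num_needed)]
```

Calling it twice with the same names and seed returns the same list, which is what makes a policy pool test reproducible.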

I have not tested LSTM yet, but will probably test that soon.

PufferAI / PufferLib