ManifoldRG / NEKO

In Progress Implementation of GATO style Generalist Multimodal model capable of image, text, RL and Robotics tasks
https://discord.gg/brsPnzNd8h
GNU General Public License v3.0

Atari Datasets #12

Open daniellawson9999 opened 1 year ago

daniellawson9999 commented 1 year ago

This issue does not cover all of our dataset considerations, but I am currently converting datasets for 45 Atari games to Minari. I use dqn-replay, which in its entirety has 50 million transitions for each game, repeated over 5 seeds.

We use a filtered version of what Scaled-QL (Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes) calls near-optimal data. That setup builds a training set of 50 million transitions for each of 40 games, and holds out another 5 games, with their respective 50 million transition datasets, for testing transfer/generalization.

We filter by taking the top 1% of trajectories per game, which yields a variable number of transitions per game, as some games have longer time horizons. This filtering is currently being performed using https://github.com/daniellawson9999/data-tests/blob/main/atari_minari/download_convert.py. This dataset is technically not the top 1% of trajectories, but should be a good proxy. This is because we filter using the return computed as the sum of clipped rewards, rather than the true score, which is the sum of unclipped rewards and is not directly provided in the dataset. In the future, we may explore recovering the unclipped rewards. Because the number of timesteps is variable, the exact size in Minari is variable, but should be <= 300 GB. I should have a concrete number in <= 1 day, when the conversion should be done; it is a bit slow because I am using an external hard drive.
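To make the "proxy" point concrete, here is a minimal sketch of ranking trajectories by clipped return and keeping the top fraction. It is not the code in download_convert.py: the function names, the [-1, 1] clipping range, and the toy rewards are all assumptions made for illustration.

```python
import numpy as np


def clipped_return(rewards, low=-1.0, high=1.0):
    """Sum of per-step rewards after clipping each one to [low, high].

    The [-1, 1] range is an assumption for this sketch.
    """
    clipped = np.clip(np.asarray(rewards, dtype=np.float64), low, high)
    return float(clipped.sum())


def top_fraction_indices(returns, fraction=0.01):
    """Indices of the best `fraction` of trajectories, highest return first."""
    returns = np.asarray(returns, dtype=np.float64)
    # Always keep at least one trajectory, even for tiny inputs.
    k = max(1, int(np.ceil(fraction * len(returns))))
    order = np.argsort(-returns, kind="stable")
    return order[:k]


# Toy episodes with made-up rewards (not taken from dqn-replay).
episodes = [
    [10.0, 10.0],      # true score 20, clipped return 2
    [1.0, 1.0, 1.0],   # true score 3, clipped return 3
    [0.0, 1.0],        # true score 1, clipped return 1
]
clipped = [clipped_return(ep) for ep in episodes]
true_scores = [float(np.sum(ep)) for ep in episodes]

best_by_clipped = top_fraction_indices(clipped, fraction=0.01)
best_by_true = top_fraction_indices(true_scores, fraction=0.01)

# Transitions retained depend on the lengths of the selected episodes.
kept_transitions = sum(len(episodes[i]) for i in best_by_clipped)
```

In this toy case the clipped ranking selects episode 1 while the true score would select episode 0, which is the kind of mismatch that makes the filtered set a proxy rather than the exact top 1%. The retained transition count also depends on which episodes are selected, so it varies per game.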

Below, we visualize the return distributions, where red corresponds to the 90th percentile and green corresponds to the 99th percentile:

[Image: distribs]
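For reference, percentile markers like the ones in the plot could be computed along these lines. This is a sketch only: `percentile_markers` is a hypothetical helper, the input returns are synthetic, and it uses NumPy's default linear interpolation, which may differ from what the original plotting code did.

```python
import numpy as np


def percentile_markers(returns, percentiles=(90, 99)):
    """Map each requested percentile to its value over the episode returns."""
    returns = np.asarray(returns, dtype=np.float64)
    values = np.percentile(returns, percentiles)
    return {int(p): float(v) for p, v in zip(percentiles, values)}


# Synthetic per-episode returns for one game (illustrative only).
returns = np.arange(101)
markers = percentile_markers(returns)
```

Here `markers[90]` and `markers[99]` would be the positions of the red and green lines, respectively.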

This issue will be updated when the conversion is complete.

daniellawson9999 commented 1 year ago

TODO: redo Breakout training with the new dataset.