Kaszanas / SC2_Datasets

https://sc2-datasets.readthedocs.io/
GNU General Public License v3.0

PyTorch and PyTorch Lightning abstractions #3

Closed: Kaszanas closed this issue 2 years ago

Kaszanas commented 2 years ago

Currently, the files handle multiple replays but do not go deeply into the underlying data structure of a replay. Some research needs to be done on the API provided by PyTorch and PyTorch Lightning to determine how deeply nested our datasets really are.

The compositional structure of our dataset can be viewed as follows:

  1. The whole dataset consists of multiple replaypacks. (Dataset or IterableDataset?)
  2. A replaypack consists of helper files (log and mapping) and multiple replays. (Dataset or IterableDataset?)
  3. A replay consists of keys and values, where the value for a given key may be a sequence of events. (Nested IterableDatasets?)
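One possible answer for levels 1 and 2 is to make both map-style, since replays within a replaypack can be counted and indexed. The sketch below assumes replays are already parsed into in-memory dicts; the class names `ReplaypackDataset` and `SC2DatasetSketch` are hypothetical and not part of any existing API. The PyTorch import is guarded only so the sketch also runs without PyTorch installed.

```python
import bisect
from typing import Any, Dict, List

try:
    from torch.utils.data import Dataset
except ImportError:  # plain base class so the sketch runs without PyTorch
    Dataset = object

Replay = Dict[str, Any]  # one parsed replay: top-level keys -> values


class ReplaypackDataset(Dataset):
    """Hypothetical map-style dataset over the replays of one replaypack."""

    def __init__(self, name: str, replays: List[Replay], mapping: Dict[str, str]):
        self.name = name
        self.replays = replays
        self.mapping = mapping  # stands in for the replaypack's mapping helper file

    def __len__(self) -> int:
        return len(self.replays)

    def __getitem__(self, index: int) -> Replay:
        return self.replays[index]


class SC2DatasetSketch(Dataset):
    """Hypothetical map-style dataset spanning all replaypacks.

    A global index is translated into (replaypack, local replay index).
    """

    def __init__(self, packs: List[ReplaypackDataset]):
        self.packs = packs
        # cumulative_sizes[i] = total number of replays in packs[0..i]
        self.cumulative_sizes: List[int] = []
        total = 0
        for pack in packs:
            total += len(pack)
            self.cumulative_sizes.append(total)

    def __len__(self) -> int:
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, index: int) -> Replay:
        if not 0 <= index < len(self):
            raise IndexError(index)
        # First pack whose cumulative size exceeds the index (skips empty packs).
        pack_idx = bisect.bisect_right(self.cumulative_sizes, index)
        start = self.cumulative_sizes[pack_idx - 1] if pack_idx > 0 else 0
        return self.packs[pack_idx][index - start]
```

When PyTorch is available, `torch.utils.data.ConcatDataset` provides this same cumulative-size indexing, so the top level could simply concatenate the per-replaypack datasets. Level 3 (event sequences inside a replay) is the part that does not fit this map-style model well.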

Later, a DataLoader needs to be defined for our data.
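One issue to expect when defining the DataLoader: PyTorch's default collation requires list-valued fields to have equal length across a batch, so replays whose event sequences differ in length would fail to batch. A minimal sketch of a custom collate function, assuming each sample is a replay dict (the name `collate_replays` is hypothetical), groups values key by key without stacking them:

```python
from typing import Any, Dict, List


def collate_replays(batch: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Group a batch of replay dicts key by key, keeping values as plain lists.

    Keys are the ordered union over the batch; a replay missing a key
    contributes None at its position.
    """
    keys = dict.fromkeys(key for replay in batch for key in replay)
    return {key: [replay.get(key) for replay in batch] for key in keys}
```

It would be passed as `DataLoader(dataset, batch_size=8, collate_fn=collate_replays)`; converting individual fields to tensors (or padding event sequences) can then happen later, once it is clear which fields a given experiment needs.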

Reference: https://pytorch-lightning.readthedocs.io/en/stable/guides/data.html?highlight=Data

Any thoughts on that? @leafnode

Kaszanas commented 2 years ago

This was solved around this commit: https://github.com/Kaszanas/SC2EGSet_Experiments/commit/77552528ad113d510e89d42042f6af872dd6d006

No IterableDataset was included for sequences of events such as PlayerStats. This requires further research and, potentially, the implementation of additional fields within SC2ReplayData that would hold events of a single type, exposed as IterableDatasets or some other form of iterable.
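A minimal sketch of such an iterable for a single event type, assuming the replay stores its events as a list of dicts under one key (for example `"trackerEvents"`) and that each event names its type under `"evtTypeName"`; both key names are assumptions, and `ReplayEventStream` is a hypothetical class, not part of SC2ReplayData. The PyTorch import is guarded only so the sketch runs without PyTorch.

```python
from typing import Any, Dict, Iterator

try:
    from torch.utils.data import IterableDataset
except ImportError:  # plain base class so the sketch runs without PyTorch
    IterableDataset = object


class ReplayEventStream(IterableDataset):
    """Hypothetical iterable over the events of one type within one replay."""

    def __init__(self, replay: Dict[str, Any], events_key: str, event_type: str):
        self.replay = replay
        self.events_key = events_key  # assumed list-valued key, e.g. "trackerEvents"
        self.event_type = event_type  # assumed value of "evtTypeName", e.g. "PlayerStats"

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        # A fresh generator on every call, so the stream can be iterated repeatedly.
        for event in self.replay.get(self.events_key, []):
            if event.get("evtTypeName") == self.event_type:
                yield event
```

If such a stream is wrapped in a DataLoader with `num_workers > 0`, every worker receives its own copy and would yield duplicate events unless the work is split, for example using `torch.utils.data.get_worker_info()` inside `__iter__`.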

Kaszanas commented 2 years ago

This was done and mostly tested.