godka / Pensieve-PPO

The simplest implementation of Pensieve (SIGCOMM '17) via state-of-the-art RL algorithms, including PPO, DQN, and SAC, with support for both TensorFlow and PyTorch.
https://godka.github.io/Pensieve-PPO/
BSD 2-Clause "Simplified" License

What do the dimensions of the state returned by the environment mean? What are the corresponding parameters in the paper? #21

945716994 closed this issue 2 years ago

945716994 commented 2 years ago

For example, state[0, -1] is the last quality. Which parameter in the paper does it correspond to? Is it xt, or something else?

godka commented 2 years ago

See here. https://github.com/godka/Pensieve-PPO/blob/7e795b25c1e739e299d328da7115aebc1074bcd7/src/env.py#L93

PS: please create an issue in English.