AminHP / gym-anytrading

The most simple, flexible, and comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)

gym-anytrading + DI-engine #75

Closed · PaParaZz1 closed this 1 year ago

PaParaZz1 commented 1 year ago

Nice project! I am the developer of DI-engine. We are looking for RL environments for RL + trading, and we found your repo suitable for making a demo for beginners.

In our latest version updates, we modified the gym-anytrading env and adapted DI-engine to it; here are our modifications and experiment results (StockEnv + DQN). We would also like to add an example to the gym-anytrading repo. What do you think about this idea, or do you have any other thoughts?

AminHP commented 1 year ago

Hi @PaParaZz1, your work seems pretty interesting. People have asked me many times to add some mid-level features. gym-anytrading is a low-level and simple repo for beginners, while my gym-mtsim project is a high-level and complex tool for experts. Your project sits right in between and provides the requested mid-level features. You have essentially implemented a more practical env on top of this one by improving the reward function. I haven't had enough time to read your work thoroughly, but I think it is valuable, and I can put a link to your project in the README.md file.

About the example, I'm not sure if your example works here because you made some modifications that do not match my simple implementation. Can you explain more about it?

PaParaZz1 commented 1 year ago

Thank you for acknowledging our work. I will explain our modifications more clearly as follows:

Defects in the original environment

The original environment can be described by the following state machine: init_state

And the reward function is: ori_rew

A profit occurs if and only if the position changes from Long to Short, so the reward of "Buy" is always zero whatever the position is. For a Q-learning algorithm, it is therefore hard to estimate the Q value of the "Buy" action. In fact, the reward function depends on the "position", the "action", and the "last trade tick", so it is reasonable to add these features to the environment's original state; otherwise, the reward looks unstable from the agent's point of view. The final state formula in our baseline is: state
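To make this concrete, here is a minimal sketch of the reward behaviour described above (illustrative names only, not the exact gym-anytrading source): only a "Sell" that closes a Long position yields a non-zero reward, so "Buy" is never rewarded directly.

```python
# Minimal sketch of the original reward behaviour described above
# (illustrative names, not the exact gym-anytrading source).
def calculate_reward(action, position, prices, current_tick, last_trade_tick):
    closing_long = (action == "Sell" and position == "Long")
    if closing_long:
        # profit is only realized when a Long position is closed
        return prices[current_tick] - prices[last_trade_tick]
    return 0.0  # "Buy" (and any non-closing action) always receives zero reward
```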

Besides, the agent cannot make profits by selling short.

Our Modifications

  1. add some features to the state

    • add "position" and "volume"
    • add a "last trade tick" feature, represented as (curr_tick - last_trade_tick) / eps_length, to record how much time has passed since the last valid transaction (a small sketch of this appears after the list)
  2. change the environment's original operation logic so that the agent is able to make profits with diverse strategies

    • change the state machine so that profits can also be made by selling short: s1

    • add the "Double_Sell" and "Double_Buy" actions so that the position can be switched between "Long" and "Short" within a single trading day: s2

  3. modify reward function

  4. modify DQN algorithm hyper-parameters

    • n-step DQN works better than 1-step DQN in this case; we set n = 3.
    • In the DQN algorithm, the "done" signal is ignored when updating the Q function. We find this effective because the Q value of the last trading day of an episode should not be forced to 0 (see the target sketch after this list).
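For item 1, here is a minimal sketch of how the extra observation features could be computed; the helper name and shapes are illustrative, not our exact code:

```python
import numpy as np

def augment_observation(base_obs, position, curr_tick, last_trade_tick, eps_length):
    """Illustrative helper: append the current position and a normalized
    last-trade-tick feature to the original price-window observation."""
    last_trade_feature = (curr_tick - last_trade_tick) / eps_length
    extra = np.array([float(position), last_trade_feature], dtype=np.float32)
    return np.concatenate([base_obs.reshape(-1), extra])
```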
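For item 4, here is a minimal sketch of the n-step target with the "done" signal ignored; the function is illustrative rather than our exact implementation:

```python
def n_step_target(rewards, gamma, q_next):
    """Illustrative n-step DQN target that ignores `done`.

    rewards: the next n rewards [r_t, ..., r_{t+n-1}] (we use n = 3)
    q_next:  max_a Q_target(s_{t+n}, a); we still bootstrap from it at the end
             of an episode, because the last trading day is a time limit
             rather than a true terminal state.
    """
    ret = sum((gamma ** i) * r for i, r in enumerate(rewards))
    return ret + (gamma ** len(rewards)) * q_next  # no (1 - done) mask
```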

P.S. Here is the readme about our modifications.

PaParaZz1 commented 1 year ago

Do you have any other comments or ideas?

AminHP commented 1 year ago

Sorry for my late response, I was a bit busy. Interesting work! I will add a link to your repo soon.

PaParaZz1 commented 1 year ago

Thank you! Looking forward to bringing more interesting work to the open-source community.

AminHP commented 1 year ago

Just added a link to your repo: https://github.com/AminHP/gym-anytrading#related-projects. Keep up the good work! 🚀