AminHP / gym-anytrading

The most simple, flexible, and comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)
MIT License

Clarifications on Frequency of Algo trades #47

Closed: QuantumCrazy closed this issue 3 years ago

QuantumCrazy commented 3 years ago


I am playing around with some financial data and testing various models, and I'd like to understand some things:

1) Firstly, is there a way to plot all the training buy/sell positions against the prices the bot bought and sold at? I can only see reward over time plotted.

2) In the attachment, can you explain why the reward jumps up from 378 to 379 on the x-axis when it goes from selling at a low price to buying at a high price? Or is the price info hidden, and is the bot actually buying at a low price and selling at a high price in the background?

3) My biggest challenge is deciding how often the bot should run, because it generates a signal every time it runs. If I run it every day, it generates a signal every day; if I run it every 4 days, it generates a signal every 4 days, and so on. In other words, it generates a signal at the timestep resolution of your dataset. So with hourly data, is it best practice to run it every hour, and with daily data, to run it every day? And how can we change the reward function to penalize very short-term trades so it only trades once in a while?

AminHP commented 3 years ago
  1. You can call the render method of the training env, and it will plot the training positions. Moreover, you can get the required information from the history attribute of the env.

  2. This plot shows positions, not actions. The position at 379 is Long, so it means the agent made a Buy action at 378.

  3. I can't say if it is a best practice or not. But I think it is better to trade based on the exact timestamps in the training dataframe. For example, if your training dataframe has hourly data of these 9:30, 10:30, 11:30 points, you should trade at these timestamps in the test data. You can give a negative reward based on the difference between _last_trade_tick and _current_tick attributes.