AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
https://ai4finance.org
MIT License
9.48k stars · 2.31k forks

Using next day price array to predict actions is forbidden!! #966

Open valleysprings opened 1 year ago

valleysprings commented 1 year ago

(image: screenshot of the environment's `step` function)

I am very curious why you are using the next day's price as the agent's state input (the state update is in the wrong place; it should come before all of those transactions). If my understanding is correct, you are using future price and tech indicator information (which the agent should not see) to decide its actions, which is not valid at all. You should move this state-update code to the front of the `step` function to prevent this from happening.
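For reference, here is a heavily simplified, hypothetical sketch of the ordering being questioned (one asset, toy numbers; `ToyTradingEnv` and `_make_state` are illustrative stand-ins, not FinRL code). Trades execute at the day-t price still held in the state, and only afterwards is the state overwritten with day t+1 data:

```python
import numpy as np

class ToyTradingEnv:
    def __init__(self, prices):
        self.prices = np.asarray(prices, dtype=float)
        self.day = 0
        self.cash = 100.0
        self.shares = 0.0
        self.state = self._make_state()

    def _make_state(self):
        # State for the current day: [cash, today's price, shares held].
        return [self.cash, self.prices[self.day], self.shares]

    def step(self, action):
        # 1. The trade fills at the day-t price still held in self.state.
        price_today = self.prices[self.day]
        value_before = self.cash + self.shares * price_today
        self.shares += action
        self.cash -= action * price_today

        # 2. Only now does the state advance to day t+1 -- the line the
        #    issue questions (analogous to self._update_state).
        self.day += 1
        self.state = self._make_state()

        # 3. Reward is the change in portfolio value from t to t+1.
        reward = self.cash + self.shares * self.prices[self.day] - value_before
        done = self.day >= len(self.prices) - 1
        return self.state, reward, done, {}
```

With this ordering, `step(1.0)` on prices `[10, 12, 11]` buys one share at 10 (the day-0 price), then returns the day-1 state; the trade itself never uses day-1 prices.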

Athe-kunal commented 1 year ago

Hi @valleysprings Probably the answer lies in how RL works. The agent takes an action at time step t having seen only the state s_t. The environment then produces the next state, s_{t+1}. This is part of the Bellman update: the agent is trying to learn which action to take so that it maximizes reward going forward. We do this sequentially, so the agent takes action a_t after seeing only s_t, the environment moves it to s_{t+1}, and this continues until the end of the episode. The agent then looks back and evaluates how good the trajectory was via the Bellman update. In conclusion, the environment's `step` function only gives you the next state after you take an action based on the current state s_t, so there is no look-ahead bias. If something is not clear or if I am missing something, let me know.
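The sequential loop described here can be sketched as follows (`DummyEnv` and the lambda policy are toy stand-ins, not FinRL objects). The key point is that `action` is committed before `step` reveals s_{t+1}:

```python
class DummyEnv:
    def reset(self):
        self.t = 0
        return self.t                      # s_0

    def step(self, action):
        self.t += 1                        # environment computes s_{t+1}
        reward = float(action)             # toy reward
        done = self.t >= 3
        return self.t, reward, done, {}

def rollout(env, policy):
    state = env.reset()
    trajectory = []
    done = False
    while not done:
        action = policy(state)             # chosen from s_t alone
        next_state, reward, done, _ = env.step(action)
        trajectory.append((state, action, reward, next_state))
        state = next_state                 # agent holds s_{t+1} only now
    return trajectory

# Each recorded tuple is (s_t, a_t, r_t, s_{t+1}).
transitions = rollout(DummyEnv(), policy=lambda s: 1)
```

Whatever `step` returns is by construction the *next* state; the policy call that produced `action` has already happened, so nothing from s_{t+1} can leak into a_t.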

valleysprings commented 1 year ago

Sorry, but I don't think your answer is right, or maybe I am missing your point. The fact is that `self._update_state` updates the state by replacing today's price information and tech indicators with the next day's. That result is stored in `self.state` and passed to the agent, so the agent then sees tomorrow's price information (you are saying this is normal and part of the Bellman update), and I think updating the state at this position causes some kind of look-ahead bias. My point is that if you move the `self._update_state` call to the first line of the `step` function, you will update the state and achieve the same purpose without showing the agent future information. I don't know if you get my point or not, but thank you for answering me.

JesseLT commented 1 year ago

The agent only sees the next state when `step(action)` returns it, not at the moment `self._update_state` is executed.
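A minimal illustration of that distinction (toy `Env`, not FinRL code): the agent's view of the environment is only what `step()` returns, so *where inside* `step()` the internal state attribute is mutated is invisible to it.

```python
class Env:
    def __init__(self):
        self.t = 0
        self.state = self.t

    def step(self, action):
        self.t += 1
        self.state = self.t       # analogous to self._update_state
        return self.state         # the agent sees the new state only here

env = Env()
s0 = env.state                    # agent observes s_0
a0 = 0                            # action committed using s_0 alone
s1 = env.step(a0)                 # s_1 becomes visible only afterwards
```

By the time the internal update runs, `a0` has already been chosen, so the action cannot depend on the updated state.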