Cash penalty & Scaling on the state

AI4Finance-Foundation / FinRL

Cash penalty & Scaling on the state #325

Closed sharvit-kesem closed 2 years ago

sharvit-kesem commented 3 years ago

As I'm moving along, I have two questions I'm trying to understand, and it would be much appreciated if someone could clarify a bit. First, I noticed that when the cash runs out, many transactions won't be filled. For example, if we are low on cash during training, many times the actions won't be filled fully. Isn't that a problem? As I understand it, the agent could conclude that its actions were completed, when actually the reward is not based on its actions. Am I right?

Second, I'm trying to understand the scaling part when creating the state. For example, in the env_stocktrading_np.py env, in the get state method, there is a scale on the price, stocks & the cooldown (2 ** -6). What is the meaning of that number? Why 1/64?

rayrui312 commented 3 years ago

It depends on the environment settings. It is probably because we set a minimum number of shares for each transaction. When the cash is not enough to meet this requirement, the actions are not filled. Since cash is one of the inputs, the agent can learn that 'under low cash, actions may not be filled.' So what you said is correct, but it is not a problem.
This is feature scaling, a method used to normalize the range of the input features. You can search for it on Google for further explanations.
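
For readers with the same question, here is a minimal sketch of what this kind of scaling looks like. It is not the code from env_stocktrading_np.py: `build_state` and its arguments are hypothetical names, and the only value taken from this thread is the 2 ** -6 factor.

```python
import numpy as np

def build_state(prices, holdings, scale=2 ** -6):
    """Concatenate scaled prices and holdings into one float32 observation.

    Hypothetical helper for illustration; the real env builds a larger state.
    """
    prices = np.asarray(prices, dtype=np.float32)
    holdings = np.asarray(holdings, dtype=np.float32)
    # Multiplying by 1/64 shrinks raw values toward the order-of-1 range
    # that neural network inputs usually work best with.
    return np.hstack((prices * scale, holdings * scale)).astype(np.float32)

state = build_state(prices=[128.0, 64.0], holdings=[32.0, 0.0])
```

Here the prices 128 and 64 become 2.0 and 1.0, and the holdings 32 and 0 become 0.5 and 0.0. Multiplying by a power of two only changes the floating-point exponent, so no rounding error is introduced. Whether that is the reason 1/64 was chosen over, say, 1/50 is a guess on my part; the exact constant is a tuning choice.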

sharvit-kesem commented 3 years ago

But how does the agent know that the actions are not filled? I think it was in env_stocktrading_np that stocks_cd is part of the input, and if the order wasn't filled, then it was returned in the state. But in other envs, how does the agent know whether the orders were filled?

rayrui312 commented 3 years ago

The cash and stocks don't change, and the reward is 0. This tells the agent that the actions were not filled under low cash.

sharvit-kesem commented 3 years ago

But what happens if it was only partly fulfilled? Then stocks & cash change, and the reward in both cases (full & partial) can change either way if there is a liquidity block... I'm saying this because I'm seeing it especially when running on a market that has been bullish for years (Nasdaq over the past 5 years): the env just runs out of money halfway, and there is no way for the agent to prioritize assets, because it has no resources. Is what I'm saying wrong?

rayrui312 commented 3 years ago

We assume that there is no liquidity block and that orders can always be filled at the close price. If the cash is not enough, we don't submit the order at all rather than partially filling it (so there is no transaction in this case).
We do observe that the agent runs out of money halfway and stops trading (no more turnover) in a simple stock trading env. To address this, you may try the cash penalty env, the take profit env, or set periodic selling in the env.
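
The all-or-nothing rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the FinRL implementation: `try_buy` is a hypothetical name, transaction costs are ignored, and the fill price is simply whatever price is passed in.

```python
def try_buy(cash, holdings, price, shares):
    """All-or-nothing buy: fill the whole order or skip it entirely.

    Returns (cash, holdings, filled). Hypothetical helper, no fees modeled.
    """
    cost = price * shares
    if shares <= 0 or cost > cash:
        # Not enough cash: the order is never submitted, nothing changes.
        return cash, holdings, False
    return cash - cost, holdings + shares, True

# Wants 8 shares at 100 with only 500 in cash: skipped, state unchanged.
skipped = try_buy(cash=500.0, holdings=10, price=100.0, shares=8)

# Wants 3 shares at 100: affordable, so the full order is filled.
filled = try_buy(cash=500.0, holdings=10, price=100.0, shares=3)
```

In the first call the agent sees the same cash and holdings on the next step, which is the only signal it gets that the order was skipped. In the second call cash drops to 200.0 and holdings rise to 13.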

sharvit-kesem commented 3 years ago

As for cash penalty & periodic selling, I'm looking into it. As for the other part, what I meant by partial fulfillment is more like some sells being filled but not all buys... so the stock amounts & reward will change. But I guess the penalty could lead the agent to explore a different path. I saw there is a stocks_cd in the state, which looks like it was designed to handle that problem. Not sure if you wrote it; if so, is it related?

rayrui312 commented 3 years ago

Yes, you're right. In that case, the agent will not know it.
Cash penalty is used to encourage the agent to reserve some cash. Stocks_cd (stocks_cool_down) is designed to control the trading frequency.
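
To make the cash penalty idea concrete, here is one possible shape for such a reward. It is a sketch only: the actual cash penalty env may compute it differently, and `penalized_reward`, `cash_fraction`, and the 0.25 default are all assumptions made for this example.

```python
def penalized_reward(cash, holdings_value, prev_assets, cash_fraction=0.25):
    """Change in assets after charging a penalty for holding too little cash.

    Hypothetical reward shape; cash_fraction is the share of total assets
    the agent is asked to keep in cash.
    """
    assets = cash + holdings_value
    # Shortfall versus the cash reserve target; zero when the reserve is met.
    shortfall = max(0.0, cash_fraction * assets - cash)
    return (assets - shortfall) - prev_assets

# Same total assets (1000) in both cases, only the cash share differs.
low_cash = penalized_reward(cash=50.0, holdings_value=950.0, prev_assets=1000.0)
enough_cash = penalized_reward(cash=400.0, holdings_value=600.0, prev_assets=1000.0)
```

With a 250 reserve target, the low-cash portfolio is 200 short and gets a reward of -200.0, while the portfolio holding 400 in cash gets 0.0. The agent is therefore pushed to keep some cash available instead of spending everything early in a bull market.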

sharvit-kesem commented 3 years ago

Alright, thank you for the input. Much appreciated! Again, great work, & your support is really something.