Closed by sharvit-kesem 2 years ago
It depends on the environment settings. It is probably because we set a minimum number of shares for each transaction. When the cash is not enough to meet this requirement, the actions are not filled. Since we use cash as one of the inputs, the agent can learn that 'under low cash, actions may not be filled.' So what you said is correct, but it is not a problem.
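To make the minimum-share rule concrete, here is a minimal sketch of one way such a check could work. The function name `fill_buy_order`, the `min_shares` default, and the choice to clip the order to what the cash can cover are all assumptions made for illustration; the actual environment may implement this differently.

```python
def fill_buy_order(cash, price, requested_shares, min_shares=10):
    """Return (filled_shares, remaining_cash) for a single buy order.

    Hypothetical helper: the names and the min_shares default are
    illustrative assumptions, not taken from the environment's code.
    """
    affordable = int(cash // price)             # whole shares the cash can cover
    shares = min(requested_shares, affordable)  # clip the order to available cash
    if shares < min_shares:                     # below the minimum: nothing is filled
        return 0, cash
    return shares, cash - shares * price


# Enough cash: the order is filled and cash drops.
print(fill_buy_order(cash=10_000.0, price=50.0, requested_shares=40))  # (40, 8000.0)

# Low cash: only 6 shares are affordable, which is under the minimum of 10,
# so nothing is filled and cash is unchanged.
print(fill_buy_order(cash=300.0, price=50.0, requested_shares=40))     # (0, 300.0)
```

In the second call the cash is returned unchanged, which is the kind of signal the agent can pick up on when cash is part of the state.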
This is feature scaling, a method used to normalize the range of the features in the data. You can search for it on Google for further explanations.
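As a rough illustration of scaling with a factor like 2 ** -6 (that is, 1/64), the sketch below multiplies raw prices and share counts by that constant before stacking them into a state vector. The helper `scale_features` and the sample values are hypothetical. Why exactly 1/64 was picked is not explained in this thread; one known property of power-of-two factors is that, for values in the normal floating-point range, multiplying by them only shifts the exponent and so adds no rounding error.

```python
import numpy as np

SCALE = 2.0 ** -6  # 1/64


def scale_features(price, stocks):
    """Scale raw prices and share counts into a smaller numeric range.

    Hypothetical helper for illustration; not the environment's own method.
    """
    price = np.asarray(price, dtype=np.float32)
    stocks = np.asarray(stocks, dtype=np.float32)
    # Stack the scaled features into one flat state vector.
    return np.hstack((price * SCALE, stocks * SCALE))


state = scale_features(price=[128.0, 64.0], stocks=[320.0, 0.0])
print(state)  # values: 2, 1, 5, 0

# Scaling by a power of two is reversible without loss for normal floats.
x = np.float32(123.45)
print(x * SCALE * 64 == x)  # True
```

Values in the hundreds end up in the low single digits, which is a range neural networks tend to handle more comfortably than raw prices or share counts.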
But how does the agent know that the actions are not filled? I think I saw that in env_stocktrading_np there was the stocks_cd in the input, and if the order wasn't filled then it was returned in the state. But in other envs, how does the agent know whether the orders were filled?
The cash and stocks don't change. And the reward is 0. This tells the agent that the actions are not filled under low cash.
But what happens if it was only partly fulfilled? Then stocks and cash change, and the reward in both cases (full and partial) can change either way if there is a liquidity block... I'm saying this because I'm seeing it especially when running on a market that has been bullish for years (Nasdaq over the past 5 years): the env just runs out of money halfway, and there is no way for the agent to give priority to assets, because it has no resources. Is what I'm saying wrong?
As for the cash penalty and periodical, I'm looking into it. As for the other part, what I meant by partial fulfillment is more like some sells but not all buys... so the stock amounts and the reward will change. But I guess the penalty could lead the agent to explore a different road. I saw there is a stocks_cd in the state, and it looks like it was designed to handle that problem. Not sure if you wrote it; if so, is it related?
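The "some sells but not all buys" case can be sketched as follows. This is only an illustration under stated assumptions: the function `execute_orders`, the sell-first ordering, and the rule that buys are capped by remaining cash are hypothetical and are not taken from any specific env in this thread.

```python
def execute_orders(cash, holdings, prices, orders):
    """Apply signed share orders (negative = sell, positive = buy).

    Hypothetical sketch: sells run first and free up cash, then buys are
    capped by whatever cash is left.
    Returns (cash, holdings, filled), where filled holds the shares
    actually traded for each stock.
    """
    holdings = list(holdings)
    filled = [0] * len(orders)

    # Sells: cannot sell more than is currently held.
    for i, qty in enumerate(orders):
        if qty < 0:
            sold = min(-qty, holdings[i])
            holdings[i] -= sold
            cash += sold * prices[i]
            filled[i] = -sold

    # Buys: each one is limited by the cash remaining at that point.
    for i, qty in enumerate(orders):
        if qty > 0:
            bought = min(qty, int(cash // prices[i]))
            holdings[i] += bought
            cash -= bought * prices[i]
            filled[i] = bought

    return cash, holdings, filled


cash, holdings, filled = execute_orders(
    cash=100.0,
    holdings=[10, 0, 0],
    prices=[20.0, 50.0, 30.0],
    orders=[-5, 3, 4],
)
print(filled)    # [-5, 3, 1]  the last buy is only partly filled
print(holdings)  # [5, 3, 1]
print(cash)      # 20.0
```

Here the sell and the first buy go through in full, while the last buy gets 1 of the 4 requested shares. The only trace of that in the next observation is that holdings and cash moved by less than the action asked for.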
Alright, thank you for the input. Much appreciated! Again, great work, and your support is really something.
As I'm moving along, I have two questions I'm trying to understand, and it would be much appreciated if someone could clarify a bit. First, I noticed that when the cash runs out, many transactions won't be filled. For example, if we are low on cash while training, many times the actions won't be filled fully. Isn't that a problem? As I understand it, the agent could conclude that the actions were completed when actually the reward is not based on its actions. Am I right?
Second, I'm trying to understand the scaling part when creating the state. For example, in the env_stocktrading_np.py env, in the get state method, there is a scale on the price, stocks and the cooldown (2 ** -6). What is the meaning of that number? Why 1/64?