ra9hur opened this issue 3 months ago
I'm not sure if I understand your question correctly, just lmk
Thanks for the response !! I understand state_space now.
Here is the description provided for action space in the notebook.
Action: The action space describes the allowed actions that the agent interacts with the environment. Normally, a ∈ A includes three actions: a ∈ {−1, 0, 1}, where −1, 0, 1 represent selling, holding, and buying one stock. Also, an action can be carried upon multiple shares. We use an action space {−k, ..., −1, 0, 1, ..., k}, where k denotes the number of shares. For example, "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" are 10 or −10, respectively
Going by the above description, if we are trading only 1 stock, the possible actions are [buy, sell, hold], so action_space = 1 (for hold) + 2 * stock_dimension (buy, sell for 1 stock) = 3.
If there are 30 stocks, the possible actions should be 30 sells, 30 buys, and a hold. So, as I understand it, the formula for the action space should be action_space = 1 (for hold) + 2 * stock_dimension (buys, sells for 30 stocks) = 61.
However, in the notebook, the action space is set equal to the stock dimension. So, for 30 stocks, action_space = stock_dimension = 30.
Can you please clarify why "action_space = stock_dimension" is used?
Because for each stock, the action is a scalar within a continuous space instead of a discrete space like {-1, 0, 1}. Put differently, we just need one action per stock, and that action comes from a continuous space. Thus the action is always a vector of dimension 30, and the sign and magnitude of each element directly represent buy (+) / hold (0) / sell (-) for that stock.
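For example, here is a minimal sketch of that continuous action space and how an action vector maps to share trades (the gym-style Box space and the hmax scaling reflect my understanding of the notebook; the concrete numbers are just illustrative):

```python
import numpy as np
import gym  # the notebook's trading environment is gym-based

stock_dimension = 30
hmax = 100  # max shares traded per stock per step, as in the notebook's env_kwargs

# One continuous value per stock, so the action space dimension equals stock_dimension.
action_space = gym.spaces.Box(low=-1, high=1, shape=(stock_dimension,), dtype=np.float32)

# Turn one sampled action vector into integer share trades:
action = action_space.sample()                  # shape (30,), each element in [-1, 1]
shares_to_trade = (action * hmax).astype(int)   # +k = buy k shares, -k = sell k, 0 = hold
print(shares_to_trade)
```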
Excellent !! Thanks for clarifying !!
Thanks for opening up this issue. I will use this thread rather than create a new one. I am studying the notebook Stock_NeurIPS2018.ipynb and experimenting. I notice several issues in it. I will list them here in no particular order.
1.) I am using one stock to train (say, AAPL). Here I notice that the DDPG agent is not learning at all. The action passed to the step function converges quickly to -1 (all sell) or 1 (all buy), and the calculated reward is 0, which probably explains the convergence to one specific action. Has anyone observed this?
2.) I see that the agent only buys and sells and does not hold at all during learning. Negative action values mean sell and positive values mean buy. Shouldn't the action space be divided equally between sell, hold, and buy, e.g. Sell: [-1, -0.33], Hold: (-0.33, 0.33), Buy: [0.33, 1]? A rough sketch of what I mean is below this list.
3.) If the initial action is a sell, the number of shares held at the start is naturally 0. In this case the agent is barred from selling (sell_num_shares = 0) until some buy actions are generated; later sell actions go through because we hold shares to sell. I feel this is too restrictive during the initial learning process. The agent should be allowed to sell or buy provided we have the funds.
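Here is a rough sketch of the banding I have in mind for point 2 (the function name, the hmax value, and the band width are placeholders of mine, not the notebook's actual mapping):

```python
import numpy as np

def map_action_to_trade(a, hmax=100, hold_band=0.33):
    """Map a continuous action a in [-1, 1] to an integer share trade.

    Values inside (-hold_band, hold_band) count as hold (trade 0 shares);
    values outside the band are rescaled to [-hmax, hmax] and rounded,
    so roughly a third of the action range maps to each of sell/hold/buy.
    """
    if abs(a) < hold_band:
        return 0
    scaled = np.sign(a) * (abs(a) - hold_band) / (1 - hold_band)
    return int(scaled * hmax)

print(map_action_to_trade(0.10))   # 0   -> hold
print(map_action_to_trade(0.50))   # 25  -> buy
print(map_action_to_trade(-0.90))  # -85 -> sell
```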
Any comments or suggestions will be appreciated.
I have a couple of questions; they should be trivial, but somehow I'm not getting them.
Referring to env_kwargs in the notebook, "action_space": stock_dimension is used. Can you please clarify?
I could understand the state variables corresponding to len(INDICATORS) * stock_dimension. Why is (1 + 2 * stock_dimension) being added?
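For reference, this is roughly the cell I am referring to (I have shortened the indicator list and may be omitting some keys):

```python
# Sketch of how the notebook sizes the state and action spaces.
INDICATORS = ["macd", "rsi_30", "cci_30", "dx_30"]  # shortened placeholder list
stock_dimension = 30

# The (1 + 2 * stock_dimension) part is what my second question is about.
state_space = 1 + 2 * stock_dimension + len(INDICATORS) * stock_dimension
print(state_space)  # 181 with these 4 indicators

env_kwargs = {
    "hmax": 100,
    "initial_amount": 1_000_000,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": INDICATORS,
    "action_space": stock_dimension,  # the setting my first question is about
    "reward_scaling": 1e-4,
}
```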