AI4Finance-Foundation / FinRL-Tutorials

Stock_NeurIPS2018_2_Train.ipynb: clarification on state space, action space #92

Open ra9hur opened 3 months ago

ra9hur commented 3 months ago

I have a couple of questions. They should be trivial, but somehow I'm not getting them.

  1. Action space: As mentioned in the description, for a single share the action space should just be [buy, sell, hold], i.e. {-1, 0, 1}. For multiple shares, say 10, the action space is {-10, ..., -1, 0, 1, ..., 10}, which should equal action_space = 2 * stock_dimension + 1.

However, in env_kwargs in the notebook, "action_space": stock_dimension is used. Can you please clarify?

  2. State space: Also, can you help me understand how you arrived at state_space? state_space = 1 + 2 * stock_dimension + len(INDICATORS) * stock_dimension

I can understand the state variables corresponding to len(INDICATORS) * stock_dimension. Why is (1 + 2 * stock_dimension) being added?

ZiyiXia commented 3 months ago
  1. stock_dimension is used as the action space because we need the agent to make a [buy, sell, hold] decision, with a certain amount, for each ticker (30 for the Dow Jones Index).
  2. state_space = 1 (remaining balance in the account) + 2 * stock_dimension (the prices of the 30 stocks and the share holdings of the 30 stocks, so 2 * stock_dimension in total) + len(INDICATORS) * stock_dimension (one value per indicator per stock). See the sketch below.

I'm not sure if I understood your question correctly, just let me know.
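
For concreteness, here is roughly how the notebook arrives at those numbers (a minimal sketch; the indicator list shown is FinRL's default INDICATORS, so treat the exact total as an example):

```python
# State layout: [balance | price of each stock | holdings of each stock | indicators]
INDICATORS = ["macd", "boll_ub", "boll_lb", "rsi_30",
              "cci_30", "dx_30", "close_30_sma", "close_60_sma"]  # FinRL defaults
stock_dimension = 30  # e.g. the Dow Jones tickers

state_space = (
    1                                    # remaining cash balance
    + stock_dimension                    # close price of each ticker
    + stock_dimension                    # shares currently held of each ticker
    + len(INDICATORS) * stock_dimension  # one value per indicator per ticker
)
print(state_space)  # 1 + 30 + 30 + 8 * 30 = 301
```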

ra9hur commented 3 months ago

Thanks for the response!! I understand state_space now.

ra9hur commented 3 months ago

Here is the description provided for action space in the notebook.

Action: The action space describes the allowed actions through which the agent interacts with the environment. Normally, a ∈ A includes three actions: a ∈ {−1, 0, 1}, where −1, 0, 1 represent selling, holding, and buying one stock. Also, an action can be carried out on multiple shares. We use an action space {−k, ..., −1, 0, 1, ..., k}, where k denotes the number of shares. For example, "Buy 10 shares of AAPL" and "Sell 10 shares of AAPL" are 10 and −10, respectively.
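
In code, I read that description as the following discrete view of a single ticker (a toy illustration only, with k = 10 picked arbitrarily):

```python
# One ticker, at most k = 10 shares per trade: actions form {-k, ..., -1, 0, 1, ..., k}
k = 10
action_space = list(range(-k, k + 1))  # 2 * k + 1 = 21 discrete actions
# +10 means "Buy 10 shares of AAPL", -10 means "Sell 10 shares", 0 means hold
```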

Going by the above description, if we are trading only 1 stock, the possible actions are [buy, sell, hold]: action_space = 1 (for hold) + 2 * stock_dimension (a buy and a sell for the 1 stock) = 3.

If there are 30 stocks, the possible actions should be 30 sells, 30 buys, and a hold. So the formula for the action space, as I understand it, should be action_space = 1 (for hold) + 2 * stock_dimension (buys and sells for the 30 stocks) = 61.

However, in the notebook the action space is set equal to the stock dimension. So, for 30 stocks, action_space = stock_dimension = 30.

Can you please clarify why "action_space = stock_dimension" is used?

ZiyiXia commented 3 months ago

Because for each stock the action is a scalar within a continuous space, rather than a discrete space like {-1, 0, 1}. Put differently, we need just one action per stock, and that action comes from a continuous range. The action is therefore always a vector of dimension 30, and the sign and magnitude of each element directly represent buy (+) / hold (0) / sell (−) for that stock. A sketch follows.
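
Roughly, the environment turns that vector into trades like this (a simplified sketch of the scaling step; hmax = 100 follows the notebook's env_kwargs, and the real environment additionally applies transaction costs and other checks):

```python
import numpy as np

stock_dimension = 30
hmax = 100  # max shares traded per ticker per step (from the notebook's env_kwargs)

# The policy outputs one continuous value per ticker, e.g. in [-1, 1]:
raw_actions = np.random.uniform(-1, 1, size=stock_dimension)  # stand-in for the agent

# The environment scales by hmax and rounds to whole shares:
shares = (raw_actions * hmax).astype(int)
# shares[i] > 0  -> buy that many shares of ticker i
# shares[i] < 0  -> sell that many shares of ticker i
# shares[i] == 0 -> hold ticker i
```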

ra9hur commented 3 months ago

Excellent!! Thanks for clarifying!!

ven7782 commented 3 months ago

Thanks for opening this issue. I will use this thread rather than create a new one. I am studying the notebook Stock_NeurIPS2018.ipynb and experimenting with it, and I notice several issues. I will list them here in no particular order.

1.) I am training on one stock (say AAPL). I notice that the DDPG agent is not learning at all: the action passed to the step function converges quickly to -1 (all sell) or 1 (all buy), and the calculated reward is 0, which probably explains the convergence to one specific action. Has anyone else observed this?

2.) I see that the agent only buys and sells and does not hold at all during learning. Negative action values are sells and positive values are buys. Shouldn't the action space be divided equally between sell, hold, and buy? For example: Sell: [-1, -0.33], Hold: (-0.33, 0.33), Buy: [0.33, 1]. (See the first sketch after this list.)

3.) If the initial action is a sell, the number of shares held at the start will naturally be 0. In this case the agent is barred from selling (sell_num_shares = 0) until some buy actions are generated; later sell actions go through because we then have shares to sell. I feel this is too restrictive during the initial learning process: the agent should be allowed to sell or buy provided we have the funds. (See the second sketch below.)
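
To make point 2 concrete, this is roughly the mapping I am suggesting, with a dead zone for hold (a sketch of the proposal, not what the notebook currently does; the band edges are arbitrary):

```python
import numpy as np

def banded_trade(a: float, hmax: int = 100) -> int:
    """Map a raw action a in [-1, 1] to a share count, with a 'hold' dead zone."""
    if a <= -0.33:                                    # sell band: [-1, -0.33]
        return int(np.interp(a, [-1.0, -0.33], [-hmax, -1]))
    if a >= 0.33:                                     # buy band: [0.33, 1]
        return int(np.interp(a, [0.33, 1.0], [1, hmax]))
    return 0                                          # hold band: (-0.33, 0.33)

print(banded_trade(-1.0), banded_trade(0.0), banded_trade(1.0))  # -100 0 100
```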

Any comments or suggestions will be appreciated.
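
And for point 3, this is the guard I mean, paraphrasing my reading of the environment's sell logic (a simplified sketch, not the exact source):

```python
def executable_sell(requested_shares: int, holdings: int) -> int:
    """Shares the env will actually sell: a sell order is clipped to current
    holdings, so with zero holdings sell_num_shares is forced to 0 (no shorting)."""
    return min(abs(requested_shares), holdings)

print(executable_sell(-50, 0))   # 0  -> the sell is silently dropped
print(executable_sell(-50, 20))  # 20 -> partial fill, limited by holdings
```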