AI4Finance-Foundation / FinRL-Meta

FinRL-Meta: Dynamic datasets and market environments for FinRL.
https://ai4finance.org
MIT License

[DEBUGGING HELP] ValueError: could not broadcast input array from shape (14,6) into shape (22,14) #234

Open Daiiszuki opened 1 year ago

Daiiszuki commented 1 year ago

In reference to https://github.com/AI4Finance-Foundation/FinRL-Meta/blob/master/tutorials/1-Introduction/FinRL_PortfolioAllocation_NeurIPS_2020.ipynb

The train data is shaped as (7812, 19)

Passing the data to the env runs without any errors

cryptoEnv = cryptoPortfolioAllocationEnvironment(dataFrame=trainData, **envKwargs)

And when I call cryptoEnv.observation_space, the shape is (22, 14), which I assume is a combination of the prices and indicators:

14 tickers, 8 indicators (14 + 8 = 22 rows, 14 columns)

Running activeEnv, _ = cryptoEnv.stableBaselineEnv()

returns


ValueError                                Traceback (most recent call last)
[<ipython-input-68-f822f4852cfe>](https://localhost:8080/#) in <module>
----> 1 activeEnv, _ = cryptoEnv.stableBaselineEnv()

2 frames
[<ipython-input-63-fd656b920b2f>](https://localhost:8080/#) in stableBaselineEnv(self)
    189     def stableBaselineEnv(self):
    190       sb = DummyVecEnv([lambda: self])
--> 191       obs = sb.reset()
    192       return sb, obs
    193 

[/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/vec_env/dummy_vec_env.py](https://localhost:8080/#) in reset(self)
     62         for env_idx in range(self.num_envs):
     63             obs = self.envs[env_idx].reset()
---> 64             self._save_obs(env_idx, obs)
     65         return self._obs_from_buf()
     66 

[/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/vec_env/dummy_vec_env.py](https://localhost:8080/#) in _save_obs(self, env_idx, obs)
     92         for key in self.keys:
     93             if key is None:
---> 94                 self.buf_obs[key][env_idx] = obs
     95             else:
     96                 self.buf_obs[key][env_idx] = obs[key]

ValueError: could not broadcast input array from shape (14,6) into shape (22,14)

What am I missing?

Please let me know if you require any additional info. The function to generate the env is as follows:

def stableBaselineEnv(self):
      sb = DummyVecEnv([lambda: self])
      obs = sb.reset()
      return sb, obs
Daiiszuki commented 1 year ago

Also, could you please provide a link to the FinRL Discord, Telegram, or similar if one exists?

Daiiszuki commented 1 year ago

In relation to this line from the FinRL_PortfolioAllocation_NeurIPS_2020.ipynb tutorial:

self.state = np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)

Using my implementation, for example, the shape of np.array(self.covs) will be (6, 6):

[[0.0031702  0.00174953 0.00141159 0.00196729 0.00234947 0.00219413]
 [0.00174953 0.00200029 0.0007653  0.00131069 0.0015602  0.00137297]
 [0.00141159 0.0007653  0.00102168 0.00109734 0.00125215 0.00125944]
 [0.00196729 0.00131069 0.00109734 0.00271591 0.00192142 0.00169665]
 [0.00234947 0.0015602  0.00125215 0.00192142 0.00240365 0.00197903]
 [0.00219413 0.00137297 0.00125944 0.00169665 0.00197903 0.00207293]]

While [self.data[tech].values.tolist() for tech in self.tech_indicator_list] returns shape (7812,).
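For context, a minimal shape sketch of the tutorial-style state layout (variable names here are illustrative, not the notebook's): the N x N covariance matrix is stacked with one row per indicator, and each indicator row has to come from the current day's slice only (length N), not from the whole training frame (length 7812).

import numpy as np

n_tickers = 6
n_indicators = 8

covs = np.zeros((n_tickers, n_tickers))               # covariance matrix for the current day
day_indicators = np.zeros((n_indicators, n_tickers))  # one value per ticker for each indicator, current day only

state = np.append(covs, day_indicators, axis=0)
print(state.shape)                                    # (n_tickers + n_indicators, n_tickers) -> (14, 6) here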

YangletLiu commented 1 year ago

@Daiiszuki Thanks for your detailed report! We are looking into it.

Athe-kunal commented 1 year ago

@Daiiszuki Here is the discord link

https://discord.gg/r4BRPJgt

Daiiszuki commented 1 year ago

@Athe-kunal Thank you, I appreciate it

Daiiszuki commented 1 year ago

@XiaoYangLiu-FinRL

  1. For debugging purposes, I defined the observation space as self.observation_space = spaces.Box(low=np.inf, high=np.inf, shape=(self.state_space, 6)). The question is, why does dummy_vec_env request (14, 6)? I'm very new to Stable Baselines, so I may be missing something on that front.
  2. With that shape, I get AttributeError: 'crytoEnv' has no variable 'reward' (because the reward variable was not initialised?).
  3. Initialising it to 0 produces the following output:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Daiiszuki commented 1 year ago

Also, could you please explain the Sharpe ratio calculation?

if df_daily_return['daily_return'].std() !=0:
              sharpe = (252**0.5)*df_daily_return['daily_return'].mean()/ \
                       df_daily_return['daily_return'].std()
              print("Sharpe: ",sharpe)

specifically this / \ syntax.

If 252 is the number of trading days, would 365 be more appropriate for crypto?
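For reference, the backslash there is just Python's line continuation after the division operator. A minimal equivalent written as a function, with the annualization factor exposed (using 365 for 24/7 crypto markets is an assumption here, not the tutorial's choice):

import pandas as pd

def sharpe_ratio(daily_returns: pd.Series, periods_per_year: int = 252) -> float:
    # 252 is the conventional equity trading-day count; 365 may suit 24/7 crypto markets
    return (periods_per_year ** 0.5) * daily_returns.mean() / daily_returns.std()

# toy usage
returns = pd.Series([0.01, -0.005, 0.007, 0.002])
print(sharpe_ratio(returns), sharpe_ratio(returns, periods_per_year=365))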

zhumingpassional commented 1 year ago

I tested the notebook at https://github.com/AI4Finance-Foundation/FinRL-Meta/blob/master/tutorials/1-Introduction/FinRL_PortfolioAllocation_NeurIPS_2020.ipynb and it works.

I recommend you paste all the code here.

Daiiszuki commented 1 year ago

My environment:


class cryptoPortfolioAllocationEnvironment(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self, 
               dataFrame,
                cryptoDimension,
                maxCryptos,
                startingCapital,
                transactionFeePercentage,
                agentRewardFactor,
                state_space,
                action_space,
                indicatorList,
                turbulance=None,
                lookbackPeriod=252,
                dateIncriment = 0):

      self.dateIncriment = dateIncriment
      self.lookbackPeriod = lookbackPeriod
      self.dataFrame = dataFrame
      self.cryptoDimension = cryptoDimension
      self.maxCryptos = maxCryptos
      self.startingCapital = startingCapital
      self.transactionFeePercentage = transactionFeePercentage
      self.agentRewardFactor = agentRewardFactor
      self.state_space = state_space
      self.action_space = action_space
      self.indicatorList = indicatorList
      self.turbulance=None

      self.action_space = spaces.Box(low = 0, high = 1, shape=(self.action_space,))

      self.observation_space = spaces.Box( low=np.inf, high = np.inf, shape = ( self.state_space + len(indicatorList), self.state_space))

      self.frame = self.dataFrame.loc[[self.dateIncriment]]
      self.covarianceList = self.frame['covarianceList'].values[0]
      self.state = np.append(
                                  self.covarianceList,
                                           [self.frame[indic].values.tolist()  for indic in  self.indicatorList],
                                          axis=0    
                                  )
      self.term  =False
      self.turbulance=turbulance

      self.portfolioValue = self.startingCapital

      self.portfolioHistory = [self.startingCapital]

      self.portFolioReturnHistory = [0]
      self.actionHistory = [[[1/cryptoDimension] * cryptoDimension]] 
      self.dateHistory = [self.frame.date.unique()[0]]

    def normaliseSoftmax(self, actions):

      expNumerator = np.exp(actions)
      expDenominator = np.sum(np.exp(actions))
      output =   expNumerator/ expDenominator
      return output

    def step(self, actions):

      self.term = self.dateIncriment >= len(self.frame.index.unique())-1

      print(self.dateIncriment)
      if self.term:
        dataFrame = panda.DataFrame(self.portFolioReturnHistory)

        dataFrame.columns = ['listOfReturns']
        plt.plot(dataFrame.listOfReturns.cumsum(),'r')
        if not os.path.exists("./TRAINED_MODEL_OUTPUT"):

           os.makedirs("./TRAINED_MODEL_OUTPUT")

        plt.savefig('./TRAINED_MODEL_OUTPUT/cumulative_reward.png')
        plt.close()

        plt.plot(self.portFolioReturnHistory,'r')
        plt.savefig('TRAINED_MODEL_OUTPUT/reward.png')
        plt.close()

        print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
        print(f'INTITIAL PORTFOLIO VALUE:{self.portfolioHistory[0]}')
        print(f'END PORTFOLIO VALUE:{self.portfolioValue}')

        returnDf = panda.DataFrame(self.portFolioReturnHistory)
        returnDf.columns = ['listOfReturns']
        if returnDf['listOfReturns'].std() !=0:
          sharpePrint = (252**0.5)*returnDf['listOfReturns'].mean()/ \
              returnDf['listOfReturns'].std()
          print("SHARPE", sharpePrint)
        print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")

        return self.state, self.reward, self.term, {}

      else:

        portfolioWeight = self.normaliseSoftmax(actions)

        self.actionHistory.append(portfolioWeight)
        recentHistory = self.dataFrame

        #Increase frame by 1 
        self.dateIncriment +=1
        self.frame = self.dataFrame.loc[[self.dateIncriment]]
        self.covarianceList = self.frame['covarianceList'].values[0]
        self.state = np.append(np.array(self.covarianceList), [self.frame[indicator].values.tolist()  for indicator in self.indicatorList], axis=0)

        portfolioReturn = sum(
            ((self.frame.close.values / recentHistory.close.values)-1)*portfolioWeight

        )

        updatePortfolioVal = self.portfolioValue *(1+portfolioReturn)

        self.portfolioValue = updatePortfolioVal

        self.portFolioReturnHistory.append(portfolioReturn)
        self.dateHistory.append(self.frame.date.unique()[0])            
        self.portfolioHistory.append(updatePortfolioVal)

        self.reward = updatePortfolioVal
      return  self.state, self.reward, self.term, {}

    def reset(self):
        self.portfolioHistory = [self.startingCapital]
        self.dateIncriment = 0 
        self.frame = self.dataFrame.loc[[self.dateIncriment]]
        self.covarianceList = self.frame['covarianceList'].values[0]
        self.state = np.append(np.array(self.covarianceList), [self.frame[indicator].values.tolist()  for indicator in self.indicatorList], axis=0)

        self.portFolioReturnHistory = [0]
        self.term  =False
        self.portfolioValue = self.startingCapital
        self.actionHistory = [[[1/self.cryptoDimension] * self.cryptoDimension]]
        self.dateHistory = [self.frame.date.unique()[0]]

        return self.state

    def renderMode(self, mode='human'):
        return self.state

    def savePortfolioHist(self):
      dateSave = self.dateHistory
      returnSave = self.portFolioReturnHistory
      returnFrame = panda.DataFrame({'date': dateSave, 'return': returnSave})

      return returnFrame

    def saveActionHistory(self):
        dates = self.dateHistory
        frameDate = panda.DataFrame(dates)
        frameDate.columns = ['date']

        actions = self.actionHistory
        frameActions = panda.DataFrame(actions)
        frameActions.columns  = self.frame.tic.values
        frameActions.index = frameDate.date

        return frameActions                                                                                

     # Necessary?

    def render(self, seedFactor=None):
        self.randSeed, seedFactor = seeding.np_random(seedFactor)
        return [seedFactor]

    def stableBaselineEnv(self):
      sb = DummyVecEnv([lambda: self])
      obs = sb.reset()
      return sb, obs
zhumingpassional commented 1 year ago

could you tell me where the test code for this env is?

Daiiszuki commented 1 year ago

No issue here


INDICATOR_LIST =  ["boll_ub",
                        "boll_lb",
                        'rsi',
                        'cci',
                        'macd',
                        'dx',
                        "close_20_sma",
                        "close_60_sma",

                             ]  

cryptoDimension = len(pickleFrame.tic.unique())
state_space =cryptoDimension

print(f'CRYPTO DIMENSION: {cryptoDimension}, STATE SPACE: {state_space}')

envKwargs = {
    "maxCryptos": 100, 
    "startingCapital": 1000000, 
    "transactionFeePercentage": 0.001, 
    "state_space": state_space, 
    "cryptoDimension": cryptoDimension, 
    "indicatorList": INDICATOR_LIST, 
    "action_space": cryptoDimension, 
    "agentRewardFactor": 1e-4

}

cryptoEnv = cryptoPortfolioAllocationEnvironment(dataFrame=trainData, **envKwargs)

Running activeEnv, _ = cryptoEnv.stableBaselineEnv() causes the error

Daiiszuki commented 1 year ago

Why/how is it that DummyVecEnv expects (14, 6)?

zhumingpassional commented 1 year ago

I am wondering if you can send all the code to me. Some variables, e.g., pickleFrame, are not defined, so it does not work.

Daiiszuki commented 1 year ago

The rest of the code is to do with the data acquisition (in a different notebook), using the following params:

#Set constants
LIST_OF_SYMBOLS = ['ADAUSDT' ,'ATOMUSDT' ,'BNBUSDT', 'BTCUSDT' ,'DOTUSDT' ,'ETCUSDT', 'ETHUSDT','LINKUSDT', 'LTCUSDT' ,'SOLUSDT' ,'XMRUSDT' ,'BCHUSDT', 'MATICUSDT', 'DAIUSDT']

#Set time interval
TIME_INTERVAL = '1d'

#Training start
START_TRAIN =  '2018-01-01'

#Training end
END_TRAIN = '2020-12-01'

#Trading start
START_TRADE = '2020-12-01'

#Trading end
END_TRADE = '2022-06-01'

#List of technical indicators
TECHNICAL_INDICATORS = ["boll_ub",
                        "boll_lb",
                        'rsi',
                        'cci',
                        'macd',
                        'dx',
                        "close_20_sma",
                        "close_60_sma",

                             ]  

if_vix = False

processorObj = DataProcessor(data_source = 'binance', start_date= START_TRAIN, end_date =END_TRAIN, time_interval=TIME_INTERVAL   ) 
processorObj.download_data(LIST_OF_SYMBOLS)
processorObj.clean_data()
processorObj.add_technical_indicator(TECHNICAL_INDICATORS)
frame = processorObj.dataframe

pickleFrame is:

pickleFrame = panda.read_pickle(r'/content/drive////cleanData.pkl')
pickleFrame.columns

Index(['index', 'tic', 'time', 'open', 'high', 'low', 'close', 'adjusted_close', 'volume', 'boll_ub', 'boll_lb', 'rsi', 'cci', 'macd', 'dx', 'close_20_sma', 'close_60_sma', 'covarianceList', 'listOfReturns'], dtype='object')



The link to the data sourcing notebook https://colab.research.google.com/drive/1x4wwYruxjCoF-AVAR6sM_hXgCRf8lctM?usp=sharing

Training:
https://colab.research.google.com/drive/1x4wwYruxjCoF-AVAR6sM_hXgCRf8lctM?usp=sharing
Daiiszuki commented 1 year ago

Perhaps I could just move on to a different implementation? The matter has become urgent.

zhumingpassional commented 1 year ago

Sorry for the delayed response; I have been busy these days. I think you should check the data again. Make sure that the shape of the data and the RL state coincide.

Daiiszuki commented 1 year ago

Please elaborate on the shape. What would be the required shape in this case, for example, and how/why is it determined by the DummyVecEnv as (14, 6)?
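For reference, a conceptual sketch (not the stable-baselines3 source) of why the shapes must match: DummyVecEnv allocates its observation buffer from observation_space.shape when it is constructed, then copies each reset()/step() observation into that buffer, which is where the broadcast error comes from.

import numpy as np

declared_shape = (22, 14)                   # whatever the env's observation_space declares
buf_obs = np.zeros((1,) + declared_shape)   # buffer for one sub-environment

good_obs = np.zeros(declared_shape)         # matches the declared shape
buf_obs[0] = good_obs                       # copies fine

bad_obs = np.zeros((14, 6))                 # shape actually returned by reset()
# buf_obs[0] = bad_obs                      # ValueError: could not broadcast input array from shape (14,6) into shape (22,14)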

Daiiszuki commented 1 year ago

Changing the ticker list to a length of 12 and the indicator list to 6 seems to solve the issue. But as before, no learning is happening:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
Daiiszuki commented 1 year ago

cryptoEnv.state.shape returns (12, 6)
cryptoEnv.state_space returns 12
cryptoEnv.observation_space returns (12, 6)

zhumingpassional commented 1 year ago

Does the data have NaN? If yes, you should fill the NaNs with values.
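A minimal check-and-fill sketch, assuming a long-format frame with 'tic' and 'close' columns (the column names come from this thread, not from the actual data):

import pandas as pd

df = pd.DataFrame({'tic': ['BTCUSDT', 'BTCUSDT', 'ETHUSDT'],
                   'close': [20000.0, None, 1500.0]})

print(df.isna().sum())                            # count NaNs per column
df['close'] = df.groupby('tic')['close'].ffill()  # forward-fill within each ticker
df['close'] = df['close'].bfill()                 # catch any leading NaNs
print(df)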

Daiiszuki commented 1 year ago

No nan values, as in the notebook

Daiiszuki commented 1 year ago

It looks like an issue to do with the reward returned from the step function. The variable is only initialised inside the step function. My quick workaround was to define the reward alongside the other class variables in __init__.

What do you recommend?
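For reference, a toy sketch of that workaround (a hypothetical minimal class, not the actual env): initialise reward, together with the other attributes returned from the terminal branch of step(), in __init__ so the first terminal return cannot raise an AttributeError.

class MinimalEnv:
    def __init__(self):
        self.state = [0.0]
        self.term = True
        self.reward = 0.0        # defined up front, alongside the other attributes

    def step(self, actions):
        if self.term:
            # returns self.reward even if the non-terminal branch never ran
            return self.state, self.reward, self.term, {}
        self.reward = 1.0
        return self.state, self.reward, self.term, {}

print(MinimalEnv().step(None))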

Daiiszuki commented 1 year ago

Also, I think nan values would cause an error, no?

Screen Shot 2022-09-21 at 11 26 25 AM
Daiiszuki commented 1 year ago

I was asking, last time, about this Sharpe calculation and the / \ syntax.

Daiiszuki commented 1 year ago
def step(self, actions):

      self.term = self.dateIncriment >= len(self.frame.index.unique())-1

      if self.term:
        dataFrame = panda.DataFrame(self.portFolioReturnHistory)

        dataFrame.columns = ['listOfReturns']
        plt.plot(dataFrame.listOfReturns.cumsum(),'r')
        if not os.path.exists("./TRAINED_MODEL_OUTPUT"):

           os.makedirs("./TRAINED_MODEL_OUTPUT")

        plt.savefig('./TRAINED_MODEL_OUTPUT/cumulative_reward.png')
        plt.close()

        plt.plot(self.portFolioReturnHistory,'r')
        plt.savefig('TRAINED_MODEL_OUTPUT/reward.png')
        plt.close()

        print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
        print(f'INTITIAL PORTFOLIO VALUE:{self.portfolioHistory[0]}')
        print(f'END PORTFOLIO VALUE:{self.portfolioValue}')

        returnDf = panda.DataFrame(self.portFolioReturnHistory)
        returnDf.columns = ['listOfReturns']
        if returnDf['listOfReturns'].std() !=0:
          sharpePrint = (252**0.5)*returnDf['listOfReturns'].mean()/ \
              returnDf['listOfReturns'].std()
          print("SHARPE", sharpePrint)
        print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")

        return self.state, self.reward, self.term, {}

      else:

        portfolioWeight = self.normaliseSoftmax(actions)

        self.actionHistory.append(portfolioWeight)
        recentHistory = self.dataFrame

        #Increase frame by 1 
        self.dateIncriment +=1
        self.frame = self.dataFrame.loc[[self.dateIncriment]]
        self.covarianceList = self.frame['covarianceList'].values[0]
        self.state = np.append(np.array(self.covarianceList), [self.frame[indicator].values.tolist()  for indicator in self.indicatorList], axis=0)

        portfolioReturn = sum(
            ((self.frame.close.values / recentHistory.close.values)-1)*portfolioWeight

        )

        updatePortfolioVal = self.portfolioValue *(1+portfolioReturn)

        self.portfolioValue = updatePortfolioVal

        self.portFolioReturnHistory.append(portfolioReturn)
        self.dateHistory.append(self.frame.date.unique()[0])            
        self.portfolioHistory.append(updatePortfolioVal)

        self.reward = updatePortfolioVal
      return  self.state, self.reward, self.term, {}
Daiiszuki commented 1 year ago

Also, attempting to run cryptoEnv.saveActionHistory() returns


ValueError                                Traceback (most recent call last)
[<ipython-input-48-2462bb954d50>](https://localhost:8080/#) in <module>
----> 1 cryptoEnv.saveActionHistory()

5 frames
[<ipython-input-38-83dc1ec2cd31>](https://localhost:8080/#) in saveActionHistory(self)
    179         actions = self.actionHistory
    180         frameActions = panda.DataFrame(actions)
--> 181         frameActions.columns  = self.frame.tic.values
    182         frameActions.index = frameDate.date
    183 

[/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py](https://localhost:8080/#) in __setattr__(self, name, value)
   5498         try:
   5499             object.__getattribute__(self, name)
-> 5500             return object.__setattr__(self, name, value)
   5501         except AttributeError:
   5502             pass

/usr/local/lib/python3.7/dist-packages/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()

[/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py](https://localhost:8080/#) in _set_axis(self, axis, labels)
    764     def _set_axis(self, axis: int, labels: Index) -> None:
    765         labels = ensure_index(labels)
--> 766         self._mgr.set_axis(axis, labels)
    767         self._clear_item_cache()
    768 

[/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py](https://localhost:8080/#) in set_axis(self, axis, new_labels)
    214     def set_axis(self, axis: int, new_labels: Index) -> None:
    215         # Caller is responsible for ensuring we have an Index object.
--> 216         self._validate_set_axis(axis, new_labels)
    217         self.axes[axis] = new_labels
    218 

[/usr/local/lib/python3.7/dist-packages/pandas/core/internals/base.py](https://localhost:8080/#) in _validate_set_axis(self, axis, new_labels)
     56         elif new_len != old_len:
     57             raise ValueError(
---> 58                 f"Length mismatch: Expected axis has {old_len} elements, new "
     59                 f"values have {new_len} elements"
     60             )

ValueError: Length mismatch: Expected axis has 1 elements, new values have 6 elements
Daiiszuki commented 1 year ago

    def saveActionHistory(self):
        dates = self.dateHistory
        frameDate = panda.DataFrame(dates)
        frameDate.columns = ['date']

        actions = self.actionHistory
        frameActions = panda.DataFrame(actions)
        frameActions.columns  = self.frame.tic.values
        frameActions.index = frameDate.date

        return frameActions    
zhumingpassional commented 1 year ago

The step function should obey the principles of OpenAI Gym. Please read the envs of FinRL or OpenAI Gym; your env does not work.

Daiiszuki commented 1 year ago

@zhumingpassional this step function was actually inspired by the one in the introductory notebook:

 def step(self, actions):
        # print(self.day)
        self.terminal = self.day >= len(self.df.index.unique())-1
        # print(actions)

        if self.terminal:
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
            plt.plot(df.daily_return.cumsum(),'r')
            plt.savefig('results/cumulative_reward.png')
            plt.close()

            plt.plot(self.portfolio_return_memory,'r')
            plt.savefig('results/rewards.png')
            plt.close()

            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))           
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() !=0:
              sharpe = (252**0.5)*df_daily_return['daily_return'].mean()/ \
                       df_daily_return['daily_return'].std()
              print("Sharpe: ",sharpe)
            print("=================================")

            return self.state, self.reward, self.terminal,{}

        else:
            #print("Model actions: ",actions)
            # actions are the portfolio weight
            # normalize to sum of 1
            #if (np.array(actions) - np.array(actions).min()).sum() != 0:
            #  norm_actions = (np.array(actions) - np.array(actions).min()) / (np.array(actions) - np.array(actions).min()).sum()
            #else:
            #  norm_actions = actions
            weights = self.softmax_normalization(actions) 
            #print("Normalized actions: ", weights)
            self.actions_memory.append(weights)
            last_day_memory = self.data

            #load next state
            self.day += 1
            self.data = self.df.loc[self.day,:]
            self.covs = self.data['cov_list'].values[0]
            self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
            #print(self.state)
            # calcualte portfolio return
            # individual stocks' return * weight
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values)-1)*weights)
            # update portfolio value
            new_portfolio_value = self.portfolio_value*(1+portfolio_return)
            self.portfolio_value = new_portfolio_value

            # save into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])            
            self.asset_memory.append(new_portfolio_value)

            # the reward is the new portfolio value or end portfolo value
            self.reward = new_portfolio_value 
            #print("Step reward: ", self.reward)
            #self.reward = self.reward*self.reward_scaling

        return self.state, self.reward, self.terminal, {}

I noticed that you have closed the issue, but please, I would appreciate it if you could at least point me in the right direction.

Daiiszuki commented 1 year ago

In the notebook, the reward variable is only defined in the step function, hence I'm getting 'cryptoEnv' object has no attribute 'reward'.

Daiiszuki commented 1 year ago

I noticed that when performing the covariance matrix calculation, the resulting matrix was 6x6 even though I had 10-12 tickers. This happened because the data acquisition did not start on the same day for every ticker, e.g. BTC and LTC data from 2018 was available, but some cryptos such as DOT only had observations starting after that date. Picking only the tickers with the full range of available data worked, but now I get


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-62-3b90e9477704>](https://localhost:8080/#) in <module>
      1 traderAgent.train_model(model=PPOModel, 
      2                              tb_log_name='ppo',
----> 3                              total_timesteps=500)

7 frames
[/usr/local/lib/python3.7/dist-packages/finrl/agents/stablebaselines3/models.py](https://localhost:8080/#) in train_model(self, model, tb_log_name, total_timesteps)
    104             total_timesteps=total_timesteps,
    105             tb_log_name=tb_log_name,
--> 106             callback=TensorboardCallback(),
    107         )
    108         return model

[/usr/local/lib/python3.7/dist-packages/stable_baselines3/ppo/ppo.py](https://localhost:8080/#) in learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    317             tb_log_name=tb_log_name,
    318             eval_log_path=eval_log_path,
--> 319             reset_num_timesteps=reset_num_timesteps,
    320         )

[/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/on_policy_algorithm.py](https://localhost:8080/#) in learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    245         while self.num_timesteps < total_timesteps:
    246 
--> 247             continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
    248 
    249             if continue_training is False:

[/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/on_policy_algorithm.py](https://localhost:8080/#) in collect_rollouts(self, env, callback, rollout_buffer, n_rollout_steps)
    173                 clipped_actions = np.clip(actions, self.action_space.low, self.action_space.high)
    174 
--> 175             new_obs, rewards, dones, infos = env.step(clipped_actions)
    176 
    177             self.num_timesteps += env.num_envs

[/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/vec_env/base_vec_env.py](https://localhost:8080/#) in step(self, actions)
    160         """
    161         self.step_async(actions)
--> 162         return self.step_wait()
    163 
    164     def get_images(self) -> Sequence[np.ndarray]:

[/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/vec_env/dummy_vec_env.py](https://localhost:8080/#) in step_wait(self)
     42         for env_idx in range(self.num_envs):
     43             obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] = self.envs[env_idx].step(
---> 44                 self.actions[env_idx]
     45             )
     46             if self.buf_dones[env_idx]:

[<ipython-input-56-3436960a628c>](https://localhost:8080/#) in step(self, actions)
    157            self.data = self.df.loc[[self.day]]
    158            self.covs = self.data['covarianceList'].values[0]
--> 159            self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
    160            #print(self.state)
    161            # calcualte portfolio return

<__array_function__ internals> in append(*args, **kwargs)

[/usr/local/lib/python3.7/dist-packages/numpy/lib/function_base.py](https://localhost:8080/#) in append(arr, values, axis)
   4815         values = ravel(values)
   4816         axis = arr.ndim-1
-> 4817     return concatenate((arr, values), axis=axis)
   4818 
   4819 

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 7 and the array at index 1 has size 6

when I run


traderAgent.train_model(model=PPOModel, 
                             tb_log_name='ppo',
                             total_timesteps=500)

Even though

testfin = StockPortfolioEnv(df=trainData, **envKwargs)
testfin.covs

returns

array([[0.00322093, 0.00197096, 0.00160227, 0.00266498, 0.00238231,
        0.00275962],
       [0.00197096, 0.00234589, 0.00115438, 0.0018874 , 0.00164733,
        0.00169181],
       [0.00160227, 0.00115438, 0.0011635 , 0.00151229, 0.00143134,
        0.00148325],
       [0.00266498, 0.0018874 , 0.00151229, 0.00279166, 0.00225945,
        0.00224763],
       [0.00238231, 0.00164733, 0.00143134, 0.00225945, 0.00226624,
        0.0020566 ],
       [0.00275962, 0.00169181, 0.00148325, 0.00224763, 0.0020566 ,
        0.00313031]])
Daiiszuki commented 1 year ago

When the last line is run again after calling train_model, the output generated is


array([[nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan]])

Which is 7x7.

The question is, why?

Daiiszuki commented 1 year ago

Side note: the above issue was when I used StockTradingEnv; the shape of the covs was being changed.

Using my env, however, the training starts when I initialise reward to 0, with NaN Sharpe output and no change in portfolio values. I tried to let it train for some time and got

ValueError: Expected parameter loc (Tensor of shape (128, 6)) of distribution Normal(loc: torch.Size([128, 6]), scale: torch.Size([128, 6])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        ...
        [nan, nan, nan, nan, nan, nan]])

Daiiszuki commented 1 year ago

As in https://github.com/AI4Finance-Foundation/FinRL/issues/401#issue-1084845463 and https://github.com/AI4Finance-Foundation/FinRL/issues/696#issue-1338356103

Daiiszuki commented 1 year ago

How would I return an observation from the reset function such that it aligns with the _last_obs dict of DummyVecEnv?

zhumingpassional commented 1 year ago

@Daiiszuki You should check the env, including step() and reset(), and make sure the shape of the state is always correct at each step and that the state changes to the correct state under an action. I suggest you write a very simple env and make sure it works.

If they are correct, the RL algorithm will obtain a policy, and we can discuss the next steps.
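As one reading of the "very simple env" suggestion, a bare-bones sketch whose only job is to keep the declared observation_space and the arrays returned by reset()/step() in agreement (names and shapes are illustrative, not prescribed by FinRL):

import gym
import numpy as np
from gym import spaces

class TinyPortfolioEnv(gym.Env):
    def __init__(self, n_assets=6, n_indicators=8):
        self.n_assets = n_assets
        self.n_rows = n_assets + n_indicators
        self.action_space = spaces.Box(low=0, high=1, shape=(n_assets,))
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(self.n_rows, n_assets))
        self.reward = 0.0
        self.terminal = False

    def reset(self):
        self.terminal = False
        return np.zeros((self.n_rows, self.n_assets), dtype=np.float32)

    def step(self, action):
        obs = np.zeros((self.n_rows, self.n_assets), dtype=np.float32)
        self.reward = 0.0
        self.terminal = True
        return obs, self.reward, self.terminal, {}

env = TinyPortfolioEnv()
assert env.reset().shape == env.observation_space.shape   # must hold before wrapping in DummyVecEnv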

Daiiszuki commented 1 year ago

Thanks, that's kind of helpful, but still a bit vague.

What would a "simple env" look like in this context?

What issues can you see in my step and reset, which are based on the portfolio allocation notebook?

And I tried printing the shape of self.state at each step:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(12, 6)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(12, 6)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(12, 6)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan

A sample state looks like this


[[ 3.22092570e-03  1.97095890e-03  1.60226609e-03  2.66498489e-03
   2.38231480e-03  2.75962130e-03]
 [ 1.97095890e-03  2.34588985e-03  1.15437913e-03  1.88740120e-03
   1.64732788e-03  1.69181107e-03]
 [ 1.60226609e-03  1.15437913e-03  1.16349898e-03  1.51228729e-03
   1.43133681e-03  1.48325483e-03]
 [ 2.66498489e-03  1.88740120e-03  1.51228729e-03  2.79165952e-03
   2.25944713e-03  2.24762622e-03]
 [ 2.38231480e-03  1.64732788e-03  1.43133681e-03  2.25944713e-03
   2.26624196e-03  2.05660397e-03]
 [ 2.75962130e-03  1.69181107e-03  1.48325483e-03  2.24762622e-03
   2.05660397e-03  3.13031382e-03]
 [ 4.82398738e-02  6.43640807e+00  4.12783361e+03  1.62888621e+02
   3.42617669e+01  1.29988871e-01]
 [ 3.10171262e-02  4.88481193e+00  3.38948439e+03  8.50473792e+01
   2.69032331e+01  1.00539129e-01]
 [ 5.50679421e+01  5.37318170e+01  4.88418440e+01  6.30237320e+01
   5.25227431e+01  4.34952341e+01]
 [ 9.90493210e-04  9.50464597e-02 -4.66502410e+01  8.93967559e+00
   2.41152967e-01 -6.74114990e-03]
 [ 3.96285000e-02  5.66061000e+00  3.75865900e+03  1.23968000e+02
   3.05825000e+01  1.15264000e-01]
 [ 4.53968333e-02  6.21240667e+00  4.34476267e+03  1.33744167e+02
   3.41323333e+01  1.60224833e-01]]
Daiiszuki commented 1 year ago

After weeks of testing different solutions, I finally got the problem sorted.

Before, I assumed it was something to do with the DataProcessor, so I tried using the Yahoo processor to download the data. (It was not.)

Then I tried using my env in the tutorial notebook and everything checked out; the model training occurs. So I concluded that the env wasn't the issue.

As noted earlier, in my project the model training and data acquisition are done in separate notebooks.

pd.to_csv() was used to export the data and pd.read_csv(df, index_col=False) to import it,

so it seemed as though the issue is a result of one of the two.

The shape of the imported df was verified, but when it is passed to the env, a different shape is produced by df.loc[self.day, :].

What could some potential causes/solutions be?

@XiaoYangLiu-FinRL
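
One possible cause, offered as an assumption rather than a verified diagnosis: the tutorial-style env indexes the frame by an integer day so that df.loc[day] returns one row per ticker; a pickle preserves that index, but a to_csv/read_csv round trip replaces it with a plain RangeIndex, so it has to be rebuilt after the import (column names below are taken from this thread):

import pandas as pd

df = pd.DataFrame({'time': ['2018-01-01', '2018-01-01', '2018-01-02', '2018-01-02'],
                   'tic':  ['BTCUSDT', 'ETHUSDT', 'BTCUSDT', 'ETHUSDT'],
                   'close': [13000.0, 750.0, 14000.0, 800.0]})   # stand-in for the re-imported CSV

df = df.sort_values(['time', 'tic']).reset_index(drop=True)
df.index = df['time'].factorize()[0]   # 0, 0, 1, 1, ...: one integer per date

print(df.loc[0])                       # all tickers for day 0, as df.loc[self.day] expects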

Daiiszuki commented 1 year ago

Right.

Closing this as it's clearly not a FinRL issue.

Cheers @XiaoYangLiu-FinRL, a valuable learning experience.

Open source is the future lol

YangletLiu commented 1 year ago

@Daiiszuki Sure. Hope you enjoy!

Daiiszuki commented 1 year ago

@XiaoYangLiu-FinRL I think this is the part where you offer me a job lol

Daiiszuki commented 1 year ago

Overnight


---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
[<ipython-input-29-a96b7571d55f>](https://localhost:8080/#) in <module>
     41                              ]  
     42 
---> 43 processorObj = DataProcessor(data_source = 'binance', start_date= START_TRAIN, end_date =END_TRADE, time_interval=TIME_INTERVAL   )
     44 processorObj.download_data(LIST_OF_SYMBOLS)
     45 processorObj.clean_data()

1 frames
[/FinRL-Meta/meta/data_processors/binance.py](https://localhost:8080/#) in <module>
     10 import pandas as pd
     11 import requests
---> 12 from _base import check_date
     13 
     14 from meta.config import BINANCE_BASE_URL

ModuleNotFoundError: No module named '_base'
Daiiszuki commented 1 year ago

Resorted to using the YahooDownloader

Daiiszuki commented 1 year ago

On second thought, I think it was actually a FinRL issue.

zhumingpassional commented 1 year ago

Please change "from _base import check_date" to "from meta.data_processors._base import check_date".

We have updated the code.

Daiiszuki commented 1 year ago

This is new

-----------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-9-c727ea424a25>](https://localhost:8080/#) in <module>
----> 1 import gym

2 frames
[/usr/local/lib/python3.7/dist-packages/gym/__init__.py](https://localhost:8080/#) in <module>
     11 )
     12 from gym.spaces import Space
---> 13 from gym.envs import make, spec, register
     14 from gym import logger
     15 from gym import vector

[/usr/local/lib/python3.7/dist-packages/gym/envs/__init__.py](https://localhost:8080/#) in <module>
      8 
      9 # Hook to load plugins from entry points
---> 10 _load_env_plugins()
     11 
     12 

[/usr/local/lib/python3.7/dist-packages/gym/envs/registration.py](https://localhost:8080/#) in load_env_plugins(entry_point)
    248 def load_env_plugins(entry_point="gym.envs"):
    249     # Load third-party environments
--> 250     for plugin in metadata.entry_points().get(entry_point, []):
    251         # Python 3.8 doesn't support plugin.module, plugin.attr
    252         # So we'll have to try and parse this ourselves

AttributeError: 'EntryPoints' object has no attribute 'get'
zhumingpassional commented 1 year ago

@Daiiszuki https://stackoverflow.com/questions/73929564/entrypoints-object-has-no-attribute-get-digital-ocean
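
The usual workaround from that thread, offered as an assumption rather than a verified fix for this exact Colab image, is to pin importlib-metadata below 5.0 (or upgrade gym) and restart the runtime:

import subprocess, sys

# pin importlib-metadata so gym's entry_points().get(...) call keeps working
subprocess.check_call([sys.executable, "-m", "pip", "install", "importlib-metadata<5.0"])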

zhumingpassional commented 1 year ago

Please write check_env() to test the env. Please refer to https://github.com/AI4Finance-Foundation/ElegantRL/blob/master/elegantrl/envs/StockTradingEnv.py
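
Assuming this refers to stable-baselines3's environment checker, a minimal usage sketch with the env class and kwargs from earlier in this thread:

from stable_baselines3.common.env_checker import check_env

# validates that reset()/step() return observations matching observation_space,
# which is exactly the kind of mismatch behind the broadcast error above
env = cryptoPortfolioAllocationEnvironment(dataFrame=trainData, **envKwargs)
check_env(env, warn=True)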