Daiiszuki opened this issue 1 year ago
Also, could you please provide a link to the FinRL Discord, Telegram, or similar, if one exists?
In relation to this line from the FinRL_PortfolioAllocation_NeurIPS_2020.ipynb tutorial:
```python
self.state = np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list], axis=0)
```
Using my implementation, for example, the shape of `np.array(self.covs)` will be (6, 6):
```
[[0.0031702  0.00174953 0.00141159 0.00196729 0.00234947 0.00219413]
 [0.00174953 0.00200029 0.0007653  0.00131069 0.0015602  0.00137297]
 [0.00141159 0.0007653  0.00102168 0.00109734 0.00125215 0.00125944]
 [0.00196729 0.00131069 0.00109734 0.00271591 0.00192142 0.00169665]
 [0.00234947 0.0015602  0.00125215 0.00192142 0.00240365 0.00197903]
 [0.00219413 0.00137297 0.00125944 0.00169665 0.00197903 0.00207293]]
```
while `[self.data[tech].values.tolist() for tech in self.tech_indicator_list]` returns shape (7812,).
@Daiiszuki Thanks for your detailed report! We are looking into it.
@Daiiszuki Here is the discord link
@Athe-kunal Thank you, I appreciate it
@XiaoYangLiu-FinRL
Given

```python
self.observation_space = spaces.Box(low=np.inf, high=np.inf, shape=(self.state_space, 6))
```

the question is: why does DummyVecEnv ask for (14, 6)? I'm very new to Stable Baselines, so I may be missing something on that front.

```
AttributeError: 'crytoEnv' has no variable 'reward'
```

(because the reward variable was not initialised?)
```
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
```
(the same block repeats for every episode)
Also, could you please explain the Sharpe ratio calculation:
```python
if df_daily_return['daily_return'].std() != 0:
    sharpe = (252**0.5) * df_daily_return['daily_return'].mean() / \
             df_daily_return['daily_return'].std()
    print("Sharpe: ", sharpe)
```
specifically the `/ \` syntax. If 252 is the number of trading days, would 365 be more appropriate for crypto, which trades every calendar day?
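For what it's worth, the trailing `\` is just Python's line-continuation character (the expression continues on the next line after the division); it is not special syntax. A minimal sketch of the same calculation with a configurable annualization factor (`periods_per_year` is my own name, not from the notebook):

```python
import numpy as np
import pandas as pd

def annualized_sharpe(daily_returns: pd.Series, periods_per_year: int = 252) -> float:
    """Mean/std of per-period returns, scaled by sqrt(periods per year)."""
    std = daily_returns.std()
    if std == 0 or np.isnan(std):
        return float("nan")
    return (periods_per_year ** 0.5) * daily_returns.mean() / std

# Crypto trades every calendar day, so 365 is arguably the right factor:
# sharpe = annualized_sharpe(df_daily_return["daily_return"], periods_per_year=365)
```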
I tested the notebook and it works: https://github.com/AI4Finance-Foundation/FinRL-Meta/blob/master/tutorials/1-Introduction/FinRL_PortfolioAllocation_NeurIPS_2020.ipynb. I recommend you paste all the code here.
My environment:
```python
class cryptoPortfolioAllocationEnvironment(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self,
                 dataFrame,
                 cryptoDimension,
                 maxCryptos,
                 startingCapital,
                 transactionFeePercentage,
                 agentRewardFactor,
                 state_space,
                 action_space,
                 indicatorList,
                 turbulance=None,
                 lookbackPeriod=252,
                 dateIncriment=0):
        self.dateIncriment = dateIncriment
        self.lookbackPeriod = lookbackPeriod
        self.dataFrame = dataFrame
        self.cryptoDimension = cryptoDimension
        self.maxCryptos = maxCryptos
        self.startingCapital = startingCapital
        self.transactionFeePercentage = transactionFeePercentage
        self.agentRewardFactor = agentRewardFactor
        self.state_space = state_space
        self.action_space = action_space
        self.indicatorList = indicatorList
        self.turbulance = None
        self.action_space = spaces.Box(low=0, high=1, shape=(self.action_space,))
        self.observation_space = spaces.Box(low=np.inf, high=np.inf,
                                            shape=(self.state_space + len(indicatorList),
                                                   self.state_space))
        self.frame = self.dataFrame.loc[[self.dateIncriment]]
        self.covarianceList = self.frame['covarianceList'].values[0]
        self.state = np.append(
            self.covarianceList,
            [self.frame[indic].values.tolist() for indic in self.indicatorList],
            axis=0
        )
        self.term = False
        self.turbulance = turbulance
        self.portfolioValue = self.startingCapital
        self.portfolioHistory = [self.startingCapital]
        self.portFolioReturnHistory = [0]
        self.actionHistory = [[[1/cryptoDimension] * cryptoDimension]]
        self.dateHistory = [self.frame.date.unique()[0]]

    def normaliseSoftmax(self, actions):
        expNumerator = np.exp(actions)
        expDenominator = np.sum(np.exp(actions))
        output = expNumerator / expDenominator
        return output

    def step(self, actions):
        # Note: the notebook checks len(self.df.index.unique()), i.e. the full
        # DataFrame; self.frame here is the single-day slice, whose index has
        # length 1, so this condition is already True on the first step.
        self.term = self.dateIncriment >= len(self.frame.index.unique()) - 1
        print(self.dateIncriment)
        if self.term:
            dataFrame = panda.DataFrame(self.portFolioReturnHistory)
            dataFrame.columns = ['listOfReturns']
            plt.plot(dataFrame.listOfReturns.cumsum(), 'r')
            if not os.path.exists("./TRAINED_MODEL_OUTPUT"):
                os.makedirs("./TRAINED_MODEL_OUTPUT")
            plt.savefig('./TRAINED_MODEL_OUTPUT')
            plt.close()
            plt.plot(self.portFolioReturnHistory, 'r')
            plt.savefig('TRAINED_MODEL_OUTPUT/reward.png')
            plt.close()
            print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
            print(f'INTITIAL PORTFOLIO VALUE:{self.portfolioHistory[0]}')
            print(f'END PORTFOLIO VALUE:{self.portfolioValue}')
            returnDf = panda.DataFrame(self.portFolioReturnHistory)
            returnDf.columns = ['listOfReturns']
            if returnDf['listOfReturns'].std() != 0:
                # Note: the notebook annualises with 252**0.5, not 252*0.5.
                sharpePrint = (252*0.5) * returnDf['listOfReturns'].mean() / \
                              returnDf['listOfReturns'].std()
                print("SHARPE", sharpePrint)
            print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
            return self.state, self.reward, self.term, {}
        else:
            portfolioWeight = self.normaliseSoftmax(actions)
            self.actionHistory.append(portfolioWeight)
            # Note: the notebook keeps the current day's slice (self.data) here,
            # not the whole DataFrame.
            recentHistory = self.dataFrame
            # Increase frame by 1
            self.dateIncriment += 1
            self.frame = self.dataFrame.loc[[self.dateIncriment]]
            self.covarianceList = self.frame['covarianceList'].values[0]
            self.state = np.append(np.array(self.covarianceList),
                                   [self.frame[indicator].values.tolist()
                                    for indicator in self.indicatorList], axis=0)
            portfolioReturn = sum(
                ((self.frame.close.values / recentHistory.close.values) - 1) * portfolioWeight
            )
            updatePortfolioVal = self.portfolioValue * (1 + portfolioReturn)
            self.portfolioValue = updatePortfolioVal
            self.portFolioReturnHistory.append(portfolioReturn)
            self.dateHistory.append(self.frame.date.unique()[0])
            self.portfolioHistory.append(updatePortfolioVal)
            self.reward = updatePortfolioVal
            return self.state, self.reward, self.term, {}

    def reset(self):
        self.portfolioHistory = [self.startingCapital]
        self.dateIncriment = 0
        self.frame = self.dataFrame.loc[[self.dateIncriment]]
        self.covarianceList = self.frame['covarianceList'].values[0]
        self.state = np.append(np.array(self.covarianceList),
                               [self.frame[indicator].values.tolist()
                                for indicator in self.indicatorList], axis=0)
        self.portFolioReturnHistory = [0]
        self.term = False
        self.portfolioValue = self.startingCapital
        self.actionHistory = [[[1/self.cryptoDimension] * self.cryptoDimension]]
        self.dateHistory = [self.frame.date.unique()[0]]
        return self.state

    def renderMode(self, mode='human'):
        return self.state

    def savePortfolioHist(self):
        dateSave = self.dateHistory
        returnSave = self.portFolioReturnHistory
        returnFrame = panda.DataFrame({'date': dateSave, 'return': returnSave})
        return returnFrame

    def saveActionHistory(self):
        dates = self.dateHistory
        frameDate = panda.DataFrame(dates)
        frameDate.columns = ['date']
        actions = self.actionHistory
        frameActions = panda.DataFrame(actions)
        frameActions.columns = self.frame.tic.values
        frameActions.index = frameDate.date
        return frameActions

    # Necessary?
    def render(self, seedFactor=None):
        self.randSeed, seedFactor = seeding.np_random(seedFactor)
        return [seedFactor]

    def stableBaselineEnv(self):
        sb = DummyVecEnv([lambda: self])
        obs = sb.reset()
        return sb, obs
```
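One detail worth double-checking against the tutorial env (an observation on the code above, hedged since I cannot run it): the Box lower bound is declared as `np.inf`, whereas the NeurIPS notebook declares the observation space with a negative lower bound; a Box whose low and high are both +inf admits no finite observation. A declaration in the spirit of the notebook:

```python
# Lower bound negative infinity, as in the tutorial's StockPortfolioEnv:
self.observation_space = spaces.Box(
    low=-np.inf, high=np.inf,
    shape=(self.state_space + len(indicatorList), self.state_space),
)
```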
Could you tell me where the test code for this env is?
No issue here:

```python
INDICATOR_LIST = ["boll_ub",
                  "boll_lb",
                  'rsi',
                  'cci',
                  'macd',
                  'dx',
                  "close_20_sma",
                  "close_60_sma",
                  ]

cryptoDimension = len(pickleFrame.tic.unique())
state_space = cryptoDimension
print(f'CRYPTO DIMENSION: {cryptoDimension}, STATE SPACE: {state_space}')

envKwargs = {
    "maxCryptos": 100,
    "startingCapital": 1000000,
    "transactionFeePercentage": 0.001,
    "state_space": state_space,
    "cryptoDimension": cryptoDimension,
    "indicatorList": INDICATOR_LIST,
    "action_space": cryptoDimension,
    "agentRewardFactor": 1e-4
}

cryptoEnv = cryptoPortfolioAllocationEnvironment(dataFrame=trainData, **envKwargs)
```
Running

```python
activeEnv, _ = cryptoEnv.stableBaselineEnv()
```

causes the error. Why/how is it that DummyVecEnv expects (14, 6)?
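As a hedged explanation of where the expected shape comes from: SB3's DummyVecEnv pre-allocates its observation buffer from `env.observation_space.shape` and copies each `reset()`/`step()` result into that buffer, so any mismatch between the array the env actually returns and the declared Box shape surfaces as a broadcast error. A toy reproduction under that assumption (names and shapes are illustrative):

```python
import gym
import numpy as np
from gym import spaces
from stable_baselines3.common.vec_env import DummyVecEnv

class ShapeMismatchEnv(gym.Env):
    def __init__(self):
        self.action_space = spaces.Box(low=0, high=1, shape=(6,))
        # Declared shape: (14, 6)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(14, 6))

    def reset(self):
        # Returned shape: (22, 14) -- does not match the declared space
        return np.zeros((22, 14))

    def step(self, action):
        return np.zeros((22, 14)), 0.0, False, {}

env = DummyVecEnv([lambda: ShapeMismatchEnv()])
obs = env.reset()  # ValueError: could not broadcast input array from shape (22,14) into shape (14,6)
```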
I am wondering if you can send all the code to me. Some variables, e.g., pickleFrame, are not defined, so it does not run.
The rest of the code has to do with the data acquisition (in a different notebook), using the following params:
```python
# Set constants
LIST_OF_SYMBOLS = ['ADAUSDT', 'ATOMUSDT', 'BNBUSDT', 'BTCUSDT', 'DOTUSDT', 'ETCUSDT',
                   'ETHUSDT', 'LINKUSDT', 'LTCUSDT', 'SOLUSDT', 'XMRUSDT', 'BCHUSDT',
                   'MATICUSDT', 'DAIUSDT']
# Set time interval
TIME_INTERVAL = '1d'
# Training start
START_TRAIN = '2018-01-01'
# Training end
END_TRAIN = '2020-12-01'
# Trading start
START_TRADE = '2020-12-01'
# Trading end
END_TRADE = '2022-06-01'
# List of technical indicators
TECHNICAL_INDICATORS = ["boll_ub",
                        "boll_lb",
                        'rsi',
                        'cci',
                        'macd',
                        'dx',
                        "close_20_sma",
                        "close_60_sma",
                        ]
if_vix = False

processorObj = DataProcessor(data_source='binance', start_date=START_TRAIN,
                             end_date=END_TRAIN, time_interval=TIME_INTERVAL)
processorObj.download_data(LIST_OF_SYMBOLS)
processorObj.clean_data()
processorObj.add_technical_indicator(TECHNICAL_INDICATORS)
frame = processorObj.dataframe
```
pickleFrame is:

```python
pickleFrame = panda.read_pickle(r'/content/drive////cleanData.pkl')
pickleFrame.columns
```

```
Index(['index', 'tic', 'time', 'open', 'high', 'low', 'close', 'adjusted_close',
       'volume', 'boll_ub', 'boll_lb', 'rsi', 'cci', 'macd', 'dx',
       'close_20_sma', 'close_60_sma', 'covarianceList', 'listOfReturns'],
      dtype='object')
```
The link to the data sourcing notebook https://colab.research.google.com/drive/1x4wwYruxjCoF-AVAR6sM_hXgCRf8lctM?usp=sharing
Training:
https://colab.research.google.com/drive/1x4wwYruxjCoF-AVAR6sM_hXgCRf8lctM?usp=sharing
Perhaps I could just move on to a different implementation? The matter has become urgent.
Sorry for the delayed response; I am busy these days. I think you should check the data again. Make sure that the shape of the data and the RL state coincide.
Please elaborate on the shape. What would be the required shape in this case, for example, and how/why is it determined by the DummyVecEnv as (14, 6)?
Changing the ticker list to a length of 12 and the indicator list to 6 seems to solve the issue. But as before, no learning is happening:
```
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
```
```python
cryptoEnv.state.shape        # returns (12, 6)
cryptoEnv.state_space        # returns 12
cryptoEnv.observation_space  # returns (12, 6)
```
Does the data have NaN values? If yes, you should fill the NaNs.
No NaN values, as in the notebook.
It looks like an issue to do with the reward returned from the step function. The variable was never initialised outside of step(), so my quick workaround was to initialise the reward alongside the other class vars in __init__.
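For reference, a minimal sketch of that workaround as I understand it:

```python
# In cryptoPortfolioAllocationEnvironment.__init__, alongside the other attributes:
self.term = False
self.reward = 0.0  # avoids AttributeError on self.reward before the first step() assigns it
```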
What do you recommend?
Also, I think NaN values would cause an error, no?
I was asking, last time, about this Sharpe calculation and the `/ \` syntax:
```python
def step(self, actions):
    self.term = self.dateIncriment >= len(self.frame.index.unique()) - 1
    if self.term:
        dataFrame = panda.DataFrame(self.portFolioReturnHistory)
        dataFrame.columns = ['listOfReturns']
        plt.plot(dataFrame.listOfReturns.cumsum(), 'r')
        if not os.path.exists("./TRAINED_MODEL_OUTPUT"):
            os.makedirs("./TRAINED_MODEL_OUTPUT")
        plt.savefig('./TRAINED_MODEL_OUTPUT')
        plt.close()
        plt.plot(self.portFolioReturnHistory, 'r')
        plt.savefig('TRAINED_MODEL_OUTPUT/reward.png')
        plt.close()
        print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
        print(f'INTITIAL PORTFOLIO VALUE:{self.portfolioHistory[0]}')
        print(f'END PORTFOLIO VALUE:{self.portfolioValue}')
        returnDf = panda.DataFrame(self.portFolioReturnHistory)
        returnDf.columns = ['listOfReturns']
        if returnDf['listOfReturns'].std() != 0:
            sharpePrint = (252*0.5) * returnDf['listOfReturns'].mean() / \
                          returnDf['listOfReturns'].std()
            print("SHARPE", sharpePrint)
        print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
        return self.state, self.reward, self.term, {}
    else:
        portfolioWeight = self.normaliseSoftmax(actions)
        self.actionHistory.append(portfolioWeight)
        recentHistory = self.dataFrame
        # Increase frame by 1
        self.dateIncriment += 1
        self.frame = self.dataFrame.loc[[self.dateIncriment]]
        self.covarianceList = self.frame['covarianceList'].values[0]
        self.state = np.append(np.array(self.covarianceList),
                               [self.frame[indicator].values.tolist()
                                for indicator in self.indicatorList], axis=0)
        portfolioReturn = sum(
            ((self.frame.close.values / recentHistory.close.values) - 1) * portfolioWeight
        )
        updatePortfolioVal = self.portfolioValue * (1 + portfolioReturn)
        self.portfolioValue = updatePortfolioVal
        self.portFolioReturnHistory.append(portfolioReturn)
        self.dateHistory.append(self.frame.date.unique()[0])
        self.portfolioHistory.append(updatePortfolioVal)
        self.reward = updatePortfolioVal
        return self.state, self.reward, self.term, {}
```
Also, attempting to run cryptoEnv.saveActionHistory() returns:
```
ValueError                                Traceback (most recent call last)
<ipython-input-48-2462bb954d50> in <module>
----> 1 cryptoEnv.saveActionHistory()

5 frames
<ipython-input-38-83dc1ec2cd31> in saveActionHistory(self)
    179         actions = self.actionHistory
    180         frameActions = panda.DataFrame(actions)
--> 181         frameActions.columns = self.frame.tic.values
    182         frameActions.index = frameDate.date
    183

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in __setattr__(self, name, value)
   5498         try:
   5499             object.__getattribute__(self, name)
-> 5500             return object.__setattr__(self, name, value)
   5501         except AttributeError:
   5502             pass

/usr/local/lib/python3.7/dist-packages/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in _set_axis(self, axis, labels)
    764     def _set_axis(self, axis: int, labels: Index) -> None:
    765         labels = ensure_index(labels)
--> 766         self._mgr.set_axis(axis, labels)
    767         self._clear_item_cache()
    768

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    214     def set_axis(self, axis: int, new_labels: Index) -> None:
    215         # Caller is responsible for ensuring we have an Index object.
--> 216         self._validate_set_axis(axis, new_labels)
    217         self.axes[axis] = new_labels
    218

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/base.py in _validate_set_axis(self, axis, new_labels)
     56         elif new_len != old_len:
     57             raise ValueError(
---> 58                 f"Length mismatch: Expected axis has {old_len} elements, new "
     59                 f"values have {new_len} elements"
     60             )

ValueError: Length mismatch: Expected axis has 1 elements, new values have 6 elements
```
```python
def saveActionHistory(self):
    dates = self.dateHistory
    frameDate = panda.DataFrame(dates)
    frameDate.columns = ['date']
    actions = self.actionHistory
    frameActions = panda.DataFrame(actions)
    frameActions.columns = self.frame.tic.values
    frameActions.index = frameDate.date
    return frameActions
```
The step function should obey the principles of OpenAI Gym. Please read the envs in FinRL or OpenAI Gym; your env does not work.
@zhumingpassional This step function was actually inspired by the function in the introductory notebook:
```python
def step(self, actions):
    # print(self.day)
    self.terminal = self.day >= len(self.df.index.unique())-1
    # print(actions)

    if self.terminal:
        df = pd.DataFrame(self.portfolio_return_memory)
        df.columns = ['daily_return']
        plt.plot(df.daily_return.cumsum(),'r')
        plt.savefig('results/cumulative_reward.png')
        plt.close()

        plt.plot(self.portfolio_return_memory,'r')
        plt.savefig('results/rewards.png')
        plt.close()

        print("=================================")
        print("begin_total_asset:{}".format(self.asset_memory[0]))
        print("end_total_asset:{}".format(self.portfolio_value))

        df_daily_return = pd.DataFrame(self.portfolio_return_memory)
        df_daily_return.columns = ['daily_return']
        if df_daily_return['daily_return'].std() !=0:
            sharpe = (252**0.5)*df_daily_return['daily_return'].mean()/ \
                     df_daily_return['daily_return'].std()
            print("Sharpe: ",sharpe)
        print("=================================")

        return self.state, self.reward, self.terminal,{}

    else:
        #print("Model actions: ",actions)
        # actions are the portfolio weight
        # normalize to sum of 1
        #if (np.array(actions) - np.array(actions).min()).sum() != 0:
        #    norm_actions = (np.array(actions) - np.array(actions).min()) / (np.array(actions) - np.array(actions).min()).sum()
        #else:
        #    norm_actions = actions
        weights = self.softmax_normalization(actions)
        #print("Normalized actions: ", weights)
        self.actions_memory.append(weights)
        last_day_memory = self.data

        #load next state
        self.day += 1
        self.data = self.df.loc[self.day,:]
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        #print(self.state)
        # calcualte portfolio return
        # individual stocks' return * weight
        portfolio_return = sum(((self.data.close.values / last_day_memory.close.values)-1)*weights)
        # update portfolio value
        new_portfolio_value = self.portfolio_value*(1+portfolio_return)
        self.portfolio_value = new_portfolio_value

        # save into memory
        self.portfolio_return_memory.append(portfolio_return)
        self.date_memory.append(self.data.date.unique()[0])
        self.asset_memory.append(new_portfolio_value)

        # the reward is the new portfolio value or end portfolo value
        self.reward = new_portfolio_value
        #print("Step reward: ", self.reward)
        #self.reward = self.reward*self.reward_scaling

        return self.state, self.reward, self.terminal, {}
```
I noticed that you have closed the issue, but please, I would appreciate it if you could at least point me in the right direction.
In the notebook, the reward variable is defined in the step function; hence I'm getting 'cryptoEnv object has no attribute reward'.
I noticed that when performing the covariance matrix calculation, the resulting matrix was 6x6 even though I had 10-12 tickers. This happened because the data acquisition did not start on the same day for every symbol: e.g., BTC and LTC data from 2018 was available, but some cryptos such as DOT only had observations starting after that period. Picking only the tickers with the full range of available data worked. But now I get
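One way to guard against that in future (a sketch, assuming a long-format frame with the `tic` and `time` columns shown earlier; `frame`, `START_TRAIN`, and `END_TRAIN` are the names from the data-sourcing snippet above): keep only tickers whose history spans the full training window before computing covariances.

```python
import pandas as pd

def tickers_with_full_history(df, start, end):
    """Return tickers whose first/last observations span [start, end]."""
    start_ts, end_ts = pd.to_datetime(start), pd.to_datetime(end)
    times = pd.to_datetime(df["time"])
    spans = times.groupby(df["tic"]).agg(["min", "max"])
    full = spans[(spans["min"] <= start_ts) & (spans["max"] >= end_ts)]
    return full.index.tolist()

keep = tickers_with_full_history(frame, START_TRAIN, END_TRAIN)
frame = frame[frame["tic"].isin(keep)].reset_index(drop=True)
```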
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-62-3b90e9477704> in <module>
      1 traderAgent.train_model(model=PPOModel,
      2                         tb_log_name='ppo',
----> 3                         total_timesteps=500)

7 frames
/usr/local/lib/python3.7/dist-packages/finrl/agents/stablebaselines3/models.py in train_model(self, model, tb_log_name, total_timesteps)
    104             total_timesteps=total_timesteps,
    105             tb_log_name=tb_log_name,
--> 106             callback=TensorboardCallback(),
    107         )
    108         return model

/usr/local/lib/python3.7/dist-packages/stable_baselines3/ppo/ppo.py in learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    317             tb_log_name=tb_log_name,
    318             eval_log_path=eval_log_path,
--> 319             reset_num_timesteps=reset_num_timesteps,
    320         )

/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/on_policy_algorithm.py in learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    245         while self.num_timesteps < total_timesteps:
    246
--> 247             continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
    248
    249             if continue_training is False:

/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/on_policy_algorithm.py in collect_rollouts(self, env, callback, rollout_buffer, n_rollout_steps)
    173                 clipped_actions = np.clip(actions, self.action_space.low, self.action_space.high)
    174
--> 175             new_obs, rewards, dones, infos = env.step(clipped_actions)
    176
    177             self.num_timesteps += env.num_envs

/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/vec_env/base_vec_env.py in step(self, actions)
    160         """
    161         self.step_async(actions)
--> 162         return self.step_wait()
    163
    164     def get_images(self) -> Sequence[np.ndarray]:

/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/vec_env/dummy_vec_env.py in step_wait(self)
     42         for env_idx in range(self.num_envs):
     43             obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] = self.envs[env_idx].step(
---> 44                 self.actions[env_idx]
     45             )
     46             if self.buf_dones[env_idx]:

<ipython-input-56-3436960a628c> in step(self, actions)
    157             self.data = self.df.loc[[self.day]]
    158             self.covs = self.data['covarianceList'].values[0]
--> 159             self.state = np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
    160             #print(self.state)
    161             # calcualte portfolio return

<__array_function__ internals> in append(*args, **kwargs)

/usr/local/lib/python3.7/dist-packages/numpy/lib/function_base.py in append(arr, values, axis)
   4815         values = ravel(values)
   4816         axis = arr.ndim-1
-> 4817     return concatenate((arr, values), axis=axis)
   4818
   4819

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 7 and the array at index 1 has size 6
```
when I run

```python
traderAgent.train_model(model=PPOModel,
                        tb_log_name='ppo',
                        total_timesteps=500)
```
Even though

```python
testfin = StockPortfolioEnv(df=trainData, **envKwargs)
testfin.covs
```

returns

```
array([[0.00322093, 0.00197096, 0.00160227, 0.00266498, 0.00238231, 0.00275962],
       [0.00197096, 0.00234589, 0.00115438, 0.0018874 , 0.00164733, 0.00169181],
       [0.00160227, 0.00115438, 0.0011635 , 0.00151229, 0.00143134, 0.00148325],
       [0.00266498, 0.0018874 , 0.00151229, 0.00279166, 0.00225945, 0.00224763],
       [0.00238231, 0.00164733, 0.00143134, 0.00225945, 0.00226624, 0.0020566 ],
       [0.00275962, 0.00169181, 0.00148325, 0.00224763, 0.0020566 , 0.00313031]])
```
When the last line is run again after calling train_model, the output generated is

```
array([[nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan]])
```

which is 7x7.
Question is, why?
Side note: the above issue was when I used StockTradingEnv; the shape of the covs was being changed.
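A small diagnostic worth running here (a sketch against the column names used in this thread, `trainData` and `covarianceList`, not a FinRL API): walk the day index and assert that every day has the same ticker count and covariance shape, which pins down exactly where a 6x6 matrix becomes 7x7.

```python
import numpy as np

def check_day_shapes(df):
    """Assert every day slice has the same ticker count and covariance shape."""
    expected = None
    n_days = len(df.index.unique())
    for day in range(n_days):
        frame = df.loc[[day]]
        n_tics = len(frame["tic"].unique())
        cov_shape = np.array(frame["covarianceList"].values[0]).shape
        if expected is None:
            expected = (n_tics, cov_shape)
        assert (n_tics, cov_shape) == expected, \
            f"day {day}: {n_tics} tics, cov {cov_shape}, expected {expected}"
    print(f"OK: {n_days} days, {expected[0]} tickers, cov shape {expected[1]}")

check_day_shapes(trainData)
```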
Using my env, however, the training starts when I initialise reward to 0, with NaN Sharpe output and no change in portfolio values. I tried to let it train for some time and got:
```
ValueError: Expected parameter loc (Tensor of shape (128, 6)) of distribution
Normal(loc: torch.Size([128, 6]), scale: torch.Size([128, 6])) to satisfy the
constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        ...])
```
(the full tensor is 128 rows of NaNs)
How would I return an observation in the reset function such that it aligns with the `_last_obs` dict of DummyVecEnv?
@Daiiszuki You should check the env, including step() and reset(), and make sure the shape of the state is always correct at each step, and that the state changes to the correct state under an action. I suggest you write a very simple env and make sure it works.
If those are correct, the RL algorithm will obtain a policy, and we can discuss the next steps.
Thanks, that's somewhat helpful, but still a bit vague.
What would a "simple env" look like in this context?
What issues can you see in my step and reset, which are based on the portfolio allocation notebook?
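To make the suggestion concrete, here is one reading of what a "very simple env" could look like: the same observation contract as the portfolio env (an (N + K, N) Box), a fixed synthetic state, and a constant reward, so that if SB3 trains on this but not on the real env, the problem is in the data pipeline rather than the env plumbing. All names and shapes below are illustrative only.

```python
import gym
import numpy as np
from gym import spaces

class TinyPortfolioEnv(gym.Env):
    """Sanity-check env: fixed state, reward = sum of softmax weights (always 1)."""

    def __init__(self, n_tickers=6, n_indicators=8, horizon=100):
        self.horizon = horizon
        self.t = 0
        self.action_space = spaces.Box(low=0, high=1, shape=(n_tickers,))
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(n_tickers + n_indicators, n_tickers)
        )
        self._state = np.ones(self.observation_space.shape, dtype=np.float32)
        self.reward = 0.0

    def reset(self):
        self.t = 0
        return self._state

    def step(self, action):
        self.t += 1
        weights = np.exp(action) / np.sum(np.exp(action))  # softmax, as in the real env
        self.reward = float(np.sum(weights))               # constant 1.0, never NaN
        done = self.t >= self.horizon
        return self._state, self.reward, done, {}
```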
And I tried printing the shape of self.state at each step:
```
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(12, 6)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(12, 6)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
(12, 6)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INTITIAL PORTFOLIO VALUE:1000000
END PORTFOLIO VALUE:1000000
SHARPE nan
```
A sample state looks like this:

```
[[ 3.22092570e-03  1.97095890e-03  1.60226609e-03  2.66498489e-03
   2.38231480e-03  2.75962130e-03]
 [ 1.97095890e-03  2.34588985e-03  1.15437913e-03  1.88740120e-03
   1.64732788e-03  1.69181107e-03]
 [ 1.60226609e-03  1.15437913e-03  1.16349898e-03  1.51228729e-03
   1.43133681e-03  1.48325483e-03]
 [ 2.66498489e-03  1.88740120e-03  1.51228729e-03  2.79165952e-03
   2.25944713e-03  2.24762622e-03]
 [ 2.38231480e-03  1.64732788e-03  1.43133681e-03  2.25944713e-03
   2.26624196e-03  2.05660397e-03]
 [ 2.75962130e-03  1.69181107e-03  1.48325483e-03  2.24762622e-03
   2.05660397e-03  3.13031382e-03]
 [ 4.82398738e-02  6.43640807e+00  4.12783361e+03  1.62888621e+02
   3.42617669e+01  1.29988871e-01]
 [ 3.10171262e-02  4.88481193e+00  3.38948439e+03  8.50473792e+01
   2.69032331e+01  1.00539129e-01]
 [ 5.50679421e+01  5.37318170e+01  4.88418440e+01  6.30237320e+01
   5.25227431e+01  4.34952341e+01]
 [ 9.90493210e-04  9.50464597e-02 -4.66502410e+01  8.93967559e+00
   2.41152967e-01 -6.74114990e-03]
 [ 3.96285000e-02  5.66061000e+00  3.75865900e+03  1.23968000e+02
   3.05825000e+01  1.15264000e-01]
 [ 4.53968333e-02  6.21240667e+00  4.34476267e+03  1.33744167e+02
   3.41323333e+01  1.60224833e-01]]
```
After weeks of testing different solutions, I finally got the problem sorted.
At first I assumed it was something to do with the DataProcessor, so I tried using the Yahoo processor to download the data (that wasn't it).
Then I tried using my env in the tutorial notebook and everything checked out; the model training occurs. So I concluded the env wasn't the issue.
As noted earlier, in my project the model training and data acquisition are done in separate notebooks.
`to_csv()` was used to export the data and `pd.read_csv(df, index_col=False)` to import it,
so it seemed the issue was a result of one of the two.
The shape of the imported df was verified, but when passed to the env, a different shape is produced by `df.loc[self.day,:]`.
What could be some potential causes/solutions?
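One likely cause worth checking (a sketch of a known pandas behaviour, not something confirmed in this thread): `to_csv()` stringifies any column whose cells are numpy arrays, so a column like `covarianceList` comes back from `read_csv()` as strings such as `"[[0.0031 ...]]"` rather than (N, N) arrays, and downstream indexing sees different shapes. Round-tripping through pickle (as the `read_pickle` cell earlier in the thread does) preserves the objects:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"tic": ["BTCUSDT"], "covarianceList": [np.eye(2)]})

df.to_csv("data.csv", index=False)
back = pd.read_csv("data.csv", index_col=False)
print(type(back["covarianceList"].iloc[0]))  # <class 'str'> -- the array was stringified

df.to_pickle("data.pkl")
back = pd.read_pickle("data.pkl")
print(back["covarianceList"].iloc[0].shape)  # (2, 2) -- the object is preserved
```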
@XiaoYangLiu-FinRL
Right.
Closing this, as it's clearly not a FinRL issue.
Cheers @XiaoYangLiu-FinRL. Valuable learning experience.
@Daiiszuki Sure. Hope you enjoy!
@XiaoYangLiu-FinRL I think this is the part where you offer me a job lol
Overnight:
```
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-29-a96b7571d55f> in <module>
     41 ]
     42
---> 43 processorObj = DataProcessor(data_source = 'binance', start_date= START_TRAIN, end_date =END_TRADE, time_interval=TIME_INTERVAL )
     44 processorObj.download_data(LIST_OF_SYMBOLS)
     45 processorObj.clean_data()

1 frames
/FinRL-Meta/meta/data_processors/binance.py in <module>
     10 import pandas as pd
     11 import requests
---> 12 from _base import check_date
     13
     14 from meta.config import BINANCE_BASE_URL

ModuleNotFoundError: No module named '_base'
```
Resolved to using the YahooDownloader.
On second thought, I think it was actually a FinRL issue.
Please change `from _base import check_date` to `from meta.data_processors._base import check_date`.
We have updated the code.
This is new:
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-c727ea424a25> in <module>
----> 1 import gym

2 frames
/usr/local/lib/python3.7/dist-packages/gym/__init__.py in <module>
     11 )
     12 from gym.spaces import Space
---> 13 from gym.envs import make, spec, register
     14 from gym import logger
     15 from gym import vector

/usr/local/lib/python3.7/dist-packages/gym/envs/__init__.py in <module>
      8
      9 # Hook to load plugins from entry points
---> 10 _load_env_plugins()
     11
     12

/usr/local/lib/python3.7/dist-packages/gym/envs/registration.py in load_env_plugins(entry_point)
    248 def load_env_plugins(entry_point="gym.envs"):
    249     # Load third-party environments
--> 250     for plugin in metadata.entry_points().get(entry_point, []):
    251         # Python 3.8 doesn't support plugin.module, plugin.attr
    252         # So we'll have to try and parse this ourselves

AttributeError: 'EntryPoints' object has no attribute 'get'
```
Please write a check_env() to test the env; please refer to https://github.com/AI4Finance-Foundation/ElegantRL/blob/master/elegantrl/envs/StockTradingEnv.py
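For reference, Stable Baselines3 ships an env checker that validates the Gym contract (spaces, reset/step return types and shapes) before any training; a minimal usage sketch with the env from this thread:

```python
from stable_baselines3.common.env_checker import check_env

env = cryptoPortfolioAllocationEnvironment(dataFrame=trainData, **envKwargs)
check_env(env, warn=True)  # raises/warns on space-vs-observation mismatches
```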
In reference to https://github.com/AI4Finance-Foundation/FinRL-Meta/blob/master/tutorials/1-Introduction/FinRL_PortfolioAllocation_NeurIPS_2020.ipynb:
The train data is shaped as (7812, 19). Passing the data to the env runs without any errors:

```python
cryptoEnv = cryptoPortfolioAllocationEnvironment(dataFrame=trainData, **envKwargs)
```
And when I call `cryptoEnv.observation_space`, the shape is (22, 14), which I assume is a combination of the covariances and indicators: 14 tickers plus 8 indicators.
Running

```python
activeEnv, _ = cryptoEnv.stableBaselineEnv()
```

returns
What am I missing?
Please let me know if you require any additional info. The function to generate the env is as follows