druce / rl

Deep Reinforcement Learning For Trading
https://alphaarchitect.com/2020/02/26/reinforcement-learning-for-trading/

Trading With RL Pickle Error - TypeError: can't pickle _thread.RLock objects #8

Open windowshopr opened 3 years ago

windowshopr commented 3 years ago

Love the work you've posted! Super thorough.

I'm trying to run the Trading with RL notebook in Google Colab but I'm running into an issue. I will document my moves here for reproducibility.

Basically, I copied, cell for cell, the notebook from this repository into a Google Colab notebook.

First, you have to install backtrader into the environment. I did this by adding the following at the very beginning of the imports:

try:
    import backtrader as bt
except ImportError:
    # backtrader isn't available in a fresh Colab runtime, so install it,
    # then ask the user to restart so the import succeeds on the next run
    print('Backtrader not installed yet. Installing now...')
    !pip install backtrader
    print('Backtrader installed.')
    print('Restart and Run All now.')
    exit()

This way, it'll prompt the user to Restart and Run All once Backtrader is installed.

The second issue I figured out was that this notebook needs to be run with eager execution disabled, so I added:

from tensorflow.compat import v1
v1.disable_eager_execution()

...to the imports at the top as well.

Now my issue is this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-21-e8262df5188b> in <module>()
     38 
     39     if e and (e+1) % agent.save_interval == 0:
---> 40         agent.save()
     41 
     42 elapsed_time = time.time() - start_time

<ipython-input-20-21179c258b19> in save(self)
    149         self.predict_model.save("%s_predict.h5" % fullname)
    150         # can't save / load train model due to custom loss
--> 151         pickle.dump(self, open("%s.p" % fullname, "wb"))
    152 
    153     def load(filename, memory=True):

TypeError: can't pickle _thread.RLock objects

This happens when the cell that starts with the following is run (i.e. the cell after the class REINFORCE_Agent(Agent): cell):

N_EPISODES = 2000
ticks_per_episode = 1256
nstocks = 1
lag = 1

Googling that error, I found this answer that might help, but I'm hoping to get some help troubleshooting this one. I would greatly appreciate it, as I can't wait to get this working online.

Thanks!

druce commented 3 years ago

Hmm... will look into it further.

However, I think you can just comment out agent.save(). I had training set up to save models so early stopping could reload the one with the best metrics, but I don't think anything uses that in this notebook.
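If you do want to keep the pickle checkpoint, one possible workaround (an untested sketch; the predict_model attribute name is just taken from your traceback) is to exclude the Keras model from what gets pickled, since that's where the _thread.RLock lives, and rely on the separately saved .h5 file to restore it:

class PicklableAgentMixin:
    """Sketch only: exclude the unpicklable Keras model when pickling the agent."""

    def __getstate__(self):
        # Keras models hold thread locks that pickle can't serialize,
        # so drop predict_model here; it's already saved to the .h5 file.
        state = self.__dict__.copy()
        state.pop('predict_model', None)
        return state

    def __setstate__(self, state):
        # Restore everything else; predict_model must be reloaded
        # from the .h5 file after unpickling.
        self.__dict__.update(state)
        self.predict_model = None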

Backtrader is not needed, I think. I tried using it to compute backtest metrics, but it isn't actually used.

Will push some small edits, probably tomorrow.


windowshopr commented 3 years ago

Ah, you're right, I don't see any instance of backtrader in the script. Commented out that install section for now.

Oh, and I also added the script below into a cell just below the imports:

# If the model save directories haven't been made yet, make them
if not os.path.exists('model_output'):
    os.makedirs('model_output')
if not os.path.exists('model_output/trading'):
    os.makedirs('model_output/trading')

... as the folders don't get made in Colab automatically, so this just creates them for you if they don't already exist.
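(os.makedirs can also create both levels in one call if you pass exist_ok=True, in case that's tidier:)

import os

# Creates model_output/ and model_output/trading/ if needed;
# exist_ok=True makes it a no-op when they already exist.
os.makedirs('model_output/trading', exist_ok=True)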

I commented out the pickle.dump() portion of the agent.save() function and it seems to be running now, saving some kreinforceXXXX_predict.h5 files. Was the pickling just a backup to the regular save method? Would be awesome to make sure models are getting saved properly. Will advise if I run into anything else in the meantime. :D

windowshopr commented 3 years ago

@druce I also noticed that using multiple nstocks does not seem to work in Colab either. I don't know if this functionality was just left out when the notebook was created, but it would be cool to get that working also. When I try to change the nstocks variable everywhere to 2 instead of 1, I get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-24-a5939e1fbe15> in <module>()
     34     if not os.path.exists('model_output/trading'):
     35         os.makedirs('model_output/trading')
---> 36     agent.run_episode()
     37     agent.score_episode(e, N_EPISODES)
     38 

1 frames
<ipython-input-22-b8da4b29deae> in run_episode(self, render)
     67                 env.render()
     68             self.action = self.act(self.state.reshape([1, self.state_size]))
---> 69             self.next_state, self.reward, self.done, _ = env.step(self.action)
     70             self.total_reward += self.reward
     71 

<ipython-input-20-0b69e1f91c9e> in step(self, action)
     45         # map actions 0 1 2 to positions -1, 0, 1
     46         position = action - 1
---> 47         reward = position @ stock_delta
     48         self.total_reward += reward
     49         self.t += 1

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 1)

I don't know what this error is trying to tell me lol, but it seems to point at the position @ stock_delta step. I'm just getting into training a model to take multiple actions in one step, given a state, so it would be cool to work this one out so I can apply it elsewhere as well. I read the write-up on your site and it mentioned:

The Ritter paper applies reinforcement learning to a multiple-stock portfolio. This is fairly straightforward from here by changing the input to be the states from multiple stocks, adding multiple outputs for multiple stocks, and computing the reward as a portfolio return. The Ritter paper also uses Sharpe ratio as the reward, and finds that the algorithm successfully optimizes it, which is a very nice result. The model empirically maximized risk-reward without knowing anything about the stock market dynamics, a covariance matrix, normality, or even how the reward is computed.

The input to where? The environment? Same for output?

Thanks! Love working with the notebook!

druce commented 3 years ago

It's only implemented for nstocks = 1.

It's "fairly straightforward" conceptually for the most part, but it's a bit of work.

The input for a single stock is, I believe, 8 periods with 2 inputs per period, so 1x16.

The output is a binomial classification for long-only: either be long or flat.

So if you wanted to do 10 stocks, just multiply everything by 10:

10x16 input, 10x2 output, and multiply all the layers in between by 10 (maybe it shouldn't need to be fully connected, and maybe share weights between layers).

Anyway, you get the idea; it's conceptually fairly straightforward to extend to multi-input, multi-output.
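(And the matmul error you're seeing is just a shape mismatch: with nstocks = 2, stock_delta has length 2 but the action, and hence position, is still a scalar. In a multi-stock version the step would take one action per stock, something like this rough sketch, which is not the notebook's actual code:)

import numpy as np

# Hypothetical multi-stock reward step, for illustration only.
actions = np.array([2, 0])                # one action per stock, values in {0, 1, 2}
positions = actions - 1                   # map 0, 1, 2 -> -1, 0, 1 per stock
stock_delta = np.array([0.004, -0.002])   # per-stock price change this step

reward = positions @ stock_delta          # portfolio P&L: sum of position_i * delta_i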

If you want to solve an interesting portfolio optimization, you probably don't want independent asset returns; you want some correlation between them. And the point of the OU or SHM type processes was to teach the algo to do market timing, which complicates and muddies the portfolio optimization part.

So instead of OU/SHM, I'd probably generate 10 (or some number of) assets with random returns, such that maybe 5 are bad assets (negative returns, or 100% correlated with other assets but worse returns) and 5 are good assets, and fiddle with the correlations and returns so there is a known portfolio with max Sharpe at some discretized weights. Then the output is not just long/flat, it's e.g. 0-4 shares. You then have to convert those outputs to a % of total reflecting a budget constraint, and make the reward at each timestep the Sharpe over some lookback window for the resulting portfolio. Hopefully the algo would converge on holding the good assets in the correct weights, per a portfolio optimization based on the known expected returns and correlations.

So anyway, conceptually it seems fairly straightforward but clearly quite a bit of work, and maybe I've missed some other stumbling blocks, so I ended up not implementing it.

But maybe in the future! If you get anywhere with it, let me know! I would maybe start with a close read of the Ritter paper, although what he does seems a little simpler: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3015609

There's some code in my other repo that generates random assets with specified correlations to each other, e.g. a synthetic short asset with a 5% negative return and 90% correlation to the S&P: https://github.com/druce/portfolio_optimization
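The rough idea (a minimal sketch with made-up numbers, not the code from that repo) is to draw returns from a multivariate normal with the correlation structure you want:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (made up): a "market" asset and a synthetic asset
# that drifts down and is highly correlated with it.
mu = np.array([0.0004, -0.0002])          # daily mean returns
vol = np.array([0.01, 0.012])             # daily volatilities
corr = np.array([[1.0, 0.9],
                 [0.9, 1.0]])             # target correlation matrix

cov = np.outer(vol, vol) * corr           # covariance from vols and correlations
returns = rng.multivariate_normal(mu, cov, size=2520)  # ~10 years of daily returns

print(np.corrcoef(returns.T))             # should be close to the target correlation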

windowshopr commented 3 years ago

Best answer I've received in a long time lol, thank you for taking the time. It makes a lot of sense now.

I am interested in that portfolio optimization problem with correlated prices. I read the Ritter paper last night. Seems interesting. I have a modified Colab notebook that takes your primary code and trims out everything not pertaining to the OU part of it, and also modified it to only allow 1 long position at a time, just to see how it performs. It still finds profit which is nice, but my main worry is this idea of mean reversion. Buying when the price is below its long term mean is a risky move as prices can easily move out of mean reversion, and keep moving down. I've also added in the Hurst exponent into that above notebook, so I may want to play around with incorporating that into the correlation stuff you mentioned, such that it will only trade the mean reversion when it starts entering a lower Hurst exponent, hopefully taking advantage of the short term mean reversion. Maybe?
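For reference, the Hurst estimate I'm using is roughly the standard lagged-variance version (a simplified sketch, not exactly my notebook's code): values well below 0.5 suggest mean reversion, values above 0.5 suggest trending.

import numpy as np

def hurst_exponent(prices, max_lag=20):
    """Rough Hurst exponent estimate from the variance of lagged differences."""
    lags = range(2, max_lag)
    # Standard deviation of price differences at each lag
    tau = [np.std(prices[lag:] - prices[:-lag]) for lag in lags]
    # Slope of log(tau) vs log(lag) estimates the Hurst exponent
    return np.polyfit(np.log(list(lags)), np.log(tau), 1)[0]

# Example: a random walk should give H close to 0.5
rw = np.cumsum(np.random.default_rng(1).normal(size=5000))
print(hurst_exponent(rw))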

I'll also check out your other repository next weekend. Thanks a lot for the insight, and I'll let you know if I come up with anything!