AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
https://ai4finance.org
MIT License

Add portfolio optimization environment, architectures and algorithm #1146

Closed: C4i0kun closed this 5 months ago

C4i0kun commented 5 months ago

Hello!

This pull request adds features that allow developers and researchers to train state-of-the-art portfolio optimization agents.

  1. It introduces a portfolio optimization environment (POE) that implements the formulations presented in this article.
  2. It reproduces two relevant convolutional architectures that achieve strong results on this problem: EIIE (ensemble of identical independent evaluators) and EI3 (ensemble of identical independent inception).
  3. It adds a policy gradient algorithm specifically developed to train portfolio optimization agents.

All three features follow the FinRL API, so anyone can train an agent with something like the following:

```python
model = DRLAgent(environment).get_model("pg", model_kwargs, policy_kwargs)
DRLAgent.train_model(model, episodes=20)
```
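
For context, a fuller sketch of the workflow looks roughly like this (the environment class name, import paths, and constructor arguments below are illustrative; see the example notebook for the exact usage):

```python
import pandas as pd

# illustrative imports; the example notebook shows the actual module paths
from environments import PortfolioOptimizationEnv
from models import DRLAgent

# long-format dataframe with one row per (date, ticker) and the chosen features
df = pd.read_csv("portfolio_data.csv")

# portfolio optimization environment (POE)
environment = PortfolioOptimizationEnv(
    df,
    initial_amount=100_000,             # starting portfolio value
    features=["close", "high", "low"],  # time-series features fed to the policy
)

# "pg" selects the policy gradient algorithm; policy_kwargs chooses and
# configures the architecture (EIIE or EI3)
model = DRLAgent(environment).get_model(
    "pg",
    model_kwargs={"lr": 0.01},
    policy_kwargs={"time_window": 50},
)
DRLAgent.train_model(model, episodes=20)
```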

A complete example was added to the examples folder.

ahotrod commented 5 months ago

@C4i0kun

Thanks for your hard work improving and revising the portfolio optimization environment, architectures, and algorithm for integration into FinRL! Your pull request's changes run fine for me on CPU, but I hit an issue while training on GPU:


```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[10], line 1
----> 1 DRLAgent.train_model(model, episodes=20)

File /media/dan/work/INVEST/0_FinRL/0_FinRL_WIP/0_FinRL_0_POE_PR/models.py:45, in DRLAgent.train_model(model, episodes)
     43 @staticmethod
     44 def train_model(model, episodes=100):
---> 45     model.train(episodes)

File /media/dan/work/INVEST/0_FinRL/0_FinRL_WIP/0_FinRL_0_POE_PR/algorithms.py:105, in PolicyGradient.train(self, episodes)
    103 obs_batch = np.expand_dims(obs, axis=0)
    104 last_action_batch = np.expand_dims(last_action, axis=0)
--> 105 action = self.train_policy(obs_batch, last_action_batch)
    106 self.train_pvm.add(action)
    108 # run simulation step

File ~/miniconda3/envs/dlpo/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~/miniconda3/envs/dlpo/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File /media/dan/work/INVEST/0_FinRL/0_FinRL_WIP/0_FinRL_0_POE_PR/architectures.py:104, in EIIE.forward(self, observation, last_action)
     94 def forward(self, observation, last_action):
     95     """Policy network's forward propagation.
     96
     97     Args:
    (...)
    102         Action to be taken (numpy array).
    103     """
--> 104 mu = self.mu(observation, last_action)
    105 action = mu.cpu().detach().numpy().squeeze()
    106 return action

File /media/dan/work/INVEST/0_FinRL/0_FinRL_WIP/0_FinRL_0_POE_PR/architectures.py:76, in EIIE.mu(self, observation, last_action)
     73 last_stocks, cash_bias = self._process_last_action(last_action)
     74 cash_bias = torch.zeros_like(cash_bias).to(self.device)
---> 76 output = self.sequential(observation)  # shape [N, 20, PORTFOLIO_SIZE, 1]
     77 output = torch.cat(
     78     [last_stocks, output], dim=1
     79 )  # shape [N, 21, PORTFOLIO_SIZE, 1]
     80 output = self.final_convolution(output)  # shape [N, 1, PORTFOLIO_SIZE, 1]

File ~/miniconda3/envs/dlpo/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~/miniconda3/envs/dlpo/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~/miniconda3/envs/dlpo/lib/python3.10/site-packages/torch/nn/modules/container.py:215, in Sequential.forward(self, input)
    213 def forward(self, input):
    214     for module in self:
--> 215         input = module(input)
    216     return input

File ~/miniconda3/envs/dlpo/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~/miniconda3/envs/dlpo/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~/miniconda3/envs/dlpo/lib/python3.10/site-packages/torch/nn/modules/conv.py:460, in Conv2d.forward(self, input)
    459 def forward(self, input: Tensor) -> Tensor:
--> 460     return self._conv_forward(input, self.weight, self.bias)

File ~/miniconda3/envs/dlpo/lib/python3.10/site-packages/torch/nn/modules/conv.py:456, in Conv2d._conv_forward(self, input, weight, bias)
    452 if self.padding_mode != 'zeros':
    453     return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
    454                     weight, bias, self.stride,
    455                     _pair(0), self.dilation, self.groups)
--> 456 return F.conv2d(input, weight, bias, self.stride,
    457                 self.padding, self.dilation, self.groups)

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
```


From what I've read, this occurs when the model's weights and the input data are not both on the GPU. Perhaps a `model.to(device)` call is needed? I haven't yet figured out where to insert it to test this.
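
To illustrate where I'd expect the fix to go, here is a rough sketch (the attribute names follow the traceback above, but the real `algorithms.py` may be organized differently):

```python
import numpy as np
import torch

class PolicyGradient:
    """Hypothetical sketch of the relevant parts of algorithms.py."""

    def __init__(self, policy_network):
        # pick the GPU when available, otherwise fall back to CPU
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # move the policy's weights to that device once, at construction time
        self.train_policy = policy_network.to(self.device)

    def _to_device(self, array):
        # inputs must live on the same device as the weights
        return torch.from_numpy(array).float().to(self.device)

    def train_step(self, obs, last_action):
        obs_batch = self._to_device(np.expand_dims(obs, axis=0))
        last_action_batch = self._to_device(np.expand_dims(last_action, axis=0))
        return self.train_policy(obs_batch, last_action_batch)
```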

As an aside, in your original portfolio optimization example notebook I am using `features = ['close', 'effr', 'umcsent', 'unrate', 'usdx', 'vixcls']` with `nn.Conv2d(in_channels=len(features), ...)`, and I am iterating over the downstream CNN parameters/dimensions to best capture short-term dependencies of these features. Second, I'm also looking to integrate Transformers into your model/framework for long-term dependencies, as described in the paper "Financial Time Series Forecasting using CNN and Transformer" by Zhen Zeng et al. (arXiv:2304.04912v1, 11 Apr 2023). Any thoughts/comments?
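
For reference, the first point boils down to something like this (the channel count and kernel size are just values I'm iterating on, not the PR's defaults):

```python
import torch
import torch.nn as nn

features = ["close", "effr", "umcsent", "unrate", "usdx", "vixcls"]

# first conv layer of an EIIE-style network: one input channel per feature;
# the kernel spans time (width) but not assets (height)
first_conv = nn.Conv2d(in_channels=len(features), out_channels=2, kernel_size=(1, 3))

# dummy batch: [batch, features, assets, time_window]
x = torch.randn(1, len(features), 10, 50)
print(first_conv(x).shape)  # torch.Size([1, 2, 10, 48])
```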

Best regards!

zhumingpassional commented 5 months ago

@C4i0kun Thanks for your good work. Great!

C4i0kun commented 5 months ago

@ahotrod,

Sorry for the late response. I have just created another pull request that fixes the GPU issue. Take a look here.

About applying transformers, I really think it can be done. You would need to create another architecture that uses the same inputs (financial time series), outputs (percentage of investment in each stock), and public methods (mu and forward) as the existing convolutional architectures, and I believe the training algorithm would run without changes! I'm not used to working with transformers, but if you run into any issues, you can contact me.
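
To make that concrete, here is a rough, untested skeleton of what such an architecture could look like (all names and hyperparameters below are illustrative; only the mu/forward interface matters):

```python
import torch
import torch.nn as nn

class TransformerPolicy(nn.Module):
    """Illustrative skeleton of a transformer-based alternative to EIIE/EI3."""

    def __init__(self, num_features=3, embedding_dim=64):
        super().__init__()
        # each asset's time series is embedded and encoded independently
        self.embedding = nn.Linear(num_features, embedding_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embedding_dim, nhead=4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.score = nn.Linear(embedding_dim, 1)  # one score per asset
        self.softmax = nn.Softmax(dim=-1)

    def mu(self, observation, last_action):
        # observation: tensor of shape [batch, features, assets, time_window],
        # assumed to already be on the same device as the weights;
        # last_action is ignored in this simplified sketch
        batch, feats, assets, time = observation.shape
        # fold assets into the batch so each asset is its own sequence
        x = observation.permute(0, 2, 3, 1).reshape(batch * assets, time, feats)
        x = self.encoder(self.embedding(x))[:, -1, :]  # keep the last time step
        scores = self.score(x).reshape(batch, assets)
        # prepend a zero score for cash, mirroring EIIE's cash bias
        cash = torch.zeros(batch, 1, device=scores.device)
        return self.softmax(torch.cat([cash, scores], dim=-1))

    def forward(self, observation, last_action):
        # same public interface as EIIE: portfolio weights as a numpy array
        mu = self.mu(observation, last_action)
        return mu.cpu().detach().numpy().squeeze()
```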