AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
https://ai4finance.org
MIT License
9.4k stars 2.29k forks

Error occurs in PaperTrading_Demo notebook while running on Colab #1011

Open jiahau3 opened 1 year ago

jiahau3 commented 1 year ago

Describe the bug: The bug occurs in Part 2, "Train the agent", while running the train cell. The error ValueError: too many values to unpack (expected 4) is raised in the function explore_env. Here is the message:

    ValueError                                Traceback (most recent call last)
    in <cell line: 1>()
    ----> 1 train(start_date = '2022-08-25',
          2       end_date = '2022-08-31',
          3       ticker_list = ticker_list,
          4       data_source = 'alpaca',
          5       time_interval= '1Min',

    3 frames
    in train(start_date, end_date, ticker_list, data_source, time_interval, technical_indicator_list, drl_lib, env, model_name, if_vix, **kwargs)
         57     )
         58     model = agent.get_model(model_name, model_kwargs=erl_params)
    ---> 59     trained_model = agent.train_model(
         60         model=model, cwd=cwd, total_timesteps=break_step
         61     )

    in train_model(self, model, cwd, total_timesteps)
         74     model.cwd = cwd
         75     model.break_step = total_timesteps
    ---> 76     train_agent(model)
         77
         78     @staticmethod

    in train_agent(args)
        315     torch.set_grad_enabled(False)
        316     while True:  # start training
    --> 317         buffer_items = agent.explore_env(env, args.horizon_len)
        318
        319         torch.set_grad_enabled(True)

    in explore_env(self, env, horizon_len)
        200
        201     ary_action = convert(action).detach().cpu().numpy()
    --> 202     ary_state, reward, done, _ = env.step(ary_action)
        203     if done:
        204         ary_state = env.reset()

    ValueError: too many values to unpack (expected 4)

Below is the sequence of workarounds that led up to this error. I suspect step 3 is where I went wrong.

Several bugs occurred while running on Colab:

  1. No attribute .delta on pd.Timedelta(str) in /usr/local/lib/python3.10/site-packages/finrl/meta/data_processors/processor_alpaca.py -> fixed by changing it to pd.Timedelta(str).value
  2. ValueError: Parameter start received with timezone defined as 'UTC' although a Date must be timezone naive. -> fixed by adding ts = ts.tz_localize(None) at line 376 of /usr/local/lib/python3.10/site-packages/exchange_calendars/calendar_helpers.py
  3. TypeError: tuple indices must be integers or slices, not tuple -> fixed by deleting [np.newaxis, :] in line 309: agent.states = env.reset()[np.newaxis, :]

Not sure how to debug this error; any suggestions on these workarounds are also welcome. A short standalone sketch of fixes 1 and 2 follows below.
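For reference, a minimal sketch of the pandas calls behind fixes 1 and 2 (plain pandas only, not the patched processor_alpaca.py / exchange_calendars code itself):

```python
import pandas as pd

# Fix 1: Timedelta.delta was removed in recent pandas; .value gives the
# interval as an integer number of nanoseconds instead.
time_interval = "1Min"
delta_ns = pd.Timedelta(time_interval).value
print(delta_ns)  # 60000000000

# Fix 2: strip the timezone so the timestamp becomes timezone-naive,
# which is what exchange_calendars expects for its start/end dates.
ts = pd.Timestamp("2022-08-25", tz="UTC")
ts = ts.tz_localize(None)
print(ts)  # 2022-08-25 00:00:00
```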

wjdunham commented 1 year ago

Item 3 is due to a change in the Gym framework: the environment's reset() method now also returns an "info" dict, making the return value a tuple. It can be fixed by changing agent.states = env.reset()[np.newaxis, :] to agent.states = env.reset()[0][np.newaxis, :]
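A version-tolerant way to write that (a sketch only, not FinRL's code; the helper name reset_state is made up):

```python
import numpy as np

def reset_state(env):
    """Return only the observation from env.reset().

    Newer Gym/Gymnasium returns an (obs, info) tuple, while older Gym
    returned the observation alone; assumes a Box-style array observation.
    """
    out = env.reset()
    obs = out[0] if isinstance(out, tuple) else out
    return np.asarray(obs)

# Equivalent to the one-line fix above:
# agent.states = reset_state(env)[np.newaxis, :]
```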

wjdunham commented 1 year ago

There is another issue with the env.step() method, which has also been updated.

shivesh-pandey commented 1 year ago

Getting ValueError: too many values to unpack (expected 4)

shivesh-pandey commented 1 year ago

> There is another issue with the env.step() method, which has also been updated.

In the new version of Gymnasium, env.step() returns 5 values, compared to 4 in Gym v0.21.
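For illustration, a small compatibility wrapper along those lines (a sketch, not FinRL code; the function name step_compat is made up):

```python
def step_compat(env, action):
    """Normalize env.step() output across Gym API versions (sketch).

    Gymnasium / Gym >= 0.26 returns (obs, reward, terminated, truncated, info);
    older Gym (e.g. v0.21) returned (obs, reward, done, info).
    """
    result = env.step(action)
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        done = terminated or truncated
    else:
        obs, reward, done, info = result
    return obs, reward, done, info
```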

wjdunham commented 1 year ago

Here is the update I used for the env.step() change: ary_state, reward, terminated, _, info = env.step(ary_action). But there is now a new issue: there is no "done" flag returned anymore; it has been replaced by "terminated". I am getting an error farther down with what looks like a corrupted "ary_state", which may be because "terminated" and "done" are interpreted differently. The old code has: if done: ary_state = env.reset()

I updated it to: if terminated: ary_state = env.reset()

which I am not sure is correct. All of this was working several weeks ago, and the updates to Gym happened in 2022, so I am not sure what exactly got updated where to make it stop working.

wjdunham commented 1 year ago

This is the necessary update: Gym's env.reset() now returns a tuple containing the observation array and an "info" dict as well, so we need to pull the array out. The old code has: if done: ary_state = env.reset()

I updated it to: if terminated: ary_state = env.reset()[0]
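Putting the two changes together, the exploration step looks roughly like this (an illustrative sketch; the hypothetical explore_step helper is not the actual explore_env, which also fills the replay buffer):

```python
import numpy as np

def explore_step(env, ary_action):
    """One exploration step under the Gymnasium API (illustrative sketch)."""
    ary_state, reward, terminated, truncated, info = env.step(ary_action)
    done = terminated or truncated
    if done:
        ary_state, info = env.reset()  # Gymnasium reset() returns (obs, info)
    return np.asarray(ary_state), reward, done
```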

shivesh-pandey commented 1 year ago

> This is the necessary update: Gym's env.reset() now returns a tuple containing the observation array and an "info" dict as well, so we need to pull the array out. The old code has: if done: ary_state = env.reset()
>
> I updated it to: if terminated: ary_state = env.reset()[0]

Yeah, it works now, but during training I start getting this error:

    :391: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:245.)
      tensor_state = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    in ()
    ----> 1 train(start_date = '2022-08-25',
          2       end_date = '2022-08-31',
          3       ticker_list = ticker_list,
          4       data_source = 'alpaca',
          5       time_interval= '1Min',

    5 frames
    in get_rewards_and_steps(env, actor, if_render)
        389     cumulative_returns = 0.0  # sum of rewards in an episode
        390     for episode_steps in range(12345):
    --> 391         tensor_state = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
        392         tensor_action = actor(tensor_state)
        393         action = tensor_action.detach().cpu().numpy()[0]  # not need detach(), because using torch.no_grad() outside

    ValueError: expected sequence of length 333 at dim 1 (got 0)

acegla commented 1 year ago

same

shivesh-pandey commented 1 year ago

> same

I think I found the solution:

    state = env.reset()[0]
    state, reward, terminated, _, info = env.step(action)
    state = environment.reset()[0]

In many places you will see a reset call; when you get the error, check whether there is a reset just above it and index its result with 0, like this: state = environment.reset()[0]

After this line you will hit the error again elsewhere, but everywhere it needs the same fix.
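Applied to the evaluation helper from the traceback above, the same pattern looks roughly like this (a hedged sketch; the real get_rewards_and_steps in FinRL differs in its signature and details):

```python
import torch

def get_rewards_and_steps(env, actor, device="cpu"):
    """Roll out one episode and return (cumulative reward, steps) -- sketch only."""
    state = env.reset()[0]  # Gymnasium: (obs, info) -> keep only the observation
    cumulative_returns = 0.0
    episode_steps = 0
    for episode_steps in range(12345):
        tensor_state = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
        tensor_action = actor(tensor_state)
        action = tensor_action.detach().cpu().numpy()[0]
        state, reward, terminated, truncated, _ = env.step(action)
        cumulative_returns += reward
        if terminated or truncated:
            break
    return cumulative_returns, episode_steps + 1
```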

Dharmendra-G-1 commented 10 months ago

Hello @shivesh-pandey, can you please share your Git repo for FinRL that you were able to run successfully after applying the code changes you recommended above?


Also if possible, please share all packages version (pip list) in your environment.

We are still struggling to make FinRL_PaperTrading_Demo.ipynb work in its current state.

Thanks for your help !!

svarmo commented 10 months ago

OK, this worked for me: I changed some code in meta/paper_trading/common.py. I'm not sure whether I also needed to change meta/env_stock_trading/env_stocktrading_np.py, but I added it anyway.

https://gist.github.com/svarmo/1d66b92073f2a234ed6488ccb0d780db

pip list

finrl                         0.3.6

svarmo commented 10 months ago

Btw, I got it to work with FinRL_PaperTrading_Demo_refactored.py; the Jupyter notebook is still failing with some other issue.

SUSHANTH009 commented 8 months ago

> This is the necessary update: Gym's env.reset() now returns a tuple containing the observation array and an "info" dict as well, so we need to pull the array out. The old code has: if done: ary_state = env.reset() I updated it to: if terminated: ary_state = env.reset()[0]
>
> Yeah, it works now, but during training I start getting this error:
>
>     :391: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:245.)
>       tensor_state = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
>
>     ValueError                                Traceback (most recent call last)
>     in <cell line: 1>()
>     ----> 1 train(start_date = '2022-08-25',
>           2       end_date = '2022-08-31',
>           3       ticker_list = ticker_list,
>           4       data_source = 'alpaca',
>           5       time_interval= '1Min',
>
>     5 frames
>     in get_rewards_and_steps(env, actor, if_render)
>         389     cumulative_returns = 0.0  # sum of rewards in an episode
>         390     for episode_steps in range(12345):
>     --> 391         tensor_state = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
>         392         tensor_action = actor(tensor_state)
>         393         action = tensor_action.detach().cpu().numpy()[0]  # not need detach(), because using torch.no_grad() outside
>
>     ValueError: expected sequence of length 333 at dim 1 (got 0)

Did you get a solution for this?

davidatBGU commented 7 months ago

> OK, this worked for me: I changed some code in meta/paper_trading/common.py. I'm not sure whether I also needed to change meta/env_stock_trading/env_stocktrading_np.py, but I added it anyway.
>
> https://gist.github.com/svarmo/1d66b92073f2a234ed6488ccb0d780db
>
> pip list
>
> finrl                         0.3.6

Thank you, I am getting the following output, which I received with the notebook as well:

TL;DR: the error is "RuntimeError: could not create a primitive descriptor for a matmul primitive". Could you please point me to what might be the issue?

Full output:

    /home/opc/.local/lib/python3.10/site-packages/pyfolio/pos.py:26: UserWarning: Module "zipline.assets" not found; mutltipliers will not be applied to position notionals.
      warnings.warn(
    TRAIN_START_DATE: 2023-11-03
    TRAIN_END_DATE: 2023-11-10
    TEST_START_DATE: 2023-11-13
    TEST_END_DATE: 2023-11-14
    TRAINFULL_START_DATE: 2023-11-03
    TRAINFULL_END_DATE: 2023-11-14
    Alpaca successfully connected
    Data cleaning started
    align start and end dates
    produce full timestamp index
    Start processing tickers
    ticker list complete
    Start concat and rename
    Data clean finished!
    Started adding Indicators
    Running Loop
    Restore Timestamps
    Finished adding Indicators
    Data cleaning started
    align start and end dates
    produce full timestamp index
    Start processing tickers
    ticker list complete
    Start concat and rename
    Data clean finished!

    | step: Number of samples, or total training steps, or running times of env.step().
    | time: Time spent from the start of training to this moment.
    | avgR: Average value of cumulative rewards, which is the sum of rewards in an episode.
    | stdR: Standard dev of cumulative rewards, which is the sum of rewards in an episode.
    | avgS: Average of steps in an episode.
    | objC: Objective of Critic network. Or call it loss function of critic network.
    | objA: Objective of Actor network. It is the average Q value of the critic network.
    | step  time  | avgR  stdR  avgS  | objC  objA
    Traceback (most recent call last):
      File "/home/opc/FinRL/FinRL_PaperTrading_Demo_refactored.py", line 80, in <module>
        train(
      File "/home/opc/FinRL/finrl/meta/paper_trading/common.py", line 752, in train
        trained_model = agent.train_model(
      File "/home/opc/FinRL/finrl/meta/paper_trading/common.py", line 636, in train_model
        train_agent(model)
      File "/home/opc/FinRL/finrl/meta/paper_trading/common.py", line 441, in train_agent
        logging_tuple = agent.update_net(buffer_items)
      File "/home/opc/FinRL/finrl/meta/paper_trading/common.py", line 318, in update_net
        values = [self.cri(states[i : i + bs]) for i in range(0, buffer_size, bs)]
      File "/home/opc/FinRL/finrl/meta/paper_trading/common.py", line 318, in <listcomp>
        values = [self.cri(states[i : i + bs]) for i in range(0, buffer_size, bs)]
      File "/home/opc/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/home/opc/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/home/opc/FinRL/finrl/meta/paper_trading/common.py", line 72, in forward
        return self.net(state)  # advantage value
      File "/home/opc/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/home/opc/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/home/opc/.local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 215, in forward
        input = module(input)
      File "/home/opc/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/home/opc/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/home/opc/.local/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
        return F.linear(input, self.weight, self.bias)
    RuntimeError: could not create a primitive descriptor for a matmul primitive

JohannesDupont commented 7 months ago

Does anyone have a working version for the paper trading notebook?

RaulSokolova commented 6 months ago

@svarmo, hey, thanks for letting us know. Can you provide some details on how you got it to work?