AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
https://ai4finance.org
MIT License

Issues with FinRL_PaperTrading_Demo and all demos #835

Closed · TomCS91 closed this 1 year ago

TomCS91 commented 1 year ago

Is it just me or are the demos getting more and more broken? Are they totally untested?

Firstly, to make them run in Colab you need to manually install wrds and swig, otherwise the install will fail.

The FinRL_PaperTrading_Demo notebook also throws loads of errors when doing the conversion to numpy, saying the lengths are different. Doubly annoying that all the other (good) tutorials didn't even have the numpy step. I'm unsure why the good data source is paired with an overly complex and non-working data flow.

Moving between data sources is overly complex for no reason I can see. The Yahoo data downloader is pretty useless in terms of the data, but it's also the most usable in terms of giving an easy-to-use, featured output.
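
For comparison, this is roughly the flow I mean with the Yahoo downloader (a minimal sketch; the import path is the one the current tutorials use and may have moved between versions):

```python
# Import path as used in the current FinRL tutorials; may differ between versions.
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader

# One call gives a tidy long-format frame with one row per (date, tic) bar.
df = YahooDownloader(
    start_date="2022-08-25",
    end_date="2022-08-31",
    ticker_list=["AAPL", "MSFT"],
).fetch_data()
print(df.head())
```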

TomCS91 commented 1 year ago

Same issue in the refactored one:

```
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 2044 and the array at index 1 has size 1968
```

The data processor seems to find NaNs that don't exist. During data processing it prints this for every stock:

```
Alpaca successfully connected
The price of the first row for ticker AAPL is NaN. It will filled with the first valid price.
The price of the first row for ticker AMGN is NaN. It will filled with the first valid price.
The price of the first row for ticker AXP is NaN. It will filled with the first valid price.
The price of the first row for ticker BA is NaN. It will filled with the first valid price.
The price of the first row for ticker CAT is NaN. It will filled with the first valid price.
The price of the first row for ticker CRM is NaN. It will filled with the first valid price.
```

Even though there are no NaNs in the original downloaded data.

I'm not sure what array it's trying to concatenate to get the error.
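
For reference, this is how I'm sanity-checking the downloaded frame before clean_data (plain pandas, assuming the usual long format with timestamp/tic/close columns):

```python
import pandas as pd

# df is the raw frame returned by the Alpaca download step (assumed long format,
# one row per (timestamp, tic) bar).
print(df.isna().sum())           # reports no NaNs here
print(df.groupby("tic").size())  # but the bar counts per ticker differ,
                                 # which is what np.hstack later trips over
```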

Full error log below:

```
ValueError                                Traceback (most recent call last)
in
----> 1 train(start_date = '2022-08-25',
      2       end_date = '2022-08-31',
      3       ticker_list = ticker_list,
      4       data_source = 'alpaca',
      5       time_interval= '1Min',

3 frames
/usr/local/lib/python3.8/dist-packages/finrl/meta/paper_trading/common.py in train(start_date, end_date, ticker_list, data_source, time_interval, technical_indicator_list, drl_lib, env, model_name, if_vix, **kwargs)
    719     else:
    720         data = dp.add_turbulence(data)
--> 721     price_array, tech_array, turbulence_array = dp.df_to_array(data, if_vix)
    722     env_config = {
    723         "price_array": price_array,

/usr/local/lib/python3.8/dist-packages/finrl/meta/data_processor.py in df_to_array(self, df, if_vix)
     66
     67     def df_to_array(self, df, if_vix) -> np.array:
---> 68         price_array, tech_array, turbulence_array = self.processor.df_to_array(
     69             df, self.tech_indicator_list, if_vix
     70         )

/usr/local/lib/python3.8/dist-packages/finrl/meta/data_processors/processor_alpaca.py in df_to_array(self, df, tech_indicator_list, if_vix)
    278                 if_first_time = False
    279             else:
--> 280                 price_array = np.hstack(
    281                     [price_array, df[df.tic == tic][["close"]].values]
    282                 )

<__array_function__ internals> in hstack(*args, **kwargs)

/usr/local/lib/python3.8/dist-packages/numpy/core/shape_base.py in hstack(tup)
    343         return _nx.concatenate(arrs, 0)
    344     else:
--> 345         return _nx.concatenate(arrs, 1)
    346
    347

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 2044 and the array at index 1 has size 1968
```
zhumingpassional commented 1 year ago

We tested the notebooks. However, the Python version and some lib versions (such as ray) may not be compatible.

We revised the FinRL code and now it can be installed successfully in Colab.

You said that the FinRL_PaperTrading_Demo notebook also throws loads of errors; could you please paste the errors here?

You said "Same issue in the refactored one"; I think the price of the first day is missing. We will test it.

Luc8102 commented 1 year ago

@zhumingpassional Hello. I have posted some info in the Discord server. I get the same error with the array sizes when I backdate past November 5th 2022. I attribute this to the Daylight Saving Time practice that the US stock market observes. When I print the arrays in the clean_data function in processor_alpaca.py, I can see it print NaN values at the beginning of the array, and then, if I remember correctly, the end of the array contained times past 16:00. It went to 16:59, I think.

The clocks went back an hour that weekend; it happens on the first Sunday of November.

Marcipops has made a change to make the code work for these 6 months but it needs to account for DST.
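
A quick way to see the out-of-session bars (a sketch, assuming the downloaded frame has a UTC "timestamp" column):

```python
import pandas as pd

# Assumes df is the downloaded long-format frame with a UTC "timestamp" column.
# Convert bar timestamps to exchange time and flag anything outside regular
# trading hours; around the DST change these show up as 16:00-16:59 bars.
ny = pd.to_datetime(df["timestamp"], utc=True).dt.tz_convert("America/New_York")
outside = df[(ny.dt.time < pd.Timestamp("09:30").time())
             | (ny.dt.time >= pd.Timestamp("16:00").time())]
print(outside.groupby(ny.loc[outside.index].dt.date).size())
```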

TomCS91 commented 1 year ago

Thanks @zhumingpassional , I just ran it again.

It still errors on install with the box2d errors:

```
Building wheel for box2d-py (setup.py) ... error
  ERROR: Failed building wheel for box2d-py
  Running setup.py clean for box2d-py
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ql7_nlte/box2d-py_dd972c6328bc4380a946a24bd0edf81c/setup.py'"'"'; __file__='"'"'/tmp/pip-install-ql7_nlte/box2d-py_dd972c6328bc4380a946a24bd0edf81c/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-ax9wlc9i/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.8/box2d-py Check the logs for full command output.
```

I previously found this error can be fixed by installing swig first; however, even with this you will get this error when running the imports:

```
----> 9 import wrds
     10 from stockstats import StockDataFrame as Sdf
     11 

ModuleNotFoundError: No module named 'wrds'
```

I found that adding a cell above the FinRL install with `%pip install swig wrds` fixes both those issues.
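
i.e. the first cell of the notebook becomes something like this, keeping whatever install command the demo already uses as the second line:

```
# order matters: swig and wrds first, then the existing FinRL install line
%pip install swig wrds
%pip install git+https://github.com/AI4Finance-Foundation/FinRL.git
```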

Upon running the clean data step it still reports NaN for each ticker, even though the original dataframe has no NaNs in it at all. The Alpaca clean data is incredibly slow and seems like it's finding things that don't exist?

Also, why are all the data downloads commented out?

Running the training errors out the same as before; full error below:


```
<ipython-input-24-9d55bb4339a9> in <module>
----> 1 train(start_date = '2022-08-25', 
      2       end_date = '2022-08-31',
      3       ticker_list = ticker_list,
      4       data_source = 'alpaca',
      5       time_interval= '1Min',

3 frames
<ipython-input-6-3b01b8e605e8> in train(start_date, end_date, ticker_list, data_source, time_interval, technical_indicator_list, drl_lib, env, model_name, if_vix, **kwargs)
     35     else:
     36         data = dp.add_turbulence(data)
---> 37     price_array, tech_array, turbulence_array = dp.df_to_array(data, if_vix)
     38     env_config = {
     39         "price_array": price_array,

/usr/local/lib/python3.8/dist-packages/finrl/meta/data_processor.py in df_to_array(self, df, if_vix)
     66 
     67     def df_to_array(self, df, if_vix) -> np.array:
---> 68         price_array, tech_array, turbulence_array = self.processor.df_to_array(
     69             df, self.tech_indicator_list, if_vix
     70         )

/usr/local/lib/python3.8/dist-packages/finrl/meta/data_processors/processor_alpaca.py in df_to_array(self, df, tech_indicator_list, if_vix)
    278                 if_first_time = False
    279             else:
--> 280                 price_array = np.hstack(
    281                     [price_array, df[df.tic == tic][["close"]].values]
    282                 )

<__array_function__ internals> in hstack(*args, **kwargs)

/usr/local/lib/python3.8/dist-packages/numpy/core/shape_base.py in hstack(tup)
    343         return _nx.concatenate(arrs, 0)
    344     else:
--> 345         return _nx.concatenate(arrs, 1)
    346 
    347 

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 2044 and the array at index 1 has size 1968
```
TomCS91 commented 1 year ago

I've been doing some testing on a few different setups and it seems the date might be at least partially to blame. I ran a train loop with `start_date = '2022-12-04', end_date = '2022-12-24'` and it ran fine. However, running the same loop with `start_date = '2022-11-04', end_date = '2022-11-24'` throws the same group of errors: the price being NaN and the array dimension mismatch.

Athe-kunal commented 1 year ago

Yes, around November 5 we had the daylight saving time adjustment, and if you look at the download data method in processor_alpaca, the dates are currently hard-coded around DST. We need a fix for that.
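
A sketch of what a fix could look like (not the current code, just an illustration): build the regular session window in exchange time and let pandas apply the per-date UTC offset, instead of hard-coding it:

```python
import pandas as pd

# Illustrative helper, not part of FinRL.
def session_utc(day):
    # Regular NYSE session in exchange time; pandas picks the right
    # EST/EDT offset for each date, so DST needs no special-casing.
    start = pd.Timestamp(f"{day} 09:30", tz="America/New_York")
    end = pd.Timestamp(f"{day} 15:59", tz="America/New_York")
    return start.tz_convert("UTC"), end.tz_convert("UTC")

print(session_utc("2022-11-04"))  # EDT day -> 13:30 UTC open
print(session_utc("2022-11-07"))  # EST day -> 14:30 UTC open
```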

TomCS91 commented 1 year ago

Ran it again today using the date range 2022-10-02 to 2022-12-01 and it's showing similar behaviour.

It also takes 36 minutes to run the clean data step, and I get the same "The price of the first row for ticker AAPL is NaN" message on every stock. The Alpaca data processor seems to be the issue, I think? It's massively slower than the others and seems to report errors that don't really exist.

Full error below.

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-9a432b722137> in <module>
----> 1 train(start_date = '2022-10-02', 
      2       end_date = '2022-12-01',
      3       ticker_list = ticker_list,
      4       data_source = 'alpaca',
      5       time_interval= '1Min',

3 frames
/usr/local/lib/python3.8/dist-packages/finrl/meta/paper_trading/common.py in train(start_date, end_date, ticker_list, data_source, time_interval, technical_indicator_list, drl_lib, env, model_name, if_vix, **kwargs)
    719     else:
    720         data = dp.add_turbulence(data)
--> 721     price_array, tech_array, turbulence_array = dp.df_to_array(data, if_vix)
    722     env_config = {
    723         "price_array": price_array,

/usr/local/lib/python3.8/dist-packages/finrl/meta/data_processor.py in df_to_array(self, df, if_vix)
     66 
     67     def df_to_array(self, df, if_vix) -> np.array:
---> 68         price_array, tech_array, turbulence_array = self.processor.df_to_array(
     69             df, self.tech_indicator_list, if_vix
     70         )

/usr/local/lib/python3.8/dist-packages/finrl/meta/data_processors/processor_alpaca.py in df_to_array(self, df, tech_indicator_list, if_vix)
    278                 if_first_time = False
    279             else:
--> 280                 price_array = np.hstack(
    281                     [price_array, df[df.tic == tic][["close"]].values]
    282                 )

<__array_function__ internals> in hstack(*args, **kwargs)

/usr/local/lib/python3.8/dist-packages/numpy/core/shape_base.py in hstack(tup)
    343         return _nx.concatenate(arrs, 0)
    344     else:
--> 345         return _nx.concatenate(arrs, 1)
    346 
    347 

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 17322 and the array at index 1 has size 16863
```
nexon33 commented 1 year ago

(attached screenshot: IMG_20230102_235353_677.jpg)

As seen in the attached picture, Alpaca just doesn't return the same amount of data for each stock, it seems. I haven't checked which timesteps were missing.
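
Something like this would show which ones are missing (a sketch, assuming df is the downloaded long-format frame with timestamp/tic columns):

```python
# Find which timestamps are missing for each ticker in the downloaded frame.
all_times = set(df["timestamp"].unique())
for tic in df["tic"].unique():
    missing = sorted(all_times - set(df.loc[df["tic"] == tic, "timestamp"]))
    if missing:
        print(tic, len(missing), "missing bars, e.g.", missing[:3])
```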

TomCS91 commented 1 year ago

Does anyone have a generic data processor?

One where you can provide a dataframe without needing the Alpaca broker itself. All the other aspects like VIX and the technical indicators etc. would still be useful, given the Alpaca downloader seems to be having so many issues.
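
Even something as simple as this would cover my use case (a rough sketch, assuming a long-format frame with timestamp/tic/close columns plus the indicator columns already added; the column ordering may need adjusting to match what the FinRL envs expect):

```python
import pandas as pd

# Hypothetical helper, not part of FinRL.
def df_to_arrays(df, tech_indicator_list):
    """Turn a long-format (timestamp, tic, close, indicators...) frame into a
    (time x tickers) price array and a (time x tickers*indicators) tech array,
    without going through any broker API."""
    # Pivot so every ticker becomes a column; rows where any ticker is missing
    # a bar are dropped, which also avoids the hstack dimension mismatch above.
    price = df.pivot(index="timestamp", columns="tic", values="close").dropna()
    tech = pd.concat(
        [df.pivot(index="timestamp", columns="tic", values=ind).loc[price.index]
         for ind in tech_indicator_list],
        axis=1,
    )
    return price.to_numpy(), tech.to_numpy()
```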

kruzel commented 1 year ago

I have the same problem. The dates are aligned, but the number of lines is different for each ticker (screenshot attached).

kruzel commented 1 year ago

I added the following code to processor_alpaca.clean_data() and it fixed the misalignment. Please add it to the code in the git repo.

```python
def clean_data(self, df):
    tic_list = np.unique(df.tic.values)
    n_tickers = len(tic_list)

    # Drop any timestamp that is missing a bar for at least one ticker,
    # so every ticker ends up with the same number of rows.
    unique_times = df["timestamp"].unique()
    for time in unique_times:
        if len(df[df.timestamp == time].index) < n_tickers:
            df = df[df.timestamp != time]
```

(screenshot attached)
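
A slightly faster equivalent, if the loop over timestamps gets slow on minute data (same idea, just vectorised with groupby):

```python
# Same df and n_tickers as in the snippet above: keep only the timestamps
# for which every ticker has a bar.
bar_counts = df.groupby("timestamp")["tic"].transform("nunique")
df = df[bar_counts == n_tickers]
```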

zhumingpassional commented 1 year ago

@kruzel Good code.

Could you please submit a PR?

kruzel commented 1 year ago

What do you mean by PR?


nexon33 commented 1 year ago

PR means pull request

kruzel commented 1 year ago

Got it.
