AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
https://ai4finance.org
MIT License

AlpacaProcessor not creating price_array due to mismatch of array dimensions #982

Open ErikBaer opened 1 year ago

ErikBaer commented 1 year ago

Hi Everyone!
Describe the bug

When running the AlpacaProcessor to clean previously downloaded data, the data ends up mismatched, which raises an error when trying to create price_array, tech_array and turbulence_array from it (see the screenshot below).

I have read many issues here and tried a series of approaches, but the error persists. Adjusting some of the times to account for timezones, for example, did not seem to have any effect. I ran clean_data successfully a few months ago. Has anyone come across the same bug? I would be very grateful for any idea on how to resolve this.

I am not sure how to describe the bug further, but I am happy to elaborate or give more information in response to any suggestion or question. Thank you very much for your efforts in assisting me! I would really love to get this working again!

To Reproduce

Steps to reproduce the behavior:

  1. Go to PapertradingDemo
  2. Run the Notebook
  3. Scroll down to DP.df_to_array(processed_data, if_vix=True)
  4. See error of mismatching input array dimensions
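
For illustration, the same kind of error can be triggered on made-up data outside the notebook. This is only a sketch: it assumes that df_to_array builds price_array by stacking one column per ticker with np.hstack, which is an assumption rather than something confirmed from the FinRL source, and the column names (timestamp, tic, close) and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up long-format data: AAPL has 3 rows, MSFT is missing the 09:31 bar.
df = pd.DataFrame({
    "timestamp": ["09:30", "09:31", "09:32", "09:30", "09:32"],
    "tic": ["AAPL", "AAPL", "AAPL", "MSFT", "MSFT"],
    "close": [150.0, 150.5, 151.0, 280.0, 281.0],
})

# Row count per ticker; unequal counts are what break column-wise stacking.
rows_per_tic = df.groupby("tic").size()
print(rows_per_tic)

# One (n_rows, 1) column per ticker, as assumed for df_to_array.
columns = [df.loc[df.tic == tic, ["close"]].to_numpy() for tic in df.tic.unique()]

try:
    price_array = np.hstack(columns)
    stack_error = None
except ValueError as err:
    # NumPy refuses to stack columns of different lengths side by side.
    stack_error = str(err)
    print("hstack failed:", stack_error)
```

Printing the row count per ticker on the real processed_data, before calling df_to_array, is a quick way to check whether the tickers actually differ in length.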

Expected behavior

The dataset is downloaded, cleaned and processed by the processor, so that price_array etc. can be derived from it successfully.

Screenshots

[Screenshot: Bildschirmfoto 2023-04-13 um 09 19 44]

ErikBaer commented 1 year ago

Update:

I went through the data processing step by step, trying to dissect what is happening. With the assistance of GPT-4, my digital mentor, I managed to remove the error. This is the approach I finally came up with: at the end of add_technical_indicator, add this line:

df.fillna(method='ffill', inplace=True)

Given that only a few rows have missing data, this seems reasonable to me. Are there any implications I am not aware of? If not, I would suggest adding this line to the end of add_technical_indicator, to make sure all tickers and indicators do indeed have the same length before further processing.
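
One possible implication, sketched on made-up data: if the frame is in long format with all tickers stacked in one column, a plain forward fill can carry a value from the last row of one ticker into the first row of the next. The snippet below compares that with a fill done per ticker. The column names (timestamp, tic, close, macd) and the values are hypothetical, and whether the real output of add_technical_indicator has gaps at ticker boundaries is not something confirmed here. It uses df.ffill(), which does the same as fillna(method='ffill'); newer pandas versions deprecate the method argument.

```python
import numpy as np
import pandas as pd

# Made-up long-format frame, sorted by ticker and then by time.
df = pd.DataFrame({
    "timestamp": ["09:30", "09:31", "09:32"] * 2,
    "tic": ["AAPL"] * 3 + ["MSFT"] * 3,
    "close": [150.0, 150.5, 151.0, 280.0, 280.5, 281.0],
    "macd": [0.10, np.nan, 0.12, np.nan, 0.30, 0.31],
})

# Forward fill over the whole frame: the first MSFT macd value
# is copied from the last AAPL row.
filled_global = df.ffill()

# Forward fill within each ticker: nothing crosses a ticker boundary,
# but a gap at the very start of a ticker stays NaN.
filled_per_tic = df.copy()
filled_per_tic["macd"] = df.groupby("tic")["macd"].ffill()

print(filled_global[["tic", "macd"]])
print(filled_per_tic[["tic", "macd"]])
```

If the missing values only ever occur in the middle of a ticker's series, both variants give the same result; the difference only shows up for gaps in the first rows of a ticker.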