AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
https://ai4finance.org
MIT License

Open Time Bug #981

Open AlexandreColauto opened 1 year ago

AlexandreColauto commented 1 year ago

I was trying to run the notebooks after downgrading pandas to 1.5.3, since the 2.0.0 release removed pd.Timedelta().delta, which prevented the files from downloading. I found that the first rows were all NaN, which was messing with the data and the indicators. Reading through the data, I realized that my data starts at 10:30 AM while the code starts reading rows at 9:30. So I hardcoded 10:30 in my local env. I don't know whether this is related to the summer time we just entered or to some other cause.

To Reproduce

Steps to reproduce the behavior:

  1. Go to 'Tutorial'
  2. Downgrade pandas to 1.5.3 (!pip install pandas==1.5.3)
  3. Run all the cells until Part 2 Train Agent.
  4. See the error

Expected behavior

It should train with proper data.

Screenshots

With the time set to 09:30: [image]

With the time set to 10:30: [image]

Image of the data: [image]

AlexandreColauto commented 1 year ago

PS. I managed to get it working in my environment; this is to inform the developers and anyone who might come across the same error.

YangletLiu commented 1 year ago

@AlexandreColauto Thanks for sharing! Appreciate it.

Seraphaious commented 1 year ago

@AlexandreColauto

Did setting the day + " 10:30:00" fix it entirely for you, or does it then omit an hour of data? I've been having similar issues and thought it was related to summer time.

lcavalie commented 1 year ago

Actually this is not fixed. It's just a temporary patch that will work only as long as we are in daylight saving time. There is a logic error in /finrl/meta/data_processors/processor_alpaca.py. On line 40 the data is requested from Alpaca between 9:30 and 15:59 in the NY time zone. The response from Alpaca comes back with timestamps in UTC. On line 52 the data is filtered between 14:30 and 20:59 UTC. Then on line 61 it is converted to NY time, but in the process an hour of data is lost, which explains the empty lines.

The problem is that during daylight saving time (like now) the difference between UTC and NY is 4 hours, but otherwise it's 5 hours. So the fixed 14:30 to 20:59 UTC window only matches 9:30 to 15:59 NY time outside daylight saving time. To solve the problem you need to request the data in the NY time zone, receive it in UTC (no choice there), convert it from UTC to NY time, and only after that apply the filtering between 9:30 and 15:59, so that no data is lost.
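
For anyone who wants to see the effect in isolation, here is a minimal sketch of the ordering described above. It is not the code from processor_alpaca.py or from the PR: the helper names (make_utc_bars, filter_fixed_utc_window, filter_regular_session) and the synthetic one-minute bars are assumptions made purely for illustration. It only relies on pandas' tz_convert and between_time.

```python
import pandas as pd

NY_TZ = "America/New_York"


def make_utc_bars(day: str) -> pd.DataFrame:
    """Hypothetical stand-in for an API response: one-minute bars stamped in UTC."""
    idx = pd.date_range(f"{day} 00:00", f"{day} 23:59", freq="min", tz="UTC")
    return pd.DataFrame({"close": range(len(idx))}, index=idx)


def filter_fixed_utc_window(bars: pd.DataFrame) -> pd.DataFrame:
    """Problematic order: filter on a fixed UTC window, then convert to NY time."""
    return bars.between_time("14:30", "20:59").tz_convert(NY_TZ)


def filter_regular_session(bars: pd.DataFrame) -> pd.DataFrame:
    """Suggested order: convert UTC to NY time first, then filter 09:30 to 15:59."""
    return bars.tz_convert(NY_TZ).between_time("09:30", "15:59")


summer_bars = make_utc_bars("2023-07-03")  # daylight saving time, UTC-4
winter_bars = make_utc_bars("2023-01-04")  # standard time, UTC-5

# Fixed UTC window: correct in winter, shifted by one hour in summer.
print(filter_fixed_utc_window(winter_bars).index[0].strftime("%H:%M"))  # 09:30
print(filter_fixed_utc_window(summer_bars).index[0].strftime("%H:%M"))  # 10:30

# Convert first, then filter: 09:30 in both seasons.
print(filter_regular_session(winter_bars).index[0].strftime("%H:%M"))  # 09:30
print(filter_regular_session(summer_bars).index[0].strftime("%H:%M"))  # 09:30
```

Because the filter runs on local NY timestamps, the UTC offset is resolved per date by the time zone database, so nothing has to be hardcoded for summer or winter.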

I will send a PR with a fix.

lcavalie commented 1 year ago

After PR #1007 this issue can be closed.