Lumiwealth / lumibot

Backtesting and Trading Bots Made Easy for Crypto, Stocks, Options, Futures, FOREX and more
GNU General Public License v3.0

Alpaca get_historical_prices() Does Not Return n Number Of Rows In Dataframe #268

Open TrainedPro opened 1 year ago

TrainedPro commented 1 year ago

Description

With Alpaca, get_historical_prices() returns fewer rows than the requested length. Other data sources, such as Yahoo Finance, return exactly length rows. The mismatch creates a discrepancy between back-testing and live trading, and it may also be the cause of issue #258.

The cause appears to be this part of the repository: https://github.com/Lumiwealth/lumibot/blob/507c06baa41d74030f6da7237f9bd07910755b11/lumibot/data_sources/alpaca_data.py#L163-L166 This code does not update curr_start.

In contrast, the Yahoo Finance source fetches all the data and then keeps only the required number of rows, so the data-frame always has length rows: https://github.com/Lumiwealth/lumibot/blob/507c06baa41d74030f6da7237f9bd07910755b11/lumibot/tools/yahoo_helper.py#L174-L176 https://github.com/Lumiwealth/lumibot/blob/507c06baa41d74030f6da7237f9bd07910755b11/lumibot/data_sources/yahoo_data.py#L92-L95
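The over-fetch-then-trim idea can be sketched independently of either data source. The snippet below is only an illustration, not lumibot's code: fetch_daily_bars() is a hypothetical stand-in that returns one entry per weekday, and get_last_n_bars() is a hypothetical helper that keeps moving the start of the window back until at least length rows are available, then keeps the last length of them.

```python
from datetime import date, timedelta


def fetch_daily_bars(start, end):
    """Hypothetical stand-in for a data-source request.

    Returns one "bar" (here just its date) per weekday in [start, end].
    A real source would also skip market holidays.
    """
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in days if d.weekday() < 5]


def get_last_n_bars(end, length, max_tries=10):
    """Return exactly `length` bars ending at `end`.

    Assumption: widening the window by doubling is an arbitrary choice
    made for this sketch; any rule that moves the start back would do.
    """
    window = timedelta(days=length)
    for _ in range(max_tries):
        bars = fetch_daily_bars(end - window, end)
        if len(bars) >= length:
            # Over-fetched: trim to the most recent `length` rows.
            return bars[-length:]
        window *= 2  # Not enough rows yet, so move the start further back.
    raise RuntimeError("could not collect enough bars")


bars = get_last_n_bars(date(2023, 8, 7), 200)
print(len(bars), bars[0], bars[-1])
```

With this shape, the caller always receives length rows regardless of how many weekends fall inside the initial window.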

Outputs

Alpaca

2023-01-20 05:00:00+00:00    395.88
2023-01-23 05:00:00+00:00    400.63
2023-01-24 05:00:00+00:00    400.20
2023-01-25 05:00:00+00:00    400.35
2023-01-26 05:00:00+00:00    404.75
                              ...  
2023-08-01 04:00:00+00:00    456.48
2023-08-02 04:00:00+00:00    450.13
2023-08-03 04:00:00+00:00    448.84
2023-08-04 04:00:00+00:00    446.81
2023-08-07 04:00:00+00:00    449.70
Name: close, Length: 137, dtype: float64

Yahoo Finance

2022-09-19 00:00:00-04:00    388.549988
2022-09-20 00:00:00-04:00    384.089996
2022-09-21 00:00:00-04:00    377.390015
2022-09-22 00:00:00-04:00    374.220001
2022-09-23 00:00:00-04:00    367.950012
                                ...    
2023-06-29 00:00:00-04:00    438.109985
2023-06-30 00:00:00-04:00    443.279999
2023-07-03 00:00:00-04:00    443.790009
2023-07-05 00:00:00-04:00    443.130005
2023-07-06 00:00:00-04:00    439.660004
Name: close, Length: 200, dtype: float64

The date ranges differ slightly, but the point stands: the two sources return very different numbers of rows (137 versus 200). This could cause major issues between back-testing and live trading.
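As a rough sanity check on the Alpaca output, the span from 2023-01-20 to 2023-08-07 can be counted in calendar days and in trading sessions. The sketch below uses only the standard library. The holiday set is an assumption added here (the NYSE full-day closures that fall inside that span), not something taken from lumibot or from the output above. The result is consistent with, though not proof of, the window being measured as 200 calendar days rather than 200 bars.

```python
from datetime import date, timedelta

# Assumption: NYSE full-day closures between 2023-01-20 and 2023-08-07.
NYSE_HOLIDAYS_2023 = {
    date(2023, 2, 20),  # Presidents' Day
    date(2023, 4, 7),   # Good Friday
    date(2023, 5, 29),  # Memorial Day
    date(2023, 6, 19),  # Juneteenth
    date(2023, 7, 4),   # Independence Day
}


def trading_days(start, end, holidays):
    """List the weekdays in [start, end] that are not holidays."""
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in days if d.weekday() < 5 and d not in holidays]


start, end = date(2023, 1, 20), date(2023, 8, 7)
calendar_days = (end - start).days + 1  # inclusive of both endpoints
sessions = len(trading_days(start, end, NYSE_HOLIDAYS_2023))
print(calendar_days, sessions)  # prints: 200 137
```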

Expected Behavior

Back-testing and live trading data should be consistent, whether the function is defined as returning 200 time-steps of data (not necessarily 200 bars) or 200 rows of data (which may cover more than 200 time-steps). The documentation is also ambiguous on this point: it states here that the number of bars may not equal length, but that is contradicted by what is stated here.

brettelliot commented 3 weeks ago

Can you try again? Or can you submit a code example that produces the problem (in addition to the output)?

I recently wrote a bunch of tests (test_bars.py) that show this is working for daily bars (for alpaca, polygon, pandas, and now tradier). When length is greater than zero, timestep="day", and timeshift=None, all of them return N bars, where N == length.