alpacahq / alpaca-py

The Official Python SDK for Alpaca API
https://alpaca.markets/sdks/python/getting_started.html
Apache License 2.0
602 stars, 147 forks

Retrieve full range historical data #503

Closed: Chacoon3 closed this issue 2 months ago.

Chacoon3 commented 2 months ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

Currently the Python data client does not appear to support retrieving all the data in a given time period. For example, if I query minute bars over a one-year period, the endpoint returns only 10,000 bars. That is expected, since the docs state that the limit per request is 10,000. However, I did not see any way in this Python client to get the next page token, so I can't retrieve the rest of the data.
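
For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes (these figures are not from the issue) about 252 US trading days per year and 390 one-minute bars per regular session; extended-hours bars would add more. Under those assumptions, one symbol's year of minute bars needs about 10 requests at 10,000 bars each.

```python
import math

# Assumptions (not from the issue): ~252 trading days per year and
# 390 one-minute bars per regular session (9:30-16:00 ET).
TRADING_DAYS_PER_YEAR = 252
MINUTES_PER_SESSION = 390
MAX_BARS_PER_REQUEST = 10_000  # server-side cap on the REST `limit` parameter


def pages_needed(total_bars: int, page_size: int = MAX_BARS_PER_REQUEST) -> int:
    """Number of requests needed to fetch `total_bars` at `page_size` bars per page."""
    return math.ceil(total_bars / page_size)


bars_per_year = TRADING_DAYS_PER_YEAR * MINUTES_PER_SESSION
print(bars_per_year, pages_needed(bars_per_year))  # 98280 10
```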

Describe the solution you'd like.

Either of the two:

  1. The BarSet object should have a field to store the next page token.

  2. The client itself implements logic to retrieve the remaining data using the next page token.
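
Both options come down to the same cursor-style loop: request a page, then repeat with the returned next page token until the server stops sending one. Below is a minimal sketch of option 1, where the caller drives the loop. `Page`, `fetch_all`, and `fake_fetch_page` are hypothetical names standing in for a BarSet that exposes the token and for a single REST call; none of them are part of alpaca-py.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for a BarSet that also exposes the next page token."""
    bars: List[int]
    next_page_token: Optional[str]


def fetch_all(fetch_page: Callable[[Optional[str]], Page]) -> List[int]:
    """Keep requesting pages, following next_page_token until it is None."""
    bars: List[int] = []
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        bars.extend(page.bars)
        token = page.next_page_token
        if token is None:
            return bars


# Fake server for illustration: 25 "bars" served 10 at a time,
# with the token simply encoding the next offset.
DATA = list(range(25))


def fake_fetch_page(token: Optional[str], page_size: int = 10) -> Page:
    start = int(token) if token is not None else 0
    end = start + page_size
    next_token = str(end) if end < len(DATA) else None
    return Page(bars=DATA[start:end], next_page_token=next_token)


print(len(fetch_all(fake_fetch_page)))  # 25
```

Option 2 would amount to moving `fetch_all` inside the client, so callers never see the token at all.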

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

hiohiohio commented 2 months ago

@Chacoon3 thank you for the feedback. You are right. Currently we have a hardcoded per-request limit, but no way to keep fetching beyond it. It seems we may need to change the design/interface a bit, which might also cause a breaking change. Please let me consider this.

https://github.com/alpacahq/alpaca-py/blob/b5e13cd7f5c65cbf05e82c5b65c714b4d53bd840/alpaca/common/constants.py#L3

Chacoon3 commented 2 months ago

@hiohiohio Thank you for your reply.

The REST endpoint accepts a limit of at most 10000, so changing DATA_V2_MAX_LIMIT does not solve my problem: the limit is fixed on the server side.

To keep changes to the current interface minimal, exposing the next page token on the BarSet object could be an option; users could then use that field to fetch the remaining data themselves. Just a personal suggestion.

Thank you,

gnvk commented 2 months ago

Your confusion is caused by the fact that the parameter called limit means different things in the Python SDK and in the API. In the SDK it means the total number of data points to get. If you query bars and set the limit to 10,000, you will get 10,000 bars and, yes, you won't be able to retrieve the rest of the data without the page token. In the API, on the other hand, limit actually means the page size. You can't set the page size greater than 10,000, but you can set the total limit in the SDK to any number. If you set it to 100,000, you will get at most 100,000 bars, possibly from 10 pages. If you set it to None, the number of data points is not limited, so you effectively get the full range of historical data without ever seeing the page token.
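
The two meanings of limit can be modeled as follows. This is an illustrative sketch of the behavior described in the comment above, not alpaca-py's actual implementation: `fetch_page` is a hypothetical single REST call whose page size is capped at 10,000 (`MAX_PAGE_SIZE` is an assumed name), while `total_limit` plays the role of the SDK-level limit, with None meaning no cap.

```python
from typing import Callable, List, Optional, Tuple

MAX_PAGE_SIZE = 10_000  # API-level `limit`: the page size, capped by the server

# Hypothetical single REST call: (page_size, page_token) -> (bars, next_page_token)
FetchPage = Callable[[int, Optional[str]], Tuple[List[int], Optional[str]]]


def get_bars(fetch_page: FetchPage, total_limit: Optional[int]) -> List[int]:
    """Model of the SDK-level `limit`: a cap on the total number of bars, not the page size."""
    bars: List[int] = []
    token: Optional[str] = None
    while total_limit is None or len(bars) < total_limit:
        # Never request more than the server allows, or more than is still needed.
        page_size = MAX_PAGE_SIZE
        if total_limit is not None:
            page_size = min(MAX_PAGE_SIZE, total_limit - len(bars))
        page, token = fetch_page(page_size, token)
        bars.extend(page)
        if token is None:  # the server has no more data
            break
    return bars


# Fake server for illustration: 98,280 bars, token encodes the next offset.
DATA = list(range(98_280))


def fake_fetch_page(page_size: int, token: Optional[str]) -> Tuple[List[int], Optional[str]]:
    start = int(token) if token is not None else 0
    end = min(start + page_size, len(DATA))
    return DATA[start:end], (str(end) if end < len(DATA) else None)


print(len(get_bars(fake_fetch_page, 10_000)))  # 10000 (one page; the rest is never requested)
print(len(get_bars(fake_fetch_page, None)))    # 98280 (every page followed to the end)
```

In alpaca-py itself, the comment above corresponds to leaving `limit=None` on the request object (for example `StockBarsRequest`) and letting the client follow the pages.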

Chacoon3 commented 2 months ago

I see. Thanks for clarifying!