alpacahq / alpaca-py

The Official Python SDK for Alpaca API
https://alpaca.markets/sdks/python/getting_started.html
Apache License 2.0
613 stars 154 forks source link

Ignored "raw" flag for StockDataStream #247

Open nv-pipo opened 1 year ago

nv-pipo commented 1 year ago

Issue

Initializing StockDataStream with parameter raw_data=True creates a websocket that returns timestamp as 'Timestamp', instead of the RAW string with the datetime.

Replicate

from alpaca.data.live.stock import StockDataStream

wss_client = StockDataStream('api-key', 'secret-key', raw_data=True)

# async handler
async def quote_data_handler(data: Any):
    # quote data will arrive here
    print(data)
    print("===")

wss_client.subscribe_quotes(quote_data_handler, "AAPL")

wss_client.run()

output:

Screenshot 2023-02-08 at 1 30 14 PM

Expected output (from https://alpaca.markets/docs/api-references/market-data-api/stock-pricing-data/realtime/):

Screenshot 2023-02-08 at 1 29 17 PM
impredicative commented 1 year ago

By the way, I am seeing two timestamps in the raw quote, both as msgpack.ext.Timestamp objects, with keys t and r. Only t is documented whereas r isn't.

Examples:

<class 'dict'> {'T': 'q', 'S': 'QQQ', 'bx': 'P', 'bp': 313.72, 'bs': 2, 'ax': 'Q', 'ap': 313.74, 'as': 9, 'c': ['R'], 'z': 'C', 't': Timestamp(seconds=1682466638, nanoseconds=506885515), 'r': Timestamp(seconds=1682466638, nanoseconds=507788703)}
<class 'dict'> {'T': 'q', 'S': 'QQQ', 'bx': 'P', 'bp': 313.72, 'bs': 2, 'ax': 'U', 'ap': 313.74, 'as': 18, 'c': ['R'], 'z': 'C', 't': Timestamp(seconds=1682466638, nanoseconds=507231768), 'r': Timestamp(seconds=1682466638, nanoseconds=508096712)}
<class 'dict'> {'T': 'q', 'S': 'QQQ', 'bx': 'P', 'bp': 313.72, 'bs': 2, 'ax': 'U', 'ap': 313.73, 'as': 9, 'c': ['R'], 'z': 'C', 't': Timestamp(seconds=1682466638, nanoseconds=507273406), 'r': Timestamp(seconds=1682466638, nanoseconds=508120891)}

I see that t < r.

@nv-pipo Just so I understand, what's currently the harm in leaving it as a msgpack.ext.Timestamp object? Timestamps are of course essential for trading. Do you have a much faster way of parsing the timestamp?

Here are the available methods on the timestamp object:

dir(msgpack.ext.Timestamp)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', 'from_bytes', 'from_datetime', 'from_unix', 'from_unix_nano', 'nanoseconds', 'seconds', 'to_bytes', 'to_datetime', 'to_unix', 'to_unix_nano']
nv-pipo commented 1 year ago

@impredicative, There is no harm, but using the "raw" flag I would expect a "RFC-3339 formatted timestamp with nanosecond precision".

impredicative commented 1 year ago

@impredicative, There is no harm, but using the "raw" flag I would expect a "RFC-3339 formatted timestamp with nanosecond precision".

I believe it's natively coming in through the socket with msgpack encoding. As such, msgpack seems to be its raw format. As such, the expectation of a string is probably invalid.

nv-pipo commented 1 year ago

@impredicative Well, it seems that msgpack supports strings: https://en.wikipedia.org/wiki/MessagePack. I'm no expert thought.

Note that the RAW flag for the timestamp field is also ignored in the historical API calls (let me know if I should open a separate issue):

>>> from alpaca.data.timeframe import TimeFrame
>>> from alpaca.data.requests import StockBarsRequest
>>> from alpaca.data.historical import StockHistoricalDataClient
>>>
>>> # keys required
>>> stock_client = StockHistoricalDataClient(key, secret)
>>> request_params = StockBarsRequest(
...                         symbol_or_symbols=["AAPL"],
...                         timeframe=TimeFrame.Day,
...                         start="2022-10-04 00:00:00",
...                         end="2022-10-05 00:00:00",
...                         raw_format=True,
...                  )
>>>
>>> bars = stock_client.get_stock_bars(request_params)
>>> print(bars)
data={'AAPL': [{   'close': 146.1,
    'high': 146.22,
    'low': 144.26,
    'open': 145.03,
    'symbol': 'AAPL',
    'timestamp': datetime.datetime(2022, 10, 4, 4, 0, tzinfo=datetime.timezone.utc),
    'trade_count': 687505.0,
    'volume': 87880164.0,
    'vwap': 145.490979}]}
>>>
impredicative commented 1 year ago

msgpack is ostensibly specified as the desired encoding by the client, and so in my understanding it makes sense for the response to use a msgpack object where it helps. If you don't want it, try using your own custom client without requesting the msgpack encoding, although the results may then be inefficiently delivered. I am not convinced that this issue should stay open.

nv-pipo commented 1 year ago

Yeah, I had come to the same conclusion some time ago: I've been using my own client for months.

However, I believe it is important to fulfill the flag expectations! and this alpaca-py's stream and historical endpoints do not truly respect the RAW flag. This should be fixed if you want people to use the SDK, instead of driving them away.

impredicative commented 1 year ago

Well, it is important for the user to have both the integer seconds value and the integer nanoseconds value. If a msgpack object was not used, the two values would then have to be represented in a list, but they would then be unclear to a reader as to what the list means.

An RFC-3339 timestamp, regardless of its precision, is less useful here because it is a slow and complicated object to work with when speed matters. That's even more true when it's a string that requires parsing. Integers are simplest and fastest to work with.

nv-pipo commented 1 year ago

@impredicative I am not arguing that a string representation of a date is better than a "date data type"!

I am simply saying that functions should behave as expected, given the parameters passed: if the parameter RAW is passed, then the RAW response has to be returned. This is not the case today, for neither Websockets nor historical API calls, and thus needs to be fixed. (RAW is described/defined by the RESTful/Websocket endpoints.)

Alternatively, if these functions cannot return the expected value, the RAW parameter should be dropped from functions calls...or rename it to match the return value, eg: "RAW_EXCEPT_FOR_TIMESTAMP".

impredicative commented 1 year ago

I see your point. Perhaps the arg name raw_data can be changed to native.

nv-pipo commented 1 year ago

That's fair.

Should I open a separate ticket for the "Historical Stock Pricing Data" endpoint?

impredicative commented 1 year ago

@nv-pipo I would just add it to the original comment of this issue itself, also including code to replicate it.