Open nv-pipo opened 1 year ago
By the way, I am seeing two timestamps in the raw quote, both as msgpack.ext.Timestamp
objects, with keys t
and r
. Only t
is documented whereas r
isn't.
Examples:
<class 'dict'> {'T': 'q', 'S': 'QQQ', 'bx': 'P', 'bp': 313.72, 'bs': 2, 'ax': 'Q', 'ap': 313.74, 'as': 9, 'c': ['R'], 'z': 'C', 't': Timestamp(seconds=1682466638, nanoseconds=506885515), 'r': Timestamp(seconds=1682466638, nanoseconds=507788703)}
<class 'dict'> {'T': 'q', 'S': 'QQQ', 'bx': 'P', 'bp': 313.72, 'bs': 2, 'ax': 'U', 'ap': 313.74, 'as': 18, 'c': ['R'], 'z': 'C', 't': Timestamp(seconds=1682466638, nanoseconds=507231768), 'r': Timestamp(seconds=1682466638, nanoseconds=508096712)}
<class 'dict'> {'T': 'q', 'S': 'QQQ', 'bx': 'P', 'bp': 313.72, 'bs': 2, 'ax': 'U', 'ap': 313.73, 'as': 9, 'c': ['R'], 'z': 'C', 't': Timestamp(seconds=1682466638, nanoseconds=507273406), 'r': Timestamp(seconds=1682466638, nanoseconds=508120891)}
I see that t < r
.
@nv-pipo Just so I understand, what's currently the harm in leaving it as a msgpack.ext.Timestamp
object? Timestamps are of course essential for trading. Do you have a much faster way of parsing the timestamp?
Here are the available methods on the timestamp object:
dir(msgpack.ext.Timestamp)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', 'from_bytes', 'from_datetime', 'from_unix', 'from_unix_nano', 'nanoseconds', 'seconds', 'to_bytes', 'to_datetime', 'to_unix', 'to_unix_nano']
@impredicative, There is no harm, but using the "raw" flag I would expect a "RFC-3339 formatted timestamp with nanosecond precision".
@impredicative, There is no harm, but using the "raw" flag I would expect a "RFC-3339 formatted timestamp with nanosecond precision".
I believe it's natively coming in through the socket with msgpack encoding. As such, msgpack seems to be its raw format. As such, the expectation of a string is probably invalid.
@impredicative Well, it seems that msgpack supports strings: https://en.wikipedia.org/wiki/MessagePack. I'm no expert thought.
Note that the RAW flag for the timestamp field is also ignored in the historical API calls (let me know if I should open a separate issue):
>>> from alpaca.data.timeframe import TimeFrame
>>> from alpaca.data.requests import StockBarsRequest
>>> from alpaca.data.historical import StockHistoricalDataClient
>>>
>>> # keys required
>>> stock_client = StockHistoricalDataClient(key, secret)
>>> request_params = StockBarsRequest(
... symbol_or_symbols=["AAPL"],
... timeframe=TimeFrame.Day,
... start="2022-10-04 00:00:00",
... end="2022-10-05 00:00:00",
... raw_format=True,
... )
>>>
>>> bars = stock_client.get_stock_bars(request_params)
>>> print(bars)
data={'AAPL': [{ 'close': 146.1,
'high': 146.22,
'low': 144.26,
'open': 145.03,
'symbol': 'AAPL',
'timestamp': datetime.datetime(2022, 10, 4, 4, 0, tzinfo=datetime.timezone.utc),
'trade_count': 687505.0,
'volume': 87880164.0,
'vwap': 145.490979}]}
>>>
msgpack is ostensibly specified as the desired encoding by the client, and so in my understanding it makes sense for the response to use a msgpack object where it helps. If you don't want it, try using your own custom client without requesting the msgpack encoding, although the results may then be inefficiently delivered. I am not convinced that this issue should stay open.
Yeah, I had come to the same conclusion some time ago: I've been using my own client for months.
However, I believe it is important to fulfill the flag expectations! and this alpaca-py's stream and historical endpoints do not truly respect the RAW flag. This should be fixed if you want people to use the SDK, instead of driving them away.
Well, it is important for the user to have both the integer seconds value and the integer nanoseconds value. If a msgpack object was not used, the two values would then have to be represented in a list, but they would then be unclear to a reader as to what the list means.
An RFC-3339 timestamp, regardless of its precision, is less useful here because it is a slow and complicated object to work with when speed matters. That's even more true when it's a string that requires parsing. Integers are simplest and fastest to work with.
@impredicative I am not arguing that a string representation of a date is better than a "date data type"!
I am simply saying that functions should behave as expected, given the parameters passed: if the parameter RAW is passed, then the RAW response has to be returned. This is not the case today, for neither Websockets nor historical API calls, and thus needs to be fixed. (RAW is described/defined by the RESTful/Websocket endpoints.)
Alternatively, if these functions cannot return the expected value, the RAW parameter should be dropped from functions calls...or rename it to match the return value, eg: "RAW_EXCEPT_FOR_TIMESTAMP".
I see your point. Perhaps the arg name raw_data
can be changed to native
.
That's fair.
Should I open a separate ticket for the "Historical Stock Pricing Data" endpoint?
@nv-pipo I would just add it to the original comment of this issue itself, also including code to replicate it.
Issue
Initializing StockDataStream with parameter
raw_data=True
creates a websocket that returns timestamp as 'Timestamp', instead of the RAW string with the datetime.Replicate
output:
Expected output (from https://alpaca.markets/docs/api-references/market-data-api/stock-pricing-data/realtime/):