alpacahq / Alpaca-API

The Alpaca API is a developer interface for trading operations and market data reception through the Alpaca platform.
https://alpaca.markets/

Lots of NaNs for some stocks in data returned from Alpaca #62

Closed: apollokit closed this issue 5 years ago

apollokit commented 5 years ago

I'm wondering why it is that for some periods there seem to be a lot of missing points in the data returned from the Alpaca API. Below is an example of the query I'm using in the Python API, and part of the returned data frame.

Note that I'm only using a paper trading account for the time being, so per this doc page, it should be only IEX data.

Might this be because the price wasn't changing much, so the NaNs are a placeholder for "no change"? In that case, it seems weird that sometimes there aren't NaNs when the price stays the same...

Using the Python API, alpaca-trade-api==0.26

Code snippet

    alpaca_api = tradeapi.REST(
        key_id=config['key_id'],
        secret_key=config['secret_key'],
        base_url=config['base_url']
    )
    raw_data = alpaca_api.get_barset(
        symbols=alpaca_symbols,
        timeframe='minute',
        start='2019-04-12T15:30:00-04:00',
        end='2019-04-12T15:59:00-04:00',
        limit=1000).df

Part of the output

    ipdb> raw_data['AVP']
    time                        open   high    low  close   volume
    2019-04-12 15:30:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:31:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:32:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:33:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:34:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:35:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:36:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:37:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:38:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:39:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:40:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:41:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:42:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:43:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:44:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:45:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:46:00-04:00  2.870  2.870  2.860  2.865    996.0
    2019-04-12 15:47:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:48:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:49:00-04:00  2.870  2.875  2.870  2.875   2941.0
    2019-04-12 15:50:00-04:00  2.875  2.875  2.875  2.875    200.0
    2019-04-12 15:51:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:52:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:53:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:54:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:55:00-04:00    NaN    NaN    NaN    NaN      NaN
    2019-04-12 15:56:00-04:00  2.875  2.875  2.870  2.875  19479.0
    2019-04-12 15:57:00-04:00  2.870  2.875  2.870  2.870   6400.0
    2019-04-12 15:58:00-04:00  2.870  2.875  2.870  2.875    500.0
    2019-04-12 15:59:00-04:00  2.870  2.870  2.860  2.865   1197.0

garyha commented 5 years ago

The query below could be used to fetch the corresponding Polygon data, if adjusted. As written, the prices and timestamps it returns don't make sense: the timestamps have only 12 digits, which places them at a point in distant history.

https://api.polygon.io/v1/historic/agg/minute/AVP?offset=1555097400000&limit=29&apiKey=XXX...

    {"symbol":"AVP","aggType":"min",
     "map":{"o":"open","c":"close","h":"high","l":"low","v":"volume","t":"timestamp","d":"timestamp"},
     "ticks":[
      {"o":0,"c":0,"h":0,"l":0,"v":0,"t":0,"d":0},
      {"o":15.219,"c":15.219,"h":15.219,"l":15.219,"v":7600,"t":884111100000,"d":884111100000},
      {"o":15.203,"c":15.203,"h":15.203,"l":15.203,"v":400,"t":884111460000,"d":884111460000},
      {"o":15.203,"c":15.203,"h":15.203,"l":15.203,"v":7600,"t":884111520000,"d":884111520000},
      {"o":15.203,"c":15.203,"h":15.203,"l":15.203,"v":5600,"t":884111580000,"d":884111580000},
      {"o":15.188,"c":15.188,"h":15.188,"l":15.188,"v":8400,"t":884111760000,"d":884111760000},
      {"o":15.203,"c":15.203,"h":15.203,"l":15.203,"v":2000,"t":884111820000,"d":884111820000},
      {"o":15.188,"c":15.188,"h":15.188,"l":15.188,"v":1016400,"t":884112060000,"d":884112060000},
      {"o":15.219,"c":15.219,"h":15.219,"l":15.219,"v":400,"t":884112120000,"d":884112120000},
      {"o":15.203,"c":15.203,"h":15.203,"l":15.203,"v":1600,"t":884112240000,"d":884112240000},
      {"o":15.219,"c":15.219,"h":15.219,"l":15.219,"v":400,"t":884112300000,"d":884112300000},
      {"o":15.219,"c":15.219,"h":15.219,"l":15.219,"v":16800,"t":884112660000,"d":884112660000},
      {"o":15.219,"c":15.219,"h":15.219,"l":15.219,"v":1200,"t":884112780000,"d":884112780000},
      {"o":15.203,"c":15.203,"h":15.203,"l":15.203,"v":400,"t":884112900000,"d":884112900000},
      {"o":15.203,"c":15.203,"h":15.203,"l":15.203,"v":6800,"t":884113260000,"d":884113260000},
      {"o":15.203,"c":15.203,"h":15.203,"l":15.203,"v":2400,"t":884113440000,"d":884113440000},
      {"o":15.203,"c":15.203,"h":15.203,"l":15.203,"v":40000,"t":884113620000,"d":884113620000},
      {"o":15.172,"c":15.172,"h":15.172,"l":15.172,"v":800,"t":884113920000,"d":884113920000},
      {"o":15.203,"c":15.203,"h":15.203,"l":15.203,"v":800,"t":884113980000,"d":884113980000},
      {"o":15.188,"c":15.188,"h":15.188,"l":15.188,"v":50000,"t":884114100000,"d":884114100000},
      {"o":15.188,"c":15.203,"h":15.203,"l":15.188,"v":3200,"t":884114340000,"d":884114340000},
      {"o":15.188,"c":15.188,"h":15.188,"l":15.188,"v":400,"t":884114580000,"d":884114580000},
      {"o":15.188,"c":15.188,"h":15.188,"l":15.188,"v":6400,"t":884114880000,"d":884114880000},
      {"o":15.203,"c":15.188,"h":15.203,"l":15.188,"v":3200,"t":884115000000,"d":884115000000},
      {"o":15.172,"c":15.188,"h":15.188,"l":15.172,"v":1600,"t":884115180000,"d":884115180000},
      {"o":15.188,"c":15.188,"h":15.188,"l":15.188,"v":42000,"t":884115240000,"d":884115240000},
      {"o":15.188,"c":15.188,"h":15.188,"l":15.188,"v":800,"t":884115300000,"d":884115300000},
      {"o":15.188,"c":15.188,"h":15.188,"l":15.188,"v":9200,"t":884115480000,"d":884115480000},
      {"o":15.188,"c":15.188,"h":15.188,"l":15.188,"v":5200,"t":884115660000,"d":884115660000}
     ]}

The URL parameter offset=1555097400000 is 2019-04-12T15:30:00-04:00 converted to epoch milliseconds at https://www.infobyip.com/epochtimeconverter.php

To verify, the reverse conversion of 1555097400000 at https://currentmillis.com/ gives Fri Apr 12 2019 19:30:00 UTC.

The URL parameter limit=29 requests 29 records.
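The same conversion can be checked locally rather than through a website. This is a minimal sketch using only Python's standard library; the helper names `to_epoch_ms` and `from_epoch_ms` are illustrative and not part of any Alpaca or Polygon client.

```python
from datetime import datetime, timezone


def to_epoch_ms(iso_timestamp: str) -> int:
    """Convert an ISO 8601 timestamp with a UTC offset to epoch milliseconds."""
    return int(datetime.fromisoformat(iso_timestamp).timestamp() * 1000)


def from_epoch_ms(epoch_ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


# Start of the requested window, as used for the offset parameter.
offset_ms = to_epoch_ms("2019-04-12T15:30:00-04:00")
print(offset_ms)                    # 1555097400000
print(from_epoch_ms(offset_ms))     # 2019-04-12 19:30:00+00:00

# First non-zero "t" value in the response shown above.
print(from_epoch_ms(884111100000))  # 1998-01-06 18:25:00+00:00
```

The last line shows why the response looks wrong: its 12-digit timestamps decode to January 1998, not to the requested April 2019 window.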

ttt733 commented 5 years ago

I believe that IEX returns NaNs when there is no trade in a given timeframe. The reason you sometimes see bars when the price hasn't changed is likely that the symbol traded at the exact same price again after a delay. You should expect this for relatively low-volume symbols, and in Python you should likely be using .dropna() on the dataframe to account for it.
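As a rough illustration of that suggestion, the sketch below rebuilds a few rows of the AVP output from earlier in this thread as a plain pandas DataFrame and drops the minutes with no trades. It assumes the frame has the same open/high/low/close/volume columns as the output shown above; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Six consecutive minutes copied from the AVP output in this thread.
index = pd.to_datetime([
    "2019-04-12 15:45:00-04:00",
    "2019-04-12 15:46:00-04:00",
    "2019-04-12 15:47:00-04:00",
    "2019-04-12 15:48:00-04:00",
    "2019-04-12 15:49:00-04:00",
    "2019-04-12 15:50:00-04:00",
])
bars = pd.DataFrame(
    {
        "open":   [np.nan, 2.870, np.nan, np.nan, 2.870, 2.875],
        "high":   [np.nan, 2.870, np.nan, np.nan, 2.875, 2.875],
        "low":    [np.nan, 2.860, np.nan, np.nan, 2.870, 2.875],
        "close":  [np.nan, 2.865, np.nan, np.nan, 2.875, 2.875],
        "volume": [np.nan, 996.0, np.nan, np.nan, 2941.0, 200.0],
    },
    index=index,
)

# Keep only the minutes in which at least one trade produced a bar.
traded = bars.dropna()
print(len(bars), len(traded))  # 6 3
print(traded)
```

Note that `dropna()` leaves an irregular time index, so any later step that assumes one row per minute would need to account for the gaps.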

garyha commented 5 years ago

Compare with an alternative source on that symbol (AVP) & date to establish a level of certainty.

apollokit commented 5 years ago

Thank you, that's very helpful!