Closed arul67800 closed 3 years ago
Hi there @arul67800 , with regards to modeling the stock market data I would split each stock and metric type into distinct timeseries ( for the OHLC we only need price ). To illustrate this with an example I've added bellow a code snippet to ingest and query in two distinct ways the “FAANG” stock prices for a 1month period.
To be able to run the code example you need the following packages
yfinance==0.1.55
redistimeseries==1.4.3
To motivate you to run the example here's the expected output of the example bellow:
$ python3 issue_81.py
##############################################################
Part 1: INSERT DATA
##############################################################
## Retrieving the sample data from yahoo finance
[*********************100%***********************] 5 of 5 completed
## Insert data into Redis TimeSeries
##############################################################
Part 2: QUERYING DATA
##############################################################
## Option 1) query OLHC data using the Aggregation series
Example of AAPL stock open price for 30 day range, aggregated by a 24 hours period on each sample:
[(1601510400000, 117.019996643066), (1601596800000, 114.14990234375), (1601856000000, 114.110000610352), (1601942400000, 115.839996337891), (1602028800000, 114.900001525879), (1602115200000, 115.610000610352), (1602201600000, 115.425003051758), (1602460800000, 120.23999786377), (1602547200000, 122.800003051758), (1602633600000, 120.610000610352), (1602720000000, 119.339897155762), (1602806400000, 121.129997253418), (1603065600000, 120.160003662109), (1603152000000, 116.360000610352), (1603238400000, 117.379997253418), (1603324800000, 117.625), (1603411200000, 115.980102539062), (1603670400000, 115.224998474121), (1603756800000, 114.730003356934), (1603843200000, 114.082397460938), (1603929600000, 113.65160369873)]
## Option 2) query OLHC data using the raw series
Example of NFLX stock low price for 30 day range, aggregated by a 24 hours period on each sample:
[(1601510400000, 504.790008544922), (1601596800000, 503.320007324219), (1601856000000, 503.848602294922), (1601942400000, 505.079986572266), (1602028800000, 521.394287109375), (1602115200000, 524.679992675781), (1602201600000, 535.594970703125), (1602460800000, 538.809997558594), (1602547200000, 540.213623046875), (1602633600000, 541.190002441406), (1602720000000, 535.52001953125), (1602806400000, 530.580017089844), (1603065600000, 526.219970703125), (1603152000000, 523.400024414062), (1603238400000, 488.489990234375), (1603324800000, 482.084991455078), (1603411200000, 482.0), (1603670400000, 479.399993896484), (1603756800000, 483.130004882812), (1603843200000, 483.739990234375), (1603929600000, 480.023803710938), (1604016000000, 472.519989013672)]
For each of the FAANG stocks we will have the following series:
stocks_<STOCKNAME>_raw_5m
- containing the raw price data ( one sample per 5min ).stocks_<STOCKNAME>_low_1d
- containing the automatically computed aggregation per stock/day the stock low price ( one sample per day).stocks_<STOCKNAME>_high_1d
- containing the automatically computed aggregation per stock/day the stock highest price ( one sample per day).stocks_<STOCKNAME>_open_1d
- containing the automatically computed aggregation per stock/day the stock open price ( one sample per day).stocks_<STOCKNAME>_close_1d
- containing the automatically computed aggregation per stock/day the stock close price ( one sample per day).If you notice the LABELS portion of the create command I've added two labels ( one for the metric type and other for the stock name ). That would enable us in the future to query for a specific metric ( for example give me all raw stock prices, or computed low, etc... ) or for a specific stock ( for a specific stock give me all metrics -- which in our example are raw_5m, low_1d, high_1d, open_1d, close_1d ).
Please notice that the 4 computed aggregation time-series (low, high, open, close) could also be computed at query time. You can think of this type of automatic aggregation as a optimization for query time. More about this subject here: https://oss.redislabs.com/redistimeseries/commands/#aggregation-compaction-downsampling
The added code snipet provides two ways of querying the dataset:
Both should provide the same output if the aggregate ranges are equivalent.
import yfinance as yf
from redistimeseries.client import Client
import datetime as dt
##############################################################
# make a connection to Redis TimeSeries
##############################################################
rts = Client(host='127.0.0.1', port=6379)
##############################################################
# 1) INSERT DATA
##############################################################
print("\n##############################################################")
print("Part 1: INSERT DATA")
print("##############################################################\n")
##############################################################
# Used stocks - for this example we'll use FAANG stocks
#
# In finance, “FAANG” is an acronym that refers to the stocks of five prominent American technology companies:
# Facebook (FB), Amazon (AMZN), Apple (AAPL), Netflix (NFLX); and Alphabet (GOOG) (formerly known as Google).
##############################################################
stocks = "FB AMZN AAPL NFLX GOOG"
# create the aggregation
for stock_name in stocks.split(" "):
source_timeserie_name = "stocks_{}_raw_5m".format(stock_name)
rts.create(source_timeserie_name, labels={"stock": stock_name, "type": "raw_stock_values"})
##############################################################
# create the OLHC auto aggregate time-series
##############################################################
open_agg_timeserie_name = "stocks_open_{}_1d".format(stock_name)
low_agg_timeserie_name = "stocks_low_{}_1d".format(stock_name)
high_agg_timeserie_name = "stocks_high_{}_1d".format(stock_name)
close_agg_timeserie_name = "stocks_close_{}_1d".format(stock_name)
# create 1 day aggregation timeseries
rts.create(open_agg_timeserie_name, labels={"stock": stock_name, "type": "first"})
rts.create(low_agg_timeserie_name, labels={"stock": stock_name, "type": "min"})
rts.create(high_agg_timeserie_name, labels={"stock": stock_name, "type": "max"})
rts.create(close_agg_timeserie_name, labels={"stock": stock_name, "type": "last"})
# create 1 day rules
rts.createrule(source_timeserie_name, open_agg_timeserie_name, "FIRST", 24 * 60 * 60 * 1000)
rts.createrule(source_timeserie_name, low_agg_timeserie_name, "MIN", 24 * 60 * 60 * 1000)
rts.createrule(source_timeserie_name, high_agg_timeserie_name, "MAX", 24 * 60 * 60 * 1000)
rts.createrule(source_timeserie_name, close_agg_timeserie_name, "LAST", 24 * 60 * 60 * 1000)
##############################################################
# Retrieve the sample data from yahoo finance
##############################################################
print("## Retrieving the sample data from yahoo finance")
df = yf.download(stocks, start="2020-10-01", interval="5m", end="2020-11-01")
##############################################################
# insert data into Redis TimeSeries
##############################################################
print("## Insert data into Redis TimeSeries")
for time_index, row in df.iterrows():
# using the adjusted closing price
for stock_name, adj_close_value in row.get('Adj Close').iteritems():
timeserie_name = "stocks_{}_raw_5m".format(stock_name)
rts.add(timeserie_name, int(time_index.timestamp()) * 1000, adj_close_value)
##############################################################
# 2) QUERYING DATA
##############################################################
print("\n##############################################################")
print("Part 2: QUERYING DATA")
print("##############################################################\n")
start_time_string = '2020-10-01 00:00:00'
start_time_ms = int(dt.datetime.strptime(start_time_string, '%Y-%m-%d %H:%M:%S').timestamp()) * 1000
end_time_string = '2020-11-01 00:00:00'
end_time_ms = int(dt.datetime.strptime(end_time_string, '%Y-%m-%d %H:%M:%S').timestamp()) * 1000
##############################################################
# query OLHC data using the Aggregation series
# Example of AAPL stock open price for 30 day range, aggregated by a 24 hours period on each sample
##############################################################
result_open = rts.range("stocks_open_AAPL_1d", start_time_ms, end_time_ms)
print("## Option 1) query OLHC data using the Aggregation series")
print("Example of AAPL stock open price for 30 day range, aggregated by a 24 hours period on each sample:\n")
print(result_open)
print("\n")
##############################################################
# query OLHC data using the raw series
# Example of NFLX stock low price for 30 day range, aggregated by a 24 hours period on each sample
##############################################################
result_low = rts.range("stocks_NFLX_raw_5m", start_time_ms, end_time_ms, aggregation_type="MIN",
bucket_size_msec=24 * 60 * 60 * 1000)
print("## Option 2) query OLHC data using the raw series")
print("Example of NFLX stock low price for 30 day range, aggregated by a 24 hours period on each sample:\n")
print(result_low)
@arul67800 can you please reply back confirming if the above example fits your requirements. Closing this issue. Please feel free to reopen if you are not able to replicate the above or you have further requirements that are not met.
tick={'tradable': True, 'mode': 'full', 'instrument_token': 57648135, 'last_price': 49159.0, 'last_quantity': 1, 'average_price': 49148.83, 'volume': 11058, 'buy_quantity': 1104, 'sell_quantity': 1404, 'ohlc': {'open': 49256.0, 'high': 49300.0, 'low': 48968.0, 'close': 49256.0}, 'change': -0.19693032320935522, 'last_trade_time': datetime.datetime(2020, 12, 10, 17, 43, 23), 'oi': 12597, 'oi_day_high': 12598, 'oi_day_low': 10933, 'timestamp': datetime.datetime(2020, 12, 10, 17, 43, 24), 'depth': {'buy': [{'quantity': 2, 'price': 49159.0, 'orders': 2}, {'quantity': 2, 'price': 49158.0, 'orders': 2}, {'quantity': 3, 'price': 49157.0, 'orders': 3}, {'quantity': 4, 'price': 49156.0, 'orders': 4}, {'quantity': 3, 'price': 49155.0, 'orders': 3}], 'sell': [{'quantity': 1, 'price': 49168.0, 'orders': 1}, {'quantity': 3, 'price': 49169.0, 'orders': 3}, {'quantity': 8, 'price': 49170.0, 'orders': 5}, {'quantity': 2, 'price': 49171.0, 'orders': 2}, {'quantity': 1, 'price': 49172.0, 'orders': 1}]}},
hi sir, Thanks for your script, actually we never use redis for historical data but for tick data, which has multiple updates for multiple stocks per second. i have given the tick structure which im handling using KDB+, i want to analyse OHLCV, and Ask and bid price and store in database, can you please optimize the code for this tick data..
I also wondering how you will forward fill or backward fill NaN values when there was no ticks received during the particular time bucket.
I also wondering how you will forward fill or backward fill NaN values when there was no ticks received during the particular time bucket.
Hi there @arul67800 we allow out-of-order updates and / or inserts so that should cover all your cases correct? Do you have an example of the requirements you guys are looking for? glad to help out :)
How to add multiple samples to same timestamp?? ie in case of raw data i need to add "last_price","volume","ask_price","bid_price, and in case of aggregation series, how to add in same timestamp "open","high","low","close","volume", under single timestamp instead of multiple keys for individual label??
Also i need some setup to fill the NaN values if no ticks were received...All possible in KDB+
There is also a pitfall where im loosing time in converting datetime index to unix timestamp and again converting unix timestamp to datetime index..why are you strictly using unix time...why cant be flexible??
If these were fullfilled, RedisTimeSeries will get a undeniable position in finance industry
You are allocating 64 bits for timestamp, why cant that be in any format?
In short i need
can anybody help me out with a code snippet for OHLC data using FIRST and LAST , HIGH and LOW command??