RedisTimeSeries / redistimeseries-py

RedisTimeSeries python client
https://redistimeseries.io
BSD 3-Clause "New" or "Revised" License

How to resample OHLC data for stock market? #81

Closed arul67800 closed 3 years ago

arul67800 commented 3 years ago

Can anybody help me out with a code snippet for OHLC data using the FIRST and LAST, HIGH and LOW commands?

filipecosta90 commented 3 years ago

Hi there @arul67800, with regards to modeling the stock market data I would split each stock and metric type into distinct time series (for OHLC we only need the price). To illustrate this with an example, I've added below a code snippet that ingests the “FAANG” stock prices for a 1-month period and then queries them in two distinct ways.

requirements

To be able to run the code example you need the following packages:

yfinance==0.1.55
redistimeseries==1.4.3
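
These can be installed with pip; you also need a Redis server with the RedisTimeSeries module loaded (for example via the redislabs/redistimeseries Docker image):

$ pip install yfinance==0.1.55 redistimeseries==1.4.3
$ docker run -d -p 6379:6379 redislabs/redistimeseries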

expected output

To motivate you to run it, here's the expected output of the example below:

$ python3 issue_81.py 

##############################################################
Part 1: INSERT DATA
##############################################################

## Retrieving the sample data from yahoo finance
[*********************100%***********************]  5 of 5 completed
## Insert data into Redis TimeSeries

##############################################################
Part 2: QUERYING DATA
##############################################################

## Option 1) query OHLC data using the aggregation series
Example of AAPL stock open price for a 30-day range, aggregated into 24-hour buckets:

[(1601510400000, 117.019996643066), (1601596800000, 114.14990234375), (1601856000000, 114.110000610352), (1601942400000, 115.839996337891), (1602028800000, 114.900001525879), (1602115200000, 115.610000610352), (1602201600000, 115.425003051758), (1602460800000, 120.23999786377), (1602547200000, 122.800003051758), (1602633600000, 120.610000610352), (1602720000000, 119.339897155762), (1602806400000, 121.129997253418), (1603065600000, 120.160003662109), (1603152000000, 116.360000610352), (1603238400000, 117.379997253418), (1603324800000, 117.625), (1603411200000, 115.980102539062), (1603670400000, 115.224998474121), (1603756800000, 114.730003356934), (1603843200000, 114.082397460938), (1603929600000, 113.65160369873)]

## Option 2) query OHLC data using the raw series
Example of NFLX stock low price for a 30-day range, aggregated into 24-hour buckets:

[(1601510400000, 504.790008544922), (1601596800000, 503.320007324219), (1601856000000, 503.848602294922), (1601942400000, 505.079986572266), (1602028800000, 521.394287109375), (1602115200000, 524.679992675781), (1602201600000, 535.594970703125), (1602460800000, 538.809997558594), (1602547200000, 540.213623046875), (1602633600000, 541.190002441406), (1602720000000, 535.52001953125), (1602806400000, 530.580017089844), (1603065600000, 526.219970703125), (1603152000000, 523.400024414062), (1603238400000, 488.489990234375), (1603324800000, 482.084991455078), (1603411200000, 482.0), (1603670400000, 479.399993896484), (1603756800000, 483.130004882812), (1603843200000, 483.739990234375), (1603929600000, 480.023803710938), (1604016000000, 472.519989013672)]

Data ingestion

For each of the FAANG stocks we will have the following series (the raw series plus four daily compactions, matching the create calls in the code example below):

stocks_<STOCK>_raw_5m - raw adjusted-close samples at a 5-minute interval
stocks_open_<STOCK>_1d - daily open (FIRST aggregation)
stocks_low_<STOCK>_1d - daily low (MIN aggregation)
stocks_high_<STOCK>_1d - daily high (MAX aggregation)
stocks_close_<STOCK>_1d - daily close (LAST aggregation)

If you notice the LABELS portion of the create command, I've added two labels (one for the metric type and another for the stock name). That enables us to later query for a specific metric (for example, give me all raw stock prices, or all computed lows, etc.) or for a specific stock (for a specific stock, give me all metrics, which in our example are raw_5m, low_1d, high_1d, open_1d, close_1d).
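
As a small illustration of that label-based querying (a sketch only, assuming the series created by the code example below, the same rts client, and the start_time_ms / end_time_ms range defined there), TS.MRANGE can filter on those labels:

# all daily-low series across every stock, selected via the "type" label
lows = rts.mrange(start_time_ms, end_time_ms, filters=["type=min"])
# every series (raw and compacted) belonging to a single stock
aapl_series = rts.mrange(start_time_ms, end_time_ms, filters=["stock=AAPL"])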

Side note on ingestion

Please notice that the 4 computed aggregation time series (low, high, open, close) could also be computed at query time. You can think of this type of automatic aggregation as an optimization for query time. More about this subject here: https://oss.redislabs.com/redistimeseries/commands/#aggregation-compaction-downsampling
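
To make that concrete, here is a small sketch (assuming the series names and the start_time_ms / end_time_ms range from the code example below) comparing the pre-computed daily lows with the same aggregation done at query time:

# daily lows read from the pre-computed compaction series
compacted = rts.range("stocks_low_NFLX_1d", start_time_ms, end_time_ms)
# the same daily lows computed on the fly from the raw 5-minute series
on_the_fly = rts.range("stocks_NFLX_raw_5m", start_time_ms, end_time_ms,
                       aggregation_type="MIN", bucket_size_msec=24 * 60 * 60 * 1000)
# note: the most recent, still-open bucket is only written to the compaction
# once a later sample closes it, so it may be missing from `compacted`
print(compacted == on_the_fly)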

Querying data

The added code snippet provides two ways of querying the dataset:

  1. reading the pre-computed (compacted) aggregation series directly;
  2. aggregating the raw series at query time.

Both should provide the same output if the aggregation ranges are equivalent.


Code example

import yfinance as yf
from redistimeseries.client import Client
import datetime as dt

##############################################################
# make a connection to Redis TimeSeries
##############################################################

rts = Client(host='127.0.0.1', port=6379)

##############################################################
# 1) INSERT DATA
##############################################################
print("\n##############################################################")
print("Part 1: INSERT DATA")
print("##############################################################\n")

##############################################################
# Used stocks - for this example we'll use FAANG stocks
#
# In finance, “FAANG” is an acronym that refers to the stocks of five prominent American technology companies:
# Facebook (FB), Amazon (AMZN), Apple (AAPL), Netflix (NFLX), and Alphabet (GOOG) (formerly known as Google).
##############################################################

stocks = "FB AMZN AAPL NFLX GOOG"

# create the raw series and the aggregation (compaction) series for each stock
for stock_name in stocks.split(" "):
    source_timeserie_name = "stocks_{}_raw_5m".format(stock_name)
    rts.create(source_timeserie_name, labels={"stock": stock_name, "type": "raw_stock_values"})

    ##############################################################
    # create the OHLC auto-aggregation (compaction) time series
    ##############################################################
    open_agg_timeserie_name = "stocks_open_{}_1d".format(stock_name)
    low_agg_timeserie_name = "stocks_low_{}_1d".format(stock_name)
    high_agg_timeserie_name = "stocks_high_{}_1d".format(stock_name)
    close_agg_timeserie_name = "stocks_close_{}_1d".format(stock_name)
    # create 1 day aggregation timeseries
    rts.create(open_agg_timeserie_name, labels={"stock": stock_name, "type": "first"})
    rts.create(low_agg_timeserie_name, labels={"stock": stock_name, "type": "min"})
    rts.create(high_agg_timeserie_name, labels={"stock": stock_name, "type": "max"})
    rts.create(close_agg_timeserie_name, labels={"stock": stock_name, "type": "last"})
    # create 1 day rules
    rts.createrule(source_timeserie_name, open_agg_timeserie_name, "FIRST", 24 * 60 * 60 * 1000)
    rts.createrule(source_timeserie_name, low_agg_timeserie_name, "MIN", 24 * 60 * 60 * 1000)
    rts.createrule(source_timeserie_name, high_agg_timeserie_name, "MAX", 24 * 60 * 60 * 1000)
    rts.createrule(source_timeserie_name, close_agg_timeserie_name, "LAST", 24 * 60 * 60 * 1000)

##############################################################
# Retrieve the sample data from yahoo finance
##############################################################
print("## Retrieving the sample data from yahoo finance")
df = yf.download(stocks, start="2020-10-01", interval="5m", end="2020-11-01")

##############################################################
# insert data into Redis TimeSeries
##############################################################
print("## Insert data into Redis TimeSeries")
for time_index, row in df.iterrows():
    # using the adjusted closing price
    for stock_name, adj_close_value in row.get('Adj Close').items():  # .iteritems() was removed in pandas 2.0
        timeserie_name = "stocks_{}_raw_5m".format(stock_name)
        rts.add(timeserie_name, int(time_index.timestamp()) * 1000, adj_close_value)

##############################################################
# 2) QUERYING DATA
##############################################################
print("\n##############################################################")
print("Part 2: QUERYING DATA")
print("##############################################################\n")
start_time_string = '2020-10-01 00:00:00'
start_time_ms = int(dt.datetime.strptime(start_time_string, '%Y-%m-%d %H:%M:%S').timestamp()) * 1000
end_time_string = '2020-11-01 00:00:00'
end_time_ms = int(dt.datetime.strptime(end_time_string, '%Y-%m-%d %H:%M:%S').timestamp()) * 1000

##############################################################
# query OHLC data using the aggregation series
# Example of AAPL stock open price for a 30-day range, aggregated into 24-hour buckets
##############################################################
result_open = rts.range("stocks_open_AAPL_1d", start_time_ms, end_time_ms)
print("## Option 1) query OLHC data using the Aggregation series")
print("Example of AAPL stock open price for 30 day range, aggregated by a 24 hours period on each sample:\n")
print(result_open)
print("\n")

##############################################################
# query OHLC data using the raw series
# Example of NFLX stock low price for a 30-day range, aggregated into 24-hour buckets
##############################################################
result_low = rts.range("stocks_NFLX_raw_5m", start_time_ms, end_time_ms, aggregation_type="MIN",
                       bucket_size_msec=24 * 60 * 60 * 1000)
print("## Option 2) query OLHC data using the raw series")
print("Example of NFLX stock low price for 30 day range, aggregated by a 24 hours period on each sample:\n")
print(result_low)

@arul67800 can you please reply back confirming whether the above example fits your requirements? Closing this issue; please feel free to reopen if you are not able to replicate the above or if you have further requirements that are not met.

arul67800 commented 3 years ago

tick={'tradable': True, 'mode': 'full', 'instrument_token': 57648135, 'last_price': 49159.0, 'last_quantity': 1, 'average_price': 49148.83, 'volume': 11058, 'buy_quantity': 1104, 'sell_quantity': 1404, 'ohlc': {'open': 49256.0, 'high': 49300.0, 'low': 48968.0, 'close': 49256.0}, 'change': -0.19693032320935522, 'last_trade_time': datetime.datetime(2020, 12, 10, 17, 43, 23), 'oi': 12597, 'oi_day_high': 12598, 'oi_day_low': 10933, 'timestamp': datetime.datetime(2020, 12, 10, 17, 43, 24), 'depth': {'buy': [{'quantity': 2, 'price': 49159.0, 'orders': 2}, {'quantity': 2, 'price': 49158.0, 'orders': 2}, {'quantity': 3, 'price': 49157.0, 'orders': 3}, {'quantity': 4, 'price': 49156.0, 'orders': 4}, {'quantity': 3, 'price': 49155.0, 'orders': 3}], 'sell': [{'quantity': 1, 'price': 49168.0, 'orders': 1}, {'quantity': 3, 'price': 49169.0, 'orders': 3}, {'quantity': 8, 'price': 49170.0, 'orders': 5}, {'quantity': 2, 'price': 49171.0, 'orders': 2}, {'quantity': 1, 'price': 49172.0, 'orders': 1}]}},

arul67800 commented 3 years ago

Hi sir, thanks for your script. Actually we never use Redis for historical data but for tick data, which has multiple updates per second for multiple stocks. I have given above the tick structure which I'm handling using KDB+. I want to analyse OHLCV and ask and bid prices and store them in the database. Can you please optimize the code for this tick data?

arul67800 commented 3 years ago

I'm also wondering how you would forward-fill or backward-fill NaN values when no ticks were received during a particular time bucket.

filipecosta90 commented 3 years ago

> I'm also wondering how you would forward-fill or backward-fill NaN values when no ticks were received during a particular time bucket.

Hi there @arul67800, we allow out-of-order updates and/or inserts, so that should cover all your cases, correct? Do you have an example of the requirements you are looking for? Glad to help out :)
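
For example (a minimal sketch, assuming the raw series from the example above and RedisTimeSeries >= 1.4), a tick that arrives late can simply be written at its original timestamp:

# hypothetical late sample: inserted at its own (older) timestamp
rts.add("stocks_AAPL_raw_5m", 1601510700000, 116.95)
# re-writing an already-existing timestamp is governed by the series'
# duplicate policy (see the RedisTimeSeries documentation)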

arul67800 commented 3 years ago

How to add multiple samples at the same timestamp? I.e. for raw data I need to add "last_price", "volume", "ask_price", "bid_price", and for the aggregation series "open", "high", "low", "close", "volume", all under a single timestamp instead of multiple keys for each individual label?

arul67800 commented 3 years ago

Also I need some setup to fill the NaN values if no ticks were received... all of this is possible in KDB+.

arul67800 commented 3 years ago

There is also a pitfall where I'm losing time converting the datetime index to a Unix timestamp and then converting the Unix timestamp back to a datetime index. Why are you strictly using Unix time? Why can't it be flexible?

arul67800 commented 3 years ago

If these were fulfilled, RedisTimeSeries would gain an undeniable position in the finance industry.

arul67800 commented 3 years ago

You are allocating 64 bits for the timestamp, why can't that be in any format?

arul67800 commented 3 years ago

In short I need:

  1. A single key and timestamp with multiple values, i.e. store close_price, volume, ask and bid prices under one timestamp.
  2. Resampling of a single value from those stored multiple values into a resampled key-value pair, i.e. a single timestamp with OHLCV data.
  3. Forward fill and back fill if no ticks are received.
  4. A flexible timestamp, i.e. either a Unix timestamp or any datetime format that can use the same 64 bits the Unix timestamp uses. You are using millisecond precision by default, but in most use cases millisecond precision is useless; for example, the initial tick data uses millisecond precision, but the resampled OHLC data uses 1-minute precision, so millisecond precision is a waste of memory and processing.