bmoscon / cryptostore

A scalable storage service for cryptocurrency data

Aggregate OHLC values from trades (klines/candles) #116

Closed cryptorevizor closed 4 years ago

cryptorevizor commented 4 years ago

Hi there,

I have a question about how to aggregate OHLCV data faster than Binance delivers it.

When working with Binance, there are two ways to receive OHLCV data directly from the exchange:

  1. Fetch klines via REST API requests. With this method, if I have, for example, 20 bots on the same VPS, I have to divide Binance's rate limit of 1200 requests/minute among them: 1200 / 20 bots / 60 seconds = 1 request per second per bot. Adding 100-300 ms for the request round trip and Binance's response, in most cases we only get a new candle after about 2 seconds. First problem: the candle arrives with a 2-second lag.

The second problem: if we trade the spot market and, for example, there is a pair with poor liquidity, Binance only recalculates and sends the new kline once a trade from the next time interval is executed. So in this case the delay is 2 seconds plus an unknown amount of time (until the first trade of the new bar is executed).

  2. Connect to the websocket and receive candles with a 250 ms lag (Binance's lag for OHLCV). In my opinion this second approach is more common and effective. It solves the first problem (the 2-second lag), but it offers no way to solve the second problem (a pair with poor liquidity).
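The rate-limit arithmetic from option 1 can be written out as a small helper. This is a sketch that takes the 1200 requests/minute figure from the text at face value and assumes the limit is shared evenly by all bots on one VPS; the function names are hypothetical.

```python
def per_bot_request_rate(limit_per_minute: int, bots: int) -> float:
    """Requests per second each bot may send when `bots` share one per-minute limit."""
    return limit_per_minute / bots / 60


def min_poll_interval_s(limit_per_minute: int, bots: int) -> float:
    """Shortest polling interval (seconds) per bot that stays within the shared limit."""
    return 1 / per_bot_request_rate(limit_per_minute, bots)


rate = per_bot_request_rate(1200, 20)
interval = min_poll_interval_s(1200, 20)
print(f"{rate:.1f} req/s per bot, poll every {interval:.1f} s")
# prints: 1.0 req/s per bot, poll every 1.0 s
```

The request round trip comes on top of this polling interval, and adding bots only makes the interval longer.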

So, thinking about this, I came to the following conclusion: I should aggregate OHLCV from trades myself. That would give me about 20 ms of lag if my server is located in AWS Tokyo. And resampling to 1-minute bars should be very fast: on Binance futures, the BTC-USDT pair has 400-800 trades per minute.

And finally my question is:

  1. What is the best way to aggregate OHLCV values (max(), min(), sum()) "on the fly" with minimal lag?
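One common way to do this without any database is to fold each trade into the current bar in O(1) as it arrives and to emit the previous bar when a trade from a new interval shows up. The sketch below is plain Python and assumes trades arrive as (timestamp in ms, price, amount) in timestamp order; the class and method names are hypothetical and not part of cryptostore or cryptofeed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bar:
    start_ms: int  # bucket start, aligned down to the interval
    open: float
    high: float
    low: float
    close: float
    volume: float


class OHLCVAggregator:
    """Folds a stream of trades into fixed-width OHLCV bars, one trade at a time."""

    def __init__(self, interval_ms: int = 60_000):
        self.interval_ms = interval_ms  # e.g. 5_000 for 5-second klines
        self.bar: Optional[Bar] = None

    def bucket_start(self, ts_ms: int) -> int:
        # Align the timestamp down to the start of its interval (e.g. the whole minute).
        return ts_ms - ts_ms % self.interval_ms

    def add_trade(self, ts_ms: int, price: float, amount: float) -> Optional[Bar]:
        """Update the open bar; return the finished bar if this trade starts a new one."""
        start = self.bucket_start(ts_ms)
        current = self.bar
        if current is not None and start < current.start_ms:
            return None  # late trade for an already-emitted bar: dropped in this sketch
        if current is not None and start == current.start_ms:
            current.high = max(current.high, price)
            current.low = min(current.low, price)
            current.close = price
            current.volume += amount
            return None
        # First trade ever, or first trade of a new interval: open a fresh bar.
        self.bar = Bar(start, price, price, price, price, amount)
        return current


agg = OHLCVAggregator(interval_ms=60_000)
for ts, px, qty in [(0, 100.0, 1.0), (30_000, 105.0, 0.5), (59_000, 99.0, 2.0), (61_000, 101.0, 1.0)]:
    done = agg.add_trade(ts, px, qty)
    if done:
        print(done)
# prints: Bar(start_ms=0, open=100.0, high=105.0, low=99.0, close=99.0, volume=3.5)
```

Because a bar is only emitted when a trade from the next interval arrives, this sketch on its own still has the illiquid-pair delay; closing the bar from a timer at the interval boundary avoids that.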

I have done a little research and got this concept:

  1. Use RedisGears https://forum.redislabs.com/t/aggregating-real-time-tick-data-into-ohlcv/350/4 OR
  2. Use RedisTimeSeries https://oss.redislabs.com/redistimeseries/commands/ OR
  3. Write a Python script that reads trades from Redis and tracks values such as trade_timestamp, last_timestamp and now() (slow, and it can't scale if we want to run 999 bots on the same asset)
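For option 2, RedisTimeSeries can aggregate server-side with compaction rules: raw trade prices and amounts go into source series, and one destination series per OHLCV field is fed by TS.CREATERULE with the first/max/min/last/sum aggregators. The sketch below only builds the command lists, so it runs without a server; the key naming scheme is hypothetical, and the commands could be sent with redis-py's `execute_command`. As far as I understand compaction, a bucket is only written once a sample from a later bucket arrives, so the illiquid-pair delay described above would still apply.

```python
from typing import List


def ohlcv_compaction_commands(symbol: str, bucket_ms: int = 60_000) -> List[List[str]]:
    """Build RedisTimeSeries commands that derive OHLCV series from raw trades.

    Key names are illustrative. Each trade would then be written with
    TS.ADD <key> <timestamp_ms> <value> to the price and amount series.
    """
    price_key = f"trades:{symbol}:price"
    amount_key = f"trades:{symbol}:amount"
    commands = [
        # Several trades can share a millisecond: keep the last price, sum the amounts.
        # (LAST drops intermediate prices within the same ms, which can affect high/low.)
        ["TS.CREATE", price_key, "DUPLICATE_POLICY", "LAST"],
        ["TS.CREATE", amount_key, "DUPLICATE_POLICY", "SUM"],
    ]
    fields = [("open", "first", price_key), ("high", "max", price_key),
              ("low", "min", price_key), ("close", "last", price_key),
              ("volume", "sum", amount_key)]
    for field, aggregator, source in fields:
        dest = f"ohlcv:{symbol}:{bucket_ms}:{field}"
        # The destination series must exist before a rule can point at it.
        commands.append(["TS.CREATE", dest])
        commands.append(["TS.CREATERULE", source, dest, "AGGREGATION", aggregator, str(bucket_ms)])
    return commands


for cmd in ohlcv_compaction_commands("BTC-USDT"):
    print(" ".join(cmd))
    # with a live server: redis.Redis().execute_command(*cmd)
```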

Could you please give me any advice on what I should do, and what I should know about aggregating OHLCV from trades and recording the aggregated values in Redis/Influx, etc.? Maybe this is possible with the standard functionality of cryptostore/cryptofeed, but I didn't see it.

yohplala commented 4 years ago

Hi, sorry, I am quite a newbie on this topic, but I'd like to ask for some clarification.

You say: "Binance recalculates and sends a new kline when a trade from the next time interval is executed. So, in this case we will have a delay of 2 seconds + unknown time (until the first trade on the new bar is executed)." Which timeframe are you working with? 1 minute? Do I understand the liquidity problem you mention correctly:

In cryptostore I haven't seen anything that computes OHLCV, but yes, it is a matter of aggregating max, min and sum. And I think that if you manage to do it directly in Redis, that is very likely the fastest way to go.

Bests,

yohplala commented 4 years ago

PS: out of curiosity, do you use a specific library to manage your trades? On my side, I only know CCXT, but I'm curious whether other open-source ones exist.

bmoscon commented 4 years ago

@cryptorevizor you probably want to use cryptofeed for that. This project is more for storing all the data (for backtesting or whatever you want to do with it later), so real-time things like that are not really supported. There is a way to get the data in real time via ZMQ, but it's just the raw data, and I wouldn't set all this up just for that; you can get the data via ZMQ and a ton of other ways in near real time with just cryptofeed. Cryptofeed supports Redis streams and Redis TimeSeries data, and you can also define your own aggregator to do your own OHLCV calculations with whatever metrics you want: https://github.com/bmoscon/cryptofeed/blob/master/cryptofeed/backends/aggregate.py#L36
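The custom-aggregator idea can be illustrated in plain Python: a wrapper that collects trades for a fixed time window and passes the result of user-supplied init/update functions to a handler when the window rolls over. This is a hypothetical sketch of the pattern only; it does not reproduce cryptofeed's actual classes or callback signatures (the linked aggregate.py is the reference for those), and it assumes trades arrive in timestamp order.

```python
from typing import Any, Callable, Optional


class WindowAggregate:
    """Hypothetical pluggable aggregator: computes user-defined metrics per time window."""

    def __init__(self, handler: Callable[[int, Any], None], window_ms: int,
                 init: Callable[[], Any], update: Callable[[Any, float, float], Any]):
        self.handler = handler      # called with (window_start_ms, state) when a window closes
        self.window_ms = window_ms
        self.init = init            # builds a fresh per-window state
        self.update = update        # folds one trade (price, amount) into the state
        self.window_start: Optional[int] = None
        self.state: Any = None

    def __call__(self, ts_ms: int, price: float, amount: float) -> None:
        start = ts_ms - ts_ms % self.window_ms
        if self.window_start is not None and start != self.window_start:
            # A trade from a new window closes the current one (in-order input assumed).
            self.handler(self.window_start, self.state)
            self.window_start = None
        if self.window_start is None:
            self.window_start, self.state = start, self.init()
        self.state = self.update(self.state, price, amount)


# Example metric: volume-weighted average price and trade count per window.
def vwap_init():
    return {"notional": 0.0, "volume": 0.0, "trades": 0}


def vwap_update(state, price, amount):
    state["notional"] += price * amount
    state["volume"] += amount
    state["trades"] += 1
    return state


def report(window_start, state):
    print(window_start, state["notional"] / state["volume"], state["trades"])


agg = WindowAggregate(report, 10_000, vwap_init, vwap_update)
for trade in [(1_000, 100.0, 1.0), (5_000, 102.0, 1.0), (12_000, 101.0, 2.0)]:
    agg(*trade)
# prints: 0 101.0 2
```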

cryptorevizor commented 4 years ago

Replying to @yohplala, who asked which timeframe I work with and whether the liquidity problem looks like this:

  • at t0+1s (beginning of a candle): 1 trade, which you don't know about because Binance waits for the next trade before sending the candle?
  • at t0+119s (end of the next candle) a 2nd trade is finally made, so you receive the previous candle? I am surprised by this, but I have not tested it, so I can't say. I would think that Binance sends candles when they are done, regardless of trades, no?

Hi, yes, correct, I work with 1-minute klines. But this is also useful for 5-, 10- and 30-second klines.

About "at t0+1s" and "at t0+119s": yes, that's true. I tested this 3 months ago on the IOST-USDT pair, which has poor and weak liquidity. But note that I was using the REST API!

Example for 1-minute klines:

  • 15:00:00.01 - first trade, starting the 15:00 candle. Open == the price of this trade; the close will be the last trade between 15:00:01 and 15:00:59.99.
  • 15:00:59.99 - second trade, continuing the 15:00 candle.
  • 15:01:00.00 to 15:01:20.00 - no trades for 20 seconds, and Binance doesn't send me the completed candle that started at 15:00.
  • 15:01:21.00 - first trade of the new candle, starting the 15:01 candle. Only now does Binance return the completed candle that started at 15:00; it is closed because current_trade_timestamp > open_timestamp_first_candle(15:00) + 60 sec.
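The timeline above can be checked with a few lines of Python. This sketch (function names hypothetical) compares when the 15:00 bar becomes available if it is closed by the first trade of the next interval, as observed with the REST API, versus closed by a local clock at the interval boundary, which becomes possible when aggregating trades yourself. Times are milliseconds after 15:00:00.

```python
from typing import Iterable, Optional


def publish_on_next_trade(trade_ts: Iterable[int], bar_start: int, interval: int) -> Optional[int]:
    """Bar is released by the first trade at or after its end (None if there is none yet)."""
    bar_end = bar_start + interval
    return min((t for t in trade_ts if t >= bar_end), default=None)


def publish_on_clock(bar_start: int, interval: int) -> int:
    """Bar is released by a local timer exactly at its end, trades or not."""
    return bar_start + interval


trades = [10, 59_990, 81_000]  # 15:00:00.01, 15:00:59.99, 15:01:21.00
by_trade = publish_on_next_trade(trades, 0, 60_000)
by_clock = publish_on_clock(0, 60_000)
print(by_trade, by_clock, (by_trade - by_clock) / 1000)
# prints: 81000 60000 21.0
```

In practice a clock-driven close would likely need a short grace period for trades stamped just before the boundary that arrive late over the network.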

cryptorevizor commented 4 years ago

Replying to @yohplala's question about which library I use to manage trades:

I use backtrader for the infrastructure and to detect trade conditions, and ccxt to communicate between backtrader and the exchange. But it doesn't really matter, because we can also place orders and fetch data directly with the requests module. ccxt is better for me due to its rate limiter and the other methods it unifies.

cryptorevizor commented 4 years ago

Replying to @bmoscon's suggestion to use cryptofeed and define a custom aggregator for OHLCV:

Thanks a lot @bmoscon! I'll try to implement this using RedisTimeSeries. After testing, I'll report how it works and whether this method is applicable and faster than getting candles directly from Binance via websocket.

bmoscon commented 4 years ago

I'm going to mark this as closed - it seems the question has been answered