bmoscon / cryptofeed

Cryptocurrency Exchange Websocket Data Feed Handler

Higher CPU usage with time #336

Closed. machacd closed this issue 3 years ago.

machacd commented 3 years ago

Describe the bug
When I start Cryptofeed, it consumes very few resources (say 10-13% CPU on a single core); however, after a few hours it climbs to 30-50% and plateaus there. If I restart it, it is low again.

To Reproduce
I am using the following feeds:

# Imports are not shown in the original report; presumably something like:
from cryptofeed import FeedHandler
from cryptofeed.defines import L2_BOOK, TRADES
from cryptofeed.exchanges import (Binance, BinanceFutures, Bitfinex, Bitmex,
                                  Bybit, Coinbase, Kraken)
from cryptofeed.backends.postgres import PostgresCallback
from cryptofeed.backends.backend import BackendBookCallback, BackendTradeCallback
import numpy as np

# postgres_cfg holds the Postgres connection parameters (not shown in the report)

def main():
    f = FeedHandler(retries=200)
    f.add_feed(Coinbase(channels=[TRADES], pairs=[
               'BTC-USD'], callbacks={TRADES: KustomTradePostgres(**postgres_cfg)}))
    f.add_feed(Bitfinex(pairs=['BTC-USD'], channels=[TRADES],
                        callbacks={TRADES: KustomTradePostgres(**postgres_cfg)}))
    f.add_feed(Kraken(pairs=['BTC-USD'], channels=[TRADES],
                      callbacks={TRADES: KustomTradePostgres(**postgres_cfg)}))
    f.add_feed(Binance(pairs=['BTC-USDT'], channels=[TRADES, L2_BOOK],
                       callbacks={TRADES: KustomTradePostgres(**postgres_cfg),
                                  L2_BOOK: KustomBookPostgres(**postgres_cfg)}))
    f.add_feed(Bybit(pairs=['BTC-USD'], channels=[TRADES, L2_BOOK],
                     callbacks={TRADES: KustomTradePostgres(**postgres_cfg),
                                L2_BOOK: KustomBookPostgres(**postgres_cfg)}))
    f.add_feed(BinanceFutures(pairs=['BTC-USDT'], channels=[TRADES, L2_BOOK],
                              callbacks={TRADES: KustomTradePostgres(**postgres_cfg),
                                         L2_BOOK: KustomBookPostgres(**postgres_cfg)}))
    f.add_feed(Bitmex(pairs=['XBTUSD'], channels=[TRADES],
                      callbacks={TRADES: KustomTradePostgres(**postgres_cfg)}))

    f.run()

with the callback classes defined as:

class KustomTradePostgres(PostgresCallback, BackendTradeCallback):
    default_table = TRADES

    async def write(self, feed, pair, recv_timestamp, timestamp, data):
        # store the trade side as a boolean: True for buys, False for sells
        if data['side'] == "buy":
            data['side'] = True
        else:
            data['side'] = False
        # build the VALUES fragment; not every exchange supplies a trade id
        if 'id' in data:
            d = f"'{data['side']}',{data['amount']},{data['price']},'{data['id']}'"
        else:
            d = f"'{data['side']}',{data['amount']},{data['price']},NULL"
        await super().write(feed, pair, timestamp, recv_timestamp, d)

# running "last written" state per feed; presumably initialized at module level
# in the original script, otherwise the comparisons below raise NameError
last_vals_b = None
last_vals_bf = None

class KustomBookPostgres(PostgresCallback, BackendBookCallback):
    default_table = 'books'

    async def write(self, feed, pair, recv_timestamp, timestamp, data):
        global last_vals_b
        global last_vals_bf
        # first 20 levels of each side: a/c are bid/ask sizes, b/d are bid/ask prices
        a = np.array(list(data['bid'].values()))[:20]
        b = np.array(list(data['bid'].keys()))[:20]
        c = np.array(list(data['ask'].values()))[:20]
        d = np.array(list(data['ask'].keys()))[:20]
        # size imbalance between those bid and ask levels
        imba = sum(a) / sum(c)
        # only write when the tracked values (d[0], b[0], imba) changed since the last write
        if feed == 'BINANCE':
            if [d[0], b[0], imba] != last_vals_b:
                await super().write(feed, pair, timestamp, recv_timestamp, f'{d[0]}, {b[0]}, {imba}')
                last_vals_b = [d[0], b[0], imba]
        if feed == 'BINANCE_FUTURES':
            if [d[0], b[0], imba] != last_vals_bf:
                await super().write(feed, pair, timestamp, recv_timestamp, f'{d[0]}, {b[0]}, {imba}')
                last_vals_bf = [d[0], b[0], imba]

Operating System: Ubuntu 18.04

Cryptofeed Version 1.61

bmoscon commented 3 years ago

I'm not sure this is a bug. I see no evidence here that there is a bug in cryptofeed; it's more likely that your setup can't handle the volume of writes to Postgres. You might want to consider caching/batching the writes, or having a second process handle the writes.
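For illustration, a minimal sketch of the batching idea (not cryptofeed code; it assumes asyncpg, a hypothetical trades table with feed/pair/side/amount/price columns, and that the feed callback pushes rows onto an asyncio.Queue which a single writer task drains in batches):

import asyncio
import asyncpg

class BatchedTradeWriter:
    """Buffer trade rows in memory and flush them to Postgres in batches."""
    def __init__(self, dsn, batch_size=500, flush_interval=1.0):
        self.dsn = dsn
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self.queue = asyncio.Queue()

    async def on_trade(self, feed, pair, side, amount, price):
        # called from the feed callback; the signature is illustrative, adapt it to your callback
        await self.queue.put((feed, pair, side, float(amount), float(price)))

    async def writer(self):
        conn = await asyncpg.connect(self.dsn)
        while True:
            # block until at least one row is available, then drain up to batch_size rows
            rows = [await self.queue.get()]
            while len(rows) < self.batch_size and not self.queue.empty():
                rows.append(self.queue.get_nowait())
            # one round trip per batch instead of one INSERT per trade
            await conn.executemany(
                "INSERT INTO trades (feed, pair, side, amount, price) VALUES ($1, $2, $3, $4, $5)",
                rows)
            await asyncio.sleep(self.flush_interval)

The same buffering idea is what PR #339 adds to the backends; moving the writer into a second process, as suggested above, is another way to keep database latency out of the collector's event loop.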

qinghuangchan commented 3 years ago

I'm writing to a Postgres database too, and I have seen both the collector and the pg process piling up, because individual writes have way too much overhead.

See https://github.com/bmoscon/cryptofeed/pull/339 for caching and writing in batches.

machacd commented 3 years ago

Thank you @qinghuangchan, I am trying your branch now. At what order of magnitude of writes do you observe the piling up? In my case, I write about 50 records per second.

machacd commented 3 years ago

I have tried it out @qinghuangchan, with different batch sizes from 5 to 40. It seems to work quite well, but unfortunately it does not resolve my issue; the CPU usage still climbs after a while.

qinghuangchan commented 3 years ago

I see that you are subscribed to trades and the L2 book, which are pretty noisy channels. I have mine set at 1000 for exchanges that seem to be less noisy, and 10000 for exchanges that are noisier.
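For example, with the batching branch the batch size could be tuned per feed, larger for noisier exchanges (the cache_size keyword below is only illustrative; check PR #339 for the actual parameter name):

# hypothetical per-feed batch sizes; cache_size is a placeholder for the PR #339 parameter
f.add_feed(Kraken(pairs=['BTC-USD'], channels=[TRADES],
                  callbacks={TRADES: KustomTradePostgres(cache_size=1000, **postgres_cfg)}))
f.add_feed(BinanceFutures(pairs=['BTC-USDT'], channels=[TRADES, L2_BOOK],
                          callbacks={TRADES: KustomTradePostgres(cache_size=10000, **postgres_cfg),
                                     L2_BOOK: KustomBookPostgres(cache_size=10000, **postgres_cfg)}))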