FlorianWilhelm / zipline-poloniex

Poloniex bundle for zipline
MIT License

zipline_poloniex.api.TradesExceeded: Number of trades exceeded #3

Closed ppwfx closed 6 years ago

ppwfx commented 6 years ago

Hey,

when trying to do zipline ingest -b poloniex with the following configuration

start_session = pd.Timestamp('2017-07-20', tz='utc')
end_session = pd.Timestamp('2017-10-23', tz='utc')
assets = [
    Pairs.usdt_btc,
    Pairs.usdt_eth,
    Pairs.usdt_dash,
    Pairs.usdt_xmr,
    Pairs.usdt_zec,
    Pairs.usdt_ltc,
]

register(
    'poloniex',
    create_bundle(
        assets,
        start_session,
        end_session,
    ),
    calendar_name='POLONIEX',
    minutes_per_day=24*60,
    start_session=start_session,
    end_session=end_session
)

I get zipline_poloniex.api.TradesExceeded: Number of trades exceeded. The only rate-limiting information I can find is the limit of 6 requests per second.

It works for a shorter period, so is there any way to get this to work for a longer one?

immackay commented 6 years ago

Hi 21stio,

This error occurs because the USDT_BTC pair has more than 50000 trades per day on certain days. It is raised at line 79 in api.py and stems from the history access limit in the Poloniex API. You can fix it by changing the prepare_data function in bundle.api:

Original code:

def prepare_data(start, end, sid_map, cache):
    def get_key(sid, day):
        return "{}_{}".format(sid, day.strftime("%Y-%m-%d"))

    for sid, asset_pair in sid_map.items():
        for start_day in pd.date_range(start, end, freq='D', closed='left', tz='utc'):
            key = get_key(sid, start_day)
            if key not in cache:
                end_day = start_day + timedelta(days=1, seconds=-1)
                trades = fetch_trades(asset_pair, start_day, end_day)
                cache[key] = make_candle_stick(trades)
            yield sid, cache[key]

My modified code:

def prepare_data(start, end, sid_map, cache):
    def get_key(sid, day):
        return "{}_{}".format(sid, day.strftime("%Y-%m-%d"))

    for sid, asset_pair in sid_map.items():
        for start_day in pd.date_range(start, end, freq='D', closed='left', tz='utc'):
            key = get_key(sid, start_day)
            if key not in cache:
                end1 = start_day + timedelta(hours=8, seconds=-1)
                start2 = start_day + timedelta(hours=8)
                end2 = start_day + timedelta(hours=16, seconds=-1)
                start3 = start_day + timedelta(hours=16)
                end_day = start_day + timedelta(days=1, seconds=-1)
                print("Fetching data for {} from {} to {}".format(asset_pair, start_day, end_day))
                trades = fetch_trades(asset_pair, start_day, end1)
                # Join the three chunks with pd.concat (assumes fetch_trades returns a DataFrame)
                trades = pd.concat([trades, fetch_trades(asset_pair, start2, end2)])
                trades = pd.concat([trades, fetch_trades(asset_pair, start3, end_day)])
                cache[key] = make_candle_stick(trades)
            yield sid, cache[key]

This splits each day's trade request into three, which fixes the TradesExceeded error. Splitting into two still produced a few errors on the USDT_BTC pair. It's not beautiful and could likely be done better, but it works.

FlorianWilhelm commented 6 years ago

Thanks @immac636 for finding the root of this problem. Your workaround seems to solve it, but what happens if there are more than 50000 trades in 8 hours? Do you want to work on a solution where only the logic of fetch_trades is altered, so that whenever the TradesExceeded error occurs the current daily interval is split automatically, similar to the bisection method?
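
To make the bisection idea concrete, here is a minimal sketch, not the fix that ended up in the fork. The names fetch_bisected and fetch are hypothetical, and TradesExceeded is redefined locally as a stand-in for zipline_poloniex.api.TradesExceeded. It assumes window bounds are inclusive and fall on whole seconds, as in the snippets above.

```python
from datetime import timedelta


class TradesExceeded(Exception):
    """Local stand-in for zipline_poloniex.api.TradesExceeded."""


def fetch_bisected(fetch, start, end):
    """Call fetch(start, end); on TradesExceeded, halve the window and retry.

    `fetch` is any callable taking an inclusive (start, end) window.
    Returns a list with one result per window that succeeded, in time order.
    """
    try:
        return [fetch(start, end)]
    except TradesExceeded:
        if end <= start:
            raise  # a single-second window cannot be split any further
        # Split on a whole second so the two halves do not overlap.
        half = timedelta(seconds=int((end - start).total_seconds()) // 2)
        mid = start + half
        left = fetch_bisected(fetch, start, mid)
        right = fetch_bisected(fetch, mid + timedelta(seconds=1), end)
        return left + right
```

If fetch returns a DataFrame per window, the resulting list could be joined with pd.concat before building the candle sticks. Unlike a fixed 8-hour split, the window only shrinks on the days that actually hit the limit.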

immackay commented 6 years ago

I arbitrarily selected 8 when I was trying to get this working back in August - I'll see if I can implement that. Thanks for the idea.

ppwfx commented 6 years ago

ah awesome, thanks man, looks good! :)

What is the trade data actually needed for? Is it each and every single trade? Is it bid and ask? And why isn't get_chart_data called anywhere? Isn't that the OHLC data?

immackay commented 6 years ago

The trade data is each individual trade - it is pulled from https://poloniex.com/public?command=returnTradeHistory&currencyPair={PAIR}&start={START IN UNIX TIMESTAMP}&end={END IN UNIX TIMESTAMP}. You can see an example on the Poloniex API Documentation.
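
As a rough illustration of how such a request URL can be assembled, here is a small sketch. The helper name trade_history_url is hypothetical; the endpoint and parameter names are taken from the URL above. It only builds the string and makes no network call.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://poloniex.com/public"


def trade_history_url(pair, start, end):
    """Build a returnTradeHistory URL for one pair and an inclusive time window.

    `start` and `end` must be timezone-aware datetimes; they are converted
    to Unix timestamps in whole seconds.
    """
    params = {
        "command": "returnTradeHistory",
        "currencyPair": pair,
        "start": int(start.timestamp()),
        "end": int(end.timestamp()),
    }
    return BASE_URL + "?" + urlencode(params)


url = trade_history_url(
    "USDT_BTC",
    datetime(2017, 7, 20, tzinfo=timezone.utc),
    datetime(2017, 7, 20, 23, 59, 59, tzinfo=timezone.utc),
)
print(url)
```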

The get_chart_data method doesn't work for minute-based data, which is what we need for zipline. It is limited to these periods (minutes): 5, 15, 30, 120, 240, 1440.

OHLC data is instead obtained through the make_candle_stick method, by resampling tick data (individual trades) into minute data and calculating the max, min, first, and last values. This data is then passed into a dataframe along with the resampled volume and sent along to zipline for ingestion.
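
The resampling step can be sketched with pandas as follows. This is an illustration of the technique, not the actual make_candle_stick code: the function name ticks_to_minute_bars is hypothetical, and it assumes the trades are in a DataFrame indexed by trade time with price in a rate column and traded quantity in an amount column.

```python
import pandas as pd


def ticks_to_minute_bars(trades):
    """Resample individual trades into one-minute OHLCV bars.

    Assumes `trades` has a DatetimeIndex and columns 'rate' (price) and
    'amount' (quantity). Minutes without trades get NaN prices and 0 volume.
    """
    # ohlc() yields the first, max, min and last price of each minute
    bars = trades["rate"].resample("1min").ohlc()
    # volume is the total amount traded in each minute
    bars["volume"] = trades["amount"].resample("1min").sum()
    return bars


ticks = pd.DataFrame(
    {"rate": [100.0, 105.0, 95.0, 98.0], "amount": [1.0, 2.0, 1.0, 0.5]},
    index=pd.to_datetime(
        [
            "2017-07-20 00:00:05",
            "2017-07-20 00:00:30",
            "2017-07-20 00:00:50",
            "2017-07-20 00:01:10",
        ],
        utc=True,
    ),
)
print(ticks_to_minute_bars(ticks))
```

How empty minutes should be filled before ingestion is left open here.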

immackay commented 6 years ago

@21stio See my fork of this repo for the final version of this fix

FlorianWilhelm commented 6 years ago

Thanks to @immac636, this issue is closed now!