Open xmatthias opened 4 years ago
@xmatthias in short, we are aware and we totally agree with you. Because there are exotic exchanges that do both id-based and time-based pagination depending on the endpoint, this has to be defined either as an exchange-wide property or as metainfo per each unified method. Your suggestions on a proper unification scheme are welcome )
Unifying this should be "as easy" (it probably isn't, i've never done such unifying) as adding a from_id parameter to the header of the method.
Unfortunately, this is not as easy, because that won't work with other languages. But we agree in general. We will address this issue in the near future, hopefully.
There are generators in JS, PHP and Python. Could we use that somehow? Each time we do the equivalent of next(get_trades()), it goes and gets the next "page". Or, if it's symbol-based, the next "symbol". Whatever it does, it just keeps returning another list/array with a yield command. Or it loops through everything that came back and yields the next one. That way it's easier to use in a loop/map/filter/reduce thing. Thoughts on this? Or something similar? That way it's also lazy, so it'll only get the next one if you need it/ask for it. This is a very high level idea, but I hope it makes sense.
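A minimal sketch of that generator idea, assuming a page-fetching callable that returns the trades strictly after a cursor. The names `iter_trades`, `fake_fetch_page` and `FAKE_TRADES` are hypothetical and only stand in for a real exchange call so the example runs offline:

```python
from itertools import islice

def iter_trades(fetch_page):
    """Yield trades one at a time, requesting a new page only when needed."""
    cursor = None  # None means "start from the first page"
    while True:
        page = fetch_page(cursor)
        if not page:
            return  # an empty page means there is nothing left
        yield from page
        cursor = page[-1]["id"]  # continue after the last trade seen

# Hypothetical stand-in for an exchange call (not a CCXT API).
FAKE_TRADES = [{"id": str(i), "timestamp": 1000 + i} for i in range(7)]
calls = []

def fake_fetch_page(cursor, limit=3):
    """Return up to `limit` trades strictly after `cursor`."""
    calls.append(cursor)
    start = 0 if cursor is None else int(cursor) + 1
    return FAKE_TRADES[start:start + limit]

# Only two pages are requested to produce the first four trades.
first_four = [t["id"] for t in islice(iter_trades(fake_fetch_page), 4)]
print(first_four, len(calls))
```

Because the generator is lazy, it composes with loops, `map`, `filter` and `islice` without fetching more pages than the consumer asks for.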
@eabrouwer3
Could we use that some how?
Yes, however, this particular issue is more about unifying the data format that would be used to build more complex traversing algorithms on top of that data. So, the question is less about building the generators themselves and more about unifying the properties for all types of pagination, including limits, maximums, minimums, date-based pagination, id-based pagination, restrictions on how far back into the past you can go, etc, etc. Due to the differences between exchanges, the unification for the pagination metadata (for all methods) is not as easy as it seems at first, but we think we will be able to roll out a good proposal soon. Thx!
Ahhh. I see. Thanks @kroitor.
I have a question regarding a sample from this post: should we add one to the from_id before using it?
Old:
from_id = t[-1]['id']
t = ct.fetch_trades(pair, params={'fromId': from_id}, limit=1000)
New:
from_id = t[-1]['id'] + 1
t = ct.fetch_trades(pair, params={'fromId': from_id}, limit=1000)
@YuriyTigiev in general the id of a trade is a string (it can take any form like '123456789' or 'abcdef-foo-bar'), so we can't do arithmetic with it. Instead, we should set the "from-id" to the last received id, and then filter out duplicates by id.
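A rough sketch of that approach, assuming the endpoint returns trades starting at (and including) the given id. `fetch_all_from_id`, `fake_fetch_by_id` and `FAKE_IDS` are hypothetical names used so the example runs without network access:

```python
def fetch_all_from_id(fetch_page, first_id, limit=1000):
    """Page forward by id, dropping trades that were already received."""
    trades, seen = [], set()
    from_id = first_id
    while True:
        page = fetch_page(from_id, limit)
        fresh = [t for t in page if t["id"] not in seen]
        if not fresh:
            break  # nothing new came back, so we have reached the end
        seen.update(t["id"] for t in fresh)
        trades.extend(fresh)
        from_id = fresh[-1]["id"]  # reuse the last id as-is, no arithmetic
    return trades

# Hypothetical non-numeric ids, in the order the exchange would return them.
FAKE_IDS = ["abc", "def-1", "0x9f", "jkl", "mno"]

def fake_fetch_by_id(from_id, limit):
    """Return up to `limit` trades starting at `from_id`, inclusive."""
    start = FAKE_IDS.index(from_id)
    return [{"id": i} for i in FAKE_IDS[start:start + limit]]

all_trades = fetch_all_from_id(fake_fetch_by_id, "abc", limit=2)
print([t["id"] for t in all_trades])
```

Note that `limit` has to be at least 2 here: with an inclusive from-id and a limit of 1, every page would contain only the trade already seen.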
I'm sorting all trades by id (fromId) to analyze historical data step by step. If fromId is not a number, how can I sort all trades in the correct historical order? The timestamp is not unique.
@YuriyTigiev in some cases you can sort by timestamp+id if you know for sure that the ids are numeric. However, that will be exchange-specific since it won't work for the exchanges that use hashes as ids. In a general case, you should sort by timestamp even if it is not unique. In some cases you may need to look into the info of every trade for more clues on the ordering.
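As an illustration of that advice, here is a small sketch that sorts by timestamp and uses the id as a tie-breaker only when every id is a plain integer string. The helper name `sort_trades` and the sample data are made up for the example:

```python
def sort_trades(trades):
    """Sort by (timestamp, numeric id) when possible, else by timestamp only."""
    def is_numeric(value):
        text = str(value)
        return text.isascii() and text.isdigit()

    if all(is_numeric(t["id"]) for t in trades):
        # Exchange-specific case: ids are integers, so they can break ties.
        return sorted(trades, key=lambda t: (t["timestamp"], int(t["id"])))
    # General case: sorted() is stable, so trades that share a timestamp
    # keep the order in which they were received.
    return sorted(trades, key=lambda t: t["timestamp"])

numeric_ids = [
    {"id": "12", "timestamp": 5},
    {"id": "9", "timestamp": 5},
    {"id": "3", "timestamp": 1},
]
hashed_ids = [
    {"id": "b7", "timestamp": 5},
    {"id": "a1", "timestamp": 5},
    {"id": "zz", "timestamp": 1},
]
print([t["id"] for t in sort_trades(numeric_ids)])
print([t["id"] for t in sort_trades(hashed_ids)])
```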
In cases when fromId is a hash we can't use it for pagination (t = ct.fetch_trades(pair, params={'fromId':from_id}, limit=1000)) because order can be incorrect.
@YuriyTigiev yes, that is correct. Also, sometimes, the exchange may provide pagination hints in the fetch_trades response, which is accessible in the .last_json_response property after the call. However, that is also exchange-specific.
How will the method work if I pass both parameters "since" and "fromId"?
await exchange.fetchTrades(symbol = pair, since = current, params={'fromId':prev_id}, limit = limit)
@YuriyTigiev it will send both params and will filter the results for timestamp > since.
Why does fetchTrades for Binance return a random number of records? For one pair it could return 107, 23, 1000, 2, 50.
await exchange.fetchTrades(symbol = pair, params={'fromId':id}, limit = 1000)
@YuriyTigiev it's hard to answer without your code and verbose output.
In general, if you're watching the most recent trades this way, you will get all new trades starting after the specified id. Because trades happen irregularly on the exchange (depending on the activity of the users and pairs), there could be any number of new trades. The number of new trades since your previous request varies over time; this comes naturally from the definition of free market trading.
In other words, it could be pretty much the expected behavior.
@YuriyTigiev you may also want to look through these issues carefully:
That could shed some light on your question. Yet still we will need you to follow the FAQ and paste the code and a complete verbose output in order to investigate.
Last question
https://github.com/binance-exchange/binance-official-api-docs/blob/master/rest-api.md#old-trade-lookup-market_data GET /api/v3/historicalTrades
How does fetch_trades calculate the first fromId based on the parameter since? The Binance historicalTrades method doesn't have a "since" parameter; it only has fromId.
I have copied part of code from the first post.
t = ct.fetch_trades(pair, since=int(since.timestamp() * 1000))
from_id = t[-1]['id']
trades.extend(t)
while True:
    t = ct.fetch_trades(pair, params={'fromId':from_id}, limit=1000)
The binance method which historicalTrades doesn't have the parameter "since" but has a parameter fromId only
Exactly, and this is why...
How fetch_trades calculate the first fromId for the method fetch_trades based on the parameter since?
... it does not. If the underlying endpoint does not accept a specific parameter, that parameter is simply ignored or not sent to the exchange. So, with the historicalTrades endpoint the since argument is ignored by the exchange. If you're using the historicalTrades endpoint, Binance returns the most recent trades or the trades starting with fromId. The since argument is irrelevant at the moment of your request. And upon receiving the reply with the set of trades from Binance, the CCXT library filters them by since on the user side (whatever set it received).
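To make the consequence concrete, here is a small sketch of such a user-side filter. It is not CCXT's actual implementation: `filter_by_since` is a hypothetical helper and the inclusive comparison is an assumption made for the example:

```python
def filter_by_since(trades, since=None):
    """Keep only trades at or after `since` (milliseconds); assumed inclusive."""
    if since is None:
        return list(trades)
    return [t for t in trades if t["timestamp"] >= since]

# Suppose the endpoint ignored `since` and returned the most recent trades.
most_recent = [
    {"id": "101", "timestamp": 5000},
    {"id": "102", "timestamp": 5100},
    {"id": "103", "timestamp": 5200},
]

# An old `since` removes nothing: the result starts at 5000, not at 1000.
old_since = filter_by_since(most_recent, since=1000)
# A `since` inside the returned set trims only the front of it.
mid_since = filter_by_since(most_recent, since=5150)
print(len(old_since), [t["id"] for t in mid_since])
```

The filter can only narrow whatever set came back; it cannot make the endpoint return trades from around `since`.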
However, that is just a half of the story. If you've read the above links carefully, you've probably noticed, that Binance provides more than one endpoint for public trades:
trades – https://binance-docs.github.io/apidocs/spot/en/#recent-trades-list
historicalTrades – https://binance-docs.github.io/apidocs/spot/en/#old-trade-lookup
aggTrades – https://binance-docs.github.io/apidocs/spot/en/#compressed-aggregate-trades-list
The aggTrades endpoint is the default in CCXT. But you can choose which of the three endpoints you want to use and configure that with the exchange-specific option named exchange.options['fetchTradesMethod'], as shown here:
For example:
import ccxt
exchange = ccxt.binance({
    'enableRateLimit': True,
    'options': {
        'fetchTradesMethod': 'publicGetHistoricalTrades',  # or publicGetTrades or publicGetAggTrades (default)
    },
})
# your code here...
Configuring the exchange-specific options is documented in the CCXT Manual:
So, depending on which endpoint you choose, this or that argument or parameter is used to paginate over trades according to Binance API docs, as linked above.
Let me know if that does not answer your question.
Hi,
I had a problem downloading historical data from Binance when copying data day by day. I used fetchTrades with the "since" parameter. In this case, the function works with low accuracy and can skip data for a period. I wrote my own function, FindNearestFromId, which takes the "since" parameter and finds the nearest fromId satisfying the condition "found timestamp" >= "since".
import ccxt
API_KEY = ''
SECRET_KEY = ''
exchange_class = getattr(ccxt, 'binance')
exchange = exchange_class({
    'apiKey': API_KEY,
    'secret': SECRET_KEY,
    'timeout': 30000,
    'defaultType': 'spot',
    'enableRateLimit': True
})
dt = '2020-06-02T10:14:15.568Z'
since = exchange.parse8601(dt)
pair = 'ETH/BTC'
def FindNearestFromId(pair, since):
    # Fetch the oldest trade (id 1) and the most recent trade as search bounds.
    s = exchange.fetch_trades(pair, params={'fromId': '1'}, limit=1)
    e = exchange.fetch_trades(pair, limit=1)
    sts = int(s[0]['timestamp'])
    ets = int(e[0]['timestamp'])
    sid = int(s[0]['id'])
    eid = int(e[0]['id'])
    # Give up if `since` lies outside the available history.
    if not (sts <= since <= ets):
        return None
    while True:
        if sid == eid:
            return sid
        # Adjacent ids: both branches return the later trade (sid + 1 == eid).
        if (sid == eid - 1) and (since - sts) <= (ets - since):
            return sid + 1
        if (sid == eid - 1) and (since - sts) > (ets - since):
            return eid
        # Bisect the id range and look up the timestamp of the middle trade.
        cid = (eid + sid) // 2
        c = exchange.fetch_trades(pair, params={'fromId': f'{cid}'}, limit=1)
        cts = int(c[0]['timestamp'])
        cid = int(c[0]['id'])
        if (sts < since <= cts) or (sts <= since < cts):
            eid = cid
            ets = cts
        elif (cts < since <= ets) or (cts <= since < ets):
            sid = cid
            sts = cts
    return None
fromId = FindNearestFromId(pair, since)
f0 = exchange.fetch_trades(pair, params={'fromId':f'{fromId}'}, limit=1)
f1 = exchange.fetch_trades(pair, params={'fromId':f'{fromId-1}'}, limit=1)
f2 = exchange.fetch_trades(pair, params={'fromId':f'{fromId+1}'}, limit=1)
original = exchange.fetch_trades(pair, since=since, limit=1)
print(f"source:\t\t{dt}, {since}")
print(f"found:\t\t{f0[0]['datetime']}, {f0[0]['timestamp']}, delta(ts) = {f0[0]['timestamp'] - since}")
print(f"found-1:\t{f1[0]['datetime']}, {f1[0]['timestamp']}, delta(ts) = {f1[0]['timestamp'] - since}")
print(f"found+1:\t{f2[0]['datetime']}, {f2[0]['timestamp']}, delta(ts) = {f2[0]['timestamp'] - since}")
print(f"original:\t{original[0]['datetime']}, {original[0]['timestamp']}, delta(ts) = {original[0]['timestamp'] - since}")
Result:
source: 2020-06-02T10:14:15.568Z, 1591092855568
found: 2020-06-02T10:14:15.636Z, 1591092855636, delta(ts) = 68
found-1: 2020-06-02T10:14:15.533Z, 1591092855533, delta(ts) = -35
found+1: 2020-06-02T10:14:15.742Z, 1591092855742, delta(ts) = 174
original: 2020-06-02T11:14:15.292Z, 1591096455292, delta(ts) = 3599724
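The same search can be written as a standard lower-bound binary search. The sketch below is a variant, not the function above: it assumes ids are consecutive integers whose timestamps never decrease as the id grows, and it returns the first id whose timestamp is at or after `since`. The names `find_first_id_at_or_after`, `FAKE_TIMESTAMPS` and `fake_get_timestamp` are hypothetical; in practice the lookup would wrap a call such as fetch_trades(pair, params={'fromId': id}, limit=1).

```python
def find_first_id_at_or_after(get_timestamp, lo_id, hi_id, since):
    """Return the smallest id in [lo_id, hi_id] with timestamp >= since."""
    if get_timestamp(hi_id) < since:
        return None  # `since` is newer than the newest known trade
    while lo_id < hi_id:
        mid = (lo_id + hi_id) // 2
        if get_timestamp(mid) >= since:
            hi_id = mid  # the answer is mid or something before it
        else:
            lo_id = mid + 1  # the answer is strictly after mid
    return lo_id

# Hypothetical history: the list index is the trade id, the value its timestamp.
FAKE_TIMESTAMPS = [100, 100, 250, 400, 400, 900]

def fake_get_timestamp(trade_id):
    return FAKE_TIMESTAMPS[trade_id]

last_id = len(FAKE_TIMESTAMPS) - 1
print(find_first_id_at_or_after(fake_get_timestamp, 0, last_id, 400))
```

Each probe costs one request, so the number of calls grows with the logarithm of the id range rather than with its size.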
@YuriyTigiev you might also want to check these examples with deduplication by id. Instead of fetching by the last timestamp, you can fetch (time window / 2) and then drop the duplicates, which may be easier to handle and implement:
(It's a different exchange, but the concept is similar across all exchanges)
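One possible reading of the "(time window / 2)" idea, sketched under assumptions: each request returns the first `limit` trades at or after `since`, the next request starts halfway through the time span the previous response covered, and duplicates are dropped by id. All names here (`fetch_with_overlap`, `fake_fetch_since`, `FAKE_HISTORY`) are hypothetical and the data is synthetic:

```python
def fetch_with_overlap(fetch_page, since, until, limit=1000):
    """Walk forward in time with overlapping requests, deduplicating by id."""
    by_id = {}
    while since < until:
        page = fetch_page(since, limit)
        if not page:
            break
        for trade in page:
            by_id.setdefault(trade["id"], trade)  # keep the first copy only
        covered = page[-1]["timestamp"] - since
        # Advance by half of the covered span, but always by at least 1 ms
        # so that the loop is guaranteed to make progress.
        since += max(covered // 2, 1)
    return sorted(by_id.values(), key=lambda t: t["timestamp"])

# Synthetic history in which every timestamp is shared by two trades.
FAKE_HISTORY = [{"id": str(i), "timestamp": 1000 + 10 * (i // 2)} for i in range(20)]

def fake_fetch_since(since, limit):
    return [t for t in FAKE_HISTORY if t["timestamp"] >= since][:limit]

history = fetch_with_overlap(fake_fetch_since, since=1000, until=1100, limit=5)
print(len(history))
```

With a limit of 5 the first page ends in the middle of a pair of trades that share a timestamp; continuing from "last timestamp + 1" would skip the second trade of that pair, while the overlapping requests pick it up. If more than `limit` trades share a single timestamp, this sketch can still miss some of them.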
I saw the code but don't understand how those examples work. Can I use it to copy data from Binance starting from 2019-06-01 00:00:00?
@YuriyTigiev i'll post an example for fetching the trade history from Binance as soon as I can.
is rateLimits: 350 optimal for fetching the tradeHistory? should be - enableRateLimit = False ?
@YuriyTigiev
is rateLimits: 350 optimal for fetching the tradeHistory?
The optimal setting depends on the exchange, since every exchange has varying rate limits for this or that endpoint.
should be - enableRateLimit = False ?
Nope, you should leave it on (True), unless you implement your own custom rate limiter.
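For reference, a custom rate limiter can be as simple as enforcing a minimum delay between requests. The sketch below is only an illustration and not how CCXT throttles internally; the `Throttle` class is hypothetical, and the 350 ms value is just the number from the question above:

```python
import time

class Throttle:
    """Block until at least `min_interval_ms` has passed since the last call."""

    def __init__(self, min_interval_ms, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_ms / 1000.0
        self._clock = clock  # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

throttle = Throttle(350)
throttle.wait()  # the first call returns immediately
# A second throttle.wait() right before the next request would sleep ~0.35 s.
```

Calling `throttle.wait()` before every request keeps the request rate at or below one per 350 ms; the right interval still depends on the exchange and the endpoint.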
Inspired by https://github.com/ccxt/ccxt/issues/5683#issuecomment-521753472 (but I think this deserves its own issue, and I could not find any place where this was worked on).
Problem
The current way of pagination is very inconsistent / not well unified (which is also documented...). It's up to the user to detect which methods for which exchanges require date-based (since=) pagination, and which require id-based pagination.
Ideally, users could supply an argument next_page=True, which would handle pagination internally, but I don't think that's possible since it would require keeping the last pagination id per pair(?).
Assuming the above is not possible, an attribute (without going to the exchange's api documentation) expressing which method needs to be used for which exchange (and endpoint) would be highly appreciated.
As the documentation states:
'from_id': from_id, # exchange-specific non-unified parameter name
Possible solution
Unifying this should be "as easy" (it probably isn't, I've never done such unifying) as adding a from_id parameter to the header of the method, which defaults to None and is not used in case of date-based pagination, but is used for both ID and cursor based pagination.
This would move "defining" the correct additional parameter to ccxt instead of the users.
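A sketch of what this could look like from the user's side, purely as an illustration of the proposal and not an existing CCXT API. The mapping table and the helper `build_fetch_trades_params` are hypothetical; only the Binance name 'fromId' comes from this thread, and the second entry is invented:

```python
# Hypothetical table of exchange-specific names for the "from id" parameter.
FROM_ID_PARAM = {
    "binance": "fromId",      # name used in the examples in this thread
    "exampleexchange": "cursor",  # invented entry, for illustration only
}

def build_fetch_trades_params(exchange_id, from_id=None, params=None):
    """Translate a unified from_id into the exchange-specific params dict."""
    merged = dict(params or {})
    if from_id is None:
        return merged  # date-based pagination: nothing to add
    key = FROM_ID_PARAM.get(exchange_id)
    if key is None:
        raise ValueError(f"id-based pagination is not defined for {exchange_id}")
    merged[key] = from_id
    return merged

print(build_fetch_trades_params("binance", from_id="12345"))
```

The resulting dict is what a caller currently has to assemble by hand and pass as params; with a unified from_id argument the library would own this mapping instead.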
Combined with something like exchange.describe()['options']['fetchTradesType'] == "page" and exchange.describe()['options']['fetchTradesPageStart'] == "0" (probably not the location you'd like to have this), this should allow flexible usage of this method.
Sample of the problem:
Problematic part in my eyes: