cheeseandcereal opened 5 years ago
Have experienced this with `queryBlocks` calls similar to:

```javascript
client.queryBlocks({ luceneQuery: `block_id:>${start_block_id} AND timestamp:>=${start_timestamp}`, limit: 50, sort: 'block_id:asc' })
```

(`createClient({credentials})`, then `client.queryBlocks()`.)

Interestingly, I've ONLY experienced it when querying a specific managed L2 node (public ID: cj4i7y1ucMtuz1jqTfa6edZ8x1XH85DU3gL8KaJZ1e2c). Once it errors, if I try to run the exact query (starting from the same block number) it continues to fail. If I bump the block ID by one, it "clears the pipes" and I can resume where I left off (with the original query) until, a few more pulls later, it errors again.
Other managed L3s (21mPVWRyyJHNTTQZEgy1hqefUpzsgCLLt9fhkRWQQrTTt) and unmanaged L2s (21aVsbEYCfSpe3aBaoT3J4Kp1Q8mjF7A6BSZF8yBGRJTD) work fine.
In fact, on the managed L3, I'm able to rapid-fire requests to pull all 15000+ blocks 50 at a time and have done so perhaps 15-20 times in testing with no error at any time. The managed L2, though, errors within the first 5 calls or so every time.
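For context, the pulling loop is shaped roughly like this (a simplified sketch, not the exact explorer code; the response field names `results` and `header.block_id` are assumptions):

```typescript
// Simplified sketch of the block-pulling loop (not the exact explorer code).
// Field names on the query response (`results`, `header.block_id`) are assumptions.
import { createClient } from 'dragonchain-sdk';

async function pullAllBlocks(startBlockId: number, startTimestamp: number) {
  const client = await createClient({ /* credentials */ });
  let lastBlockId = startBlockId;
  while (true) {
    const result = await client.queryBlocks({
      luceneQuery: `block_id:>${lastBlockId} AND timestamp:>=${startTimestamp}`,
      limit: 50,
      sort: 'block_id:asc',
    });
    // The 401 replay error surfaces here as { ok: false, status: 401, response: {...} }
    if (!result.ok) throw new Error(JSON.stringify(result.response));
    const blocks: any[] = (result.response as any).results || [];
    if (blocks.length === 0) break;
    lastBlockId = Number(blocks[blocks.length - 1].header.block_id);
  }
}
```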
The application experiencing this issue can be found here: https://github.com/Dragonchain-Community/dragonchain-uvn-block-explorer/tree/dev
The problem area, I think, is this line: https://github.com/dragonchain/dragonchain-sdk-javascript/blob/master/src/services/dragonchain-client/DragonchainClient.ts#L1324
I suggest that we use more precise Node.js timing functions, e.g. `process.hrtime.bigint()`: https://nodejs.org/api/process.html#process_process_hrtime_bigint
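For illustration, a quick comparison of the two resolutions (note that `process.hrtime.bigint()` requires Node >= 10.7):

```typescript
// Illustration only: Date gives millisecond resolution, while
// process.hrtime.bigint() returns a monotonic nanosecond counter.
for (let i = 0; i < 5; i++) {
  console.log(new Date().toISOString(), process.hrtime.bigint().toString());
}
// The ISO strings can repeat within the same millisecond;
// in practice the bigint values are strictly increasing across these calls.
```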
The date has to be sent as an ISO 8601 time string, so I don't know if getting a bigint unix timestamp would help. Additionally, that function you linked isn't compatible with Node 8, which we support, so we can't use it anyway.
I'm also not 100% convinced that line is the issue, because I've seen this error happen when performing a query manually in the node repl for the first time. Could it maybe be the http request library doing some weird re-request?
Either way, this warrants more investigation.
```javascript
for (let i = 0; i < 100; i++) { console.log(new Date().toISOString()); }
```
yields:
```
2019-09-16T17:36:26.226Z
2019-09-16T17:36:26.226Z
2019-09-16T17:36:26.226Z
2019-09-16T17:36:26.226Z
2019-09-16T17:36:26.226Z
2019-09-16T17:36:26.226Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
2019-09-16T17:36:26.227Z
```
If we had more precision in that string we could solve this, but I actually agree that bigint might not be exactly what we need, after reading further on it.
Yeah, of course you can get a raw for-loop to generate the same date string when there's no other logic. But I've also seen this error occur when performing a single request, so although getting the date ISO string may be a problem, I'm still not convinced that's the root cause.
In my case, I'm running into the issue when rapid-firing queryBlocks requests. Doesn't the replay protection also depend on the actual query, and not just the timing of the requests? Two different requests, even at the same instant, should theoretically go through fine. In that case, I'd think we're back to some kind of poor error-condition handling in the http request library.
I shouldn't ever be intentionally sending the same request through twice at all, and it continues to error if I DO reissue the same request, even seconds or minutes apart, until I issue a new (different) request.
Yeah, the replay protection only comes into play if the signature of the request is identical, which means all parts of the request, including query parameters and timestamp, would have to be identical to trigger it.
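In pseudocode terms, the idea is roughly this (a conceptual sketch only, not the actual Dragonchain auth implementation):

```typescript
// Conceptual sketch only -- NOT the actual Dragonchain auth code.
// The HMAC covers every part of the request (method, path + query, timestamp, body),
// and the server rejects a signature it has already seen recently.
import { createHmac } from 'crypto';

const recentlySeen = new Set<string>();

function sign(secret: string, method: string, fullPath: string, timestamp: string, body = ''): string {
  return createHmac('sha256', secret)
    .update(`${method}\n${fullPath}\n${timestamp}\n${body}`)
    .digest('base64');
}

function rejectReplays(signature: string): void {
  if (recentlySeen.has(signature)) {
    // This is the 401 AUTHENTICATION_ERROR seen in this issue
    throw new Error('Previous matching request found (no replays allowed)');
  }
  recentlySeen.add(signature);
}
```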
Okay, pure speculation here: is it possible that the SDK client is reusing previous fetch options somehow if the query parameters match a previous failed request? If timestamp is enough to get past the replay protection, I'd think reissuing the same failed request a few seconds or more later would DEFINITELY get it through, but it doesn't.
I'm not sure why that would happen, but I wouldn't immediately rule it out until I see proof otherwise. There's no code to 'cache' any sort of request data or anything, but yeah, if you're making a request seconds later, it absolutely should work fine.
Let me see if I can force-break it again and prove that idea.
Disregard that thought. It was a red herring. Once I get it to error, I can rerun that request over and over (with a few seconds between each request), and whether it succeeds appears to be random (no specific trigger from my end).
I'll leave this to the experts. 😄
It IS still ONLY happening on that specific managed L2 for me though. Not a single issue with other nodes.
I am fairly confident that the timestamp precision is the issue. Python gives 6 digits of sub-second precision (microseconds); Node gives only 3 (milliseconds). Even in a for-loop within Python, 6 digits is sufficient:
```python
# Python
import datetime
for x in range(0, 100): print(datetime.datetime.utcnow().isoformat() + 'Z')
```
```
2019-09-16T18:28:09.086199Z
2019-09-16T18:28:09.086205Z
2019-09-16T18:28:09.086211Z
2019-09-16T18:28:09.086216Z
2019-09-16T18:28:09.086221Z
2019-09-16T18:28:09.086226Z
2019-09-16T18:28:09.086232Z
2019-09-16T18:28:09.086237Z
2019-09-16T18:28:09.086242Z
2019-09-16T18:28:09.086248Z
2019-09-16T18:28:09.086253Z
2019-09-16T18:28:09.086259Z
2019-09-16T18:28:09.086264Z
2019-09-16T18:28:09.086269Z
2019-09-16T18:28:09.086275Z
2019-09-16T18:28:09.086280Z
```
I know that the node sdk has less precision, BUT I have seen this happen when performing a SINGLE request in a repl, which is why even if the timestamp was an issue, there is at least one other thing also causing this problem.
Check this out: https://github.com/wadey/node-microtime. This guy built a C extension against the N-API which gets more precise time.
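A rough idea of how that could feed a higher-precision ISO string (a sketch; assumes node-microtime's `now()` returns integer microseconds since the Unix epoch):

```typescript
// Sketch: build a 6-digit-fraction ISO 8601 timestamp like Python's, assuming
// node-microtime's now() returns integer microseconds since the Unix epoch.
const microtime = require('microtime');

function isoWithMicroseconds(): string {
  const micros: number = microtime.now();        // e.g. 1568658489086199
  const millis = Math.floor(micros / 1000);
  const fraction = String(micros % 1000000).padStart(6, '0');
  return new Date(millis).toISOString().replace(/\.\d{3}Z$/, `.${fraction}Z`);
}
// isoWithMicroseconds() -> '2019-09-16T18:28:09.086199Z'
```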
> I know that the node sdk has less precision, BUT I have seen this happen when performing a SINGLE request in a repl, which is why even if the timestamp was an issue, there is at least one other thing also causing this problem.
ahh. gotcha
Check this out: https://github.com/expressjs/express/issues/2512#issuecomment-72342110
I have crap for internet, and I’ve been having even more trouble with connection timeouts, etc., in recent days, and usually only to certain sites/addresses.
I wonder if I'm causing the problem with failed requests that node/my browser is automatically repeating, but only to that endpoint?
If it's failed-request handling on the nodejs/expressjs side on my end, that would explain how I could be sending two identical queries, but I'd still think it wouldn't be sending them within the same thousandth of a second.
Yeah, I've seen that problem before with express as the webserver, but Dragonchain uses python flask+gunicorn for the webserver, so I highly doubt that's the issue. I currently suspect some sort of hidden retry logic within node's HTTP request code itself, or maybe in node-fetch, which is the package we're using to perform http requests in this sdk.
Yeah, unfortunately I haven't come across a consistent way to repro. Any consistent repro steps would be highly appreciated.
Attempted a fix with #5. Will leave this issue open until we confirm this actually fixes the problem.
Note this was released as version 4.0.0-rc.2 under npm, so it can be installed before the production release for testing.
Same problem is occurring both with and without the adjusted timestamp format. I manually added the new timestamp format (with 3 random trailing digits) in SDK version 3.2.0 and ran the same method that was causing the problem.
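The adjusted format is presumably along these lines (a sketch: random padding rather than real extra precision; `paddedTimestamp` is just an illustrative name):

```typescript
// Sketch of the "3 random trailing digits" format: pad the millisecond ISO
// string out to 6 fractional digits with random noise, so two requests in the
// same millisecond still produce different timestamps (and thus signatures).
function paddedTimestamp(): string {
  const random3 = String(Math.floor(Math.random() * 1000)).padStart(3, '0');
  return new Date().toISOString().replace('Z', `${random3}Z`);
}
// e.g. '2019-09-17T21:12:09.404Z' -> '2019-09-17T21:12:09.404669Z'
```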
In my tests, I've added code that should log ANY http/https request that's made.
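One way to capture those (a sketch, not necessarily the exact logging hook used here) is to monkey-patch `http`/`https.request`:

```typescript
// Debugging sketch: monkey-patch http/https.request to log the options object
// for every outgoing request (node-fetch passes a parsed options object here;
// other callers may pass a URL string instead).
import * as http from 'http';
import * as https from 'https';

function logAllRequests(mod: typeof http | typeof https) {
  const original = mod.request.bind(mod);
  (mod as any).request = (...args: any[]) => {
    console.log(args[0]);
    return (original as any)(...args);
  };
}

logAllRequests(http);
logAllRequests(https);
```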
Sample requests with old timestamp format:
```
{ protocol: 'https:',
slashes: true,
auth: null,
host: 'redacted',
port: null,
hostname: 'redacted',
hash: null,
search:
'?q=block_id%3A%3E7447+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
query:
'q=block_id%3A%3E7447+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
pathname: '/v1/block',
path:
'/v1/block?q=block_id%3A%3E7447+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
href:
'https://redacted.com/v1/block?q=block_id%3A%3E7447+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
method: 'GET',
headers:
[Object: null prototype] {
dragonchain: [ 'redacted' ],
Authorization:
[ 'redacted' ],
timestamp: [ '2019-09-17T21:10:56.318Z' ],
Accept: [ '*/*' ],
'User-Agent': [ 'node-fetch/1.0 (+https://github.com/bitinn/node-fetch)' ],
'Accept-Encoding': [ 'gzip,deflate' ],
Connection: [ 'close' ] },
agent: undefined }
{ protocol: 'https:',
slashes: true,
auth: null,
host: 'redacted',
port: null,
hostname: 'redacted',
hash: null,
search:
'?q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
query:
'q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
pathname: '/v1/block',
path:
'/v1/block?q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
href:
'https://redacted.com/v1/block?q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
method: 'GET',
headers:
[Object: null prototype] {
dragonchain: [ 'redacted' ],
Authorization:
[ 'redacted' ],
timestamp: [ '2019-09-17T21:10:58.464Z' ],
Accept: [ '*/*' ],
'User-Agent': [ 'node-fetch/1.0 (+https://github.com/bitinn/node-fetch)' ],
'Accept-Encoding': [ 'gzip,deflate' ],
Connection: [ 'close' ] },
agent: undefined }
{ status: 401,
response:
{ error:
{ type: 'AUTHENTICATION_ERROR',
details: 'Previous matching request found (no replays allowed)' } },
ok: false }
```
Sample requests with new timestamp format:
```
{ protocol: 'https:',
slashes: true,
auth: null,
host: 'redacted',
port: null,
hostname: 'redacted',
hash: null,
search:
'?q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
query:
'q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
pathname: '/v1/block',
path:
'/v1/block?q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
href:
'https://redacted.com/v1/block?q=block_id%3A%3E7497+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
method: 'GET',
headers:
[Object: null prototype] {
dragonchain: [ 'redacted' ],
Authorization:
[ 'redacted' ],
timestamp: [ '2019-09-17T21:12:09.404669Z' ],
Accept: [ '*/*' ],
'User-Agent': [ 'node-fetch/1.0 (+https://github.com/bitinn/node-fetch)' ],
'Accept-Encoding': [ 'gzip,deflate' ],
Connection: [ 'close' ] },
agent: undefined }
{ protocol: 'https:',
slashes: true,
auth: null,
host: 'redacted',
port: null,
hostname: 'redacted',
hash: null,
search:
'?q=block_id%3A%3E7547+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
query:
'q=block_id%3A%3E7547+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
pathname: '/v1/block',
path:
'/v1/block?q=block_id%3A%3E7547+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
href:
'https://redacted.com/v1/block?q=block_id%3A%3E7547+AND+timestamp%3A%3E%3D1554076800&sort=block_id%3Aasc&offset=0&limit=50',
method: 'GET',
headers:
[Object: null prototype] {
dragonchain: [ 'redacted' ],
Authorization:
[ 'redacted' ],
timestamp: [ '2019-09-17T21:12:11.053559Z' ],
Accept: [ '*/*' ],
'User-Agent': [ 'node-fetch/1.0 (+https://github.com/bitinn/node-fetch)' ],
'Accept-Encoding': [ 'gzip,deflate' ],
Connection: [ 'close' ] },
agent: undefined }
{ status: 401,
response:
{ error:
{ type: 'AUTHENTICATION_ERROR',
details: 'Previous matching request found (no replays allowed)' } },
ok: false }
```
Yeah, this strengthens my suspicion that there must be some silent retry logic going on that we don't see. Warrants further investigation.
Could even be an http protocol-level matter (https://blogs.oracle.com/ravello/beware-http-requests-automatic-retries), but that seems to normally affect long-running requests, and I have no idea if node would see that as a separate request (and thus log it in my code) or not.
In fact, I'm inclined to think I WOULDN'T see the retry. From the article:
> By capturing the requests on the server side, we saw that the repeating requests were identical and that there is no way to actually know (from server perspective) that they are part of a retry. Also, from the browser side, there’s no trace for this (e.g., opening the network view doesn’t show the retry).
Edit: though in my case the actual requests are being made within my nodejs app (inside an expressjs request handler), not directly through the client SDK in the browser.
The weirdest part is that this is only happening against the one managed L2 for me in my testing, and it happens extremely consistently (every attempt to pull a large number of blocks, and within the first 5-10 requests).
Yeah, the even weirder part is I haven't ever seen this issue in the python SDK. At some point when I get some time, I'll proxy my requests through something that lets me watch my http traffic, so I can see the actual packets going in/out and confirm whether this is happening or not.
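Something along these lines, for example (a sketch; assumes the https-proxy-agent package and an intercepting proxy such as mitmproxy on 127.0.0.1:8080; hooking it into the SDK's internal fetch calls isn't shown):

```typescript
// Sketch: route node-fetch traffic through a local intercepting proxy so every
// request that actually hits the wire is visible. Assumes https-proxy-agent and
// a proxy (e.g. mitmproxy) listening on 127.0.0.1:8080.
import fetch from 'node-fetch';
const HttpsProxyAgent = require('https-proxy-agent');

const agent = new HttpsProxyAgent('http://127.0.0.1:8080');

async function probe(url: string): Promise<void> {
  const res = await fetch(url, { agent });
  console.log(res.status, await res.text());
}
```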
Quick update on this issue: tried loading from the problematic managed L2 I originally noticed the problem on, and since the 4.0 update, it looks like it's working fine now. Pulled 38k+ blocks with no trouble at all.
I'm still blaming elasticsearch. Will update here if I run into it again.
For some reason, the javascript SDK occasionally sends repeat requests which trigger the replay protection on Dragonchain (`Previous matching request found (no replays allowed)`). This is a bug that only occurs with this SDK. The cause at this time is still unknown. If any details are discovered, this ticket should be updated.