Closed agomezlyte closed 2 years ago
Hi @agomezlyte
From your description, I noticed you are using Quorum based on geth 1.8.12. Can you upgrade to Quorum 2.6.0+, which is based on geth 1.9, and test it out? As you have been running on a live network for approximately 2 years, it is possible that execution takes longer as the amount of on-chain data grows. Before the geth 1.9 upgrade, there is an automatic timeout for `eth_call`, which only returns `0x` on timeout. This was changed to return a proper timeout error as part of the 1.9 release upstream: https://github.com/ethereum/go-ethereum/pull/19737
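To make the difference between the two behaviours concrete, here is a minimal client-side sketch in Python that separates an empty `0x` result from a proper JSON-RPC error. The function name `classify_eth_call` is hypothetical and not part of geth, Quorum, or web3. Note that `0x` can also be a legitimate result (for example, a call to an address with no code), so an "empty" classification is only a hint that a timeout may have occurred.

```python
from typing import Any, Dict


def classify_eth_call(response: Dict[str, Any]) -> str:
    """Classify a decoded JSON-RPC eth_call response.

    Returns "error" when the node sent a JSON-RPC error object,
    "empty" when the result is missing or zero-length (such as "0x"),
    and "ok" otherwise.
    """
    if "error" in response:
        return "error"
    result = response.get("result")
    if result in (None, "", "0x"):
        return "empty"
    return "ok"
```

A caller could log or retry on "empty" instead of passing the value to an ABI decoder, which is where the web3 errors about unconvertible values come from.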
Hi @zzy96 , thanks for the fast response. I will ask if it's possible to upgrade, test again and let you know the results.
Hi @zzy96, I'm experiencing this timeout issue with contracts which aren't that old. I'm trying to understand where this timeout comes from: Is the JSON-RPC API timing out because the corresponding process which reads the blocks takes too long? Is this timeout configurable? What can make geth "sometimes" take longer to read data from the blockchain?
I'm using geth v1.8.12 as well, and I'm afraid I can't upgrade it for compatibility reasons.
Thanks in advance.
Hi @rbarriuso, there can be multiple reasons causing the timeout (machine limits? too many requests?)... As for the timeout configuration, there is a 5-second timeout currently hardcoded in the `PublicBlockChainAPI` `Call` function in `internal/ethapi/api.go`. The only way to change it is to rebuild with a different value.
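One way to check whether the empty responses line up with this 5-second limit is to time each call on the client. The sketch below assumes a caller-supplied `send` function that posts the JSON-RPC payload and returns the decoded response. `timed_call` and `CALL_TIMEOUT_SECONDS` are hypothetical names, and the 5.0 value is taken from the comment above rather than read from the node.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Assumption: mirrors the 5-second value mentioned above; change it if the
# node was rebuilt with a different timeout.
CALL_TIMEOUT_SECONDS = 5.0


def timed_call(
    send: Callable[[Dict[str, Any]], Dict[str, Any]],
    payload: Dict[str, Any],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Dict[str, Any], float, bool]:
    """Send one request and report how long it took.

    The third element is True when the result is "0x" AND the round trip
    took at least CALL_TIMEOUT_SECONDS, which hints at a server-side timeout.
    """
    start = clock()
    response = send(payload)
    elapsed = clock() - start
    suspected_timeout = (
        response.get("result") == "0x" and elapsed >= CALL_TIMEOUT_SECONDS
    )
    return response, elapsed, suspected_timeout
```

If empty results arrive well under 5 seconds, the hardcoded timeout is probably not the cause and the investigation should look elsewhere.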
@zzy96 Is there any solution for that? For example, would a more "powerful" machine run the queries faster to avoid reaching the timeout?
Definitely a more powerful machine will help. Also if some pieces of onchain data are frequently accessed, you may consider storing a copy in an offchain database for better query performance and constantly monitoring the onchain data change to update.
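As a rough illustration of the off-chain copy idea, the sketch below keeps the last non-empty result per contract and calldata in memory and falls back to it when the node answers with `0x`. `LastGoodCache` and the `send` callable are hypothetical, an in-memory dict stands in for a real database, and the cache can serve stale values unless something also watches on-chain changes and refreshes it, as suggested above.

```python
from typing import Any, Callable, Dict, Optional


class LastGoodCache:
    """Remember the last non-empty eth_call result for each (to, data) pair."""

    def __init__(self, send: Callable[[Dict[str, Any]], Dict[str, Any]]) -> None:
        self._send = send                  # caller-supplied JSON-RPC transport
        self._store: Dict[str, str] = {}   # stand-in for an off-chain database

    def call(self, to: str, data: str) -> Optional[str]:
        payload = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "eth_call",
            "params": [{"to": to, "data": data}, "latest"],
        }
        result = self._send(payload).get("result")
        key = f"{to.lower()}:{data.lower()}"
        if result and result != "0x":
            self._store[key] = result      # refresh the cached copy
            return result
        # Empty answer: fall back to the last good value, or None if unseen.
        return self._store.get(key)
```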
Hello, we have been researching a little bit, and we want to know if we could solve our timeout problem with this geth version, by increasing the different RPC timeouts.
@zzy96 What do you think about what @agomezlyte proposed?
@zzy96 We have already tried the RPC timeouts implemented in the issue above (https://github.com/ethereum/go-ethereum/pull/17240) and they don't solve our problem. We would like to know how to solve or work around this timeout problem. Our geth version is now v1.8.18, but we cannot upgrade any further because the network is not compatible with newer geth versions.
@agomezlyte Just a thought that occurred to me. If I remember correctly, Alastria runs IBFT. Is it possible that the answer you are waiting for hasn't been committed yet? I'm checking because the response for the 'latest' block on IBFT is the same as 'pending', and this may be the source of the issue.
@fixanoid You are right, Alastria runs IBFT. We are calling a view method, and the state that this method returns had not changed beforehand. The error happens when we call the view method after long periods (6-8 hours) of inactivity on the node.
I think we can dismiss the "'latest' block on IBFT is the same as 'pending'" issue, because we don't receive any error if we change the state of the values that the view method returns and then call the view method.
@fixanoid , @zzy96 Do you have any more information about this? It would really help us. Thanks!
@fixanoid , @zzy96 Do you think it would be a good idea to test with different versions of Quorum? (We cannot reproduce this issue with a local blockchain network.) Do we have to rule out a possible solution because we are using such an old version?
@rbarriuso @rdemera I don't just think it's a good idea, I think that's the only way for us to be able to determine where the issue is. Unfortunately, Alastria's fork is both massively out of date and has diverged significantly enough that we'd prefer a somewhat working sample of the issue on our own codebase -- would you be able to provide that?
@fixanoid We deployed a minimal version of the smart contract on the Alastria network, and the error still occurs sometimes when we call the view method: https://gist.github.com/rdemera/d9a256553f36f6860ed8920d95b81639
The way we got into the error:
After step 4 you should receive a valid answer, but sometimes you get the issue.
As per the response from @fixanoid earlier - are you able to recreate the issue using vanilla quorum, rather than on the Alastria network?
@fixanoid @SatpalSandhu61 We have tested with local networks (ganache) and the error does not reproduce. We have not tested with vanilla Quorum; I do not know whether it makes sense to test on a local network, given that the issue does not reproduce with ganache. Is there a way to deploy the contract on a vanilla Quorum test network? We also tested these contracts on a Besu network and we didn't get the error there either.
I don't know what the next steps have to be and what we can do. Thanks a lot for the help!
@rdemera ganache, Besu, and Alastria's geth are not the systems we need to be able to replicate this issue on. Please retest with unmodified Quorum geth and let us know the results -- you can use quorum-wizard to set up a network for yourself quickly: https://github.com/ConsenSys/quorum-wizard.
If you think it's better and faster to do this over real-time chat, please join us on our Slack and we can have a more lively debugging session. Thanks.
Thanks a lot @fixanoid !!! We will try with quorum-wizard and let you know the results, and if necessary we will be happy to do the real-time chat.
We really want to find a solution. We are testing our product in prod and this error is a pain, so thank you very much!!!
Assuming this has been fixed, feel free to reopen if that's not the case.
System information
Geth version: 1.8.12-stable
OS & Version: 16.04.3 LTS (GNU/Linux 4.4.0-112-generic x86_64)
Expected behaviour
We want to call the method `getUserSeasonReputation` from this smart contract (simplified version).
The contract's structs are filled with data. The request should always succeed, since we are calling the method with the correct permissions.
Actual behaviour
Sometimes the request response is empty. Example: when making the request (which calls `getUserSeasonReputation` with some parameters):
curl -k 'https://quorum.node/rpc' -H 'User-Agent: Mozilla/5.0 (Browser javascript) node.js/undefined v8/undefined' -H 'Accept: */*' -H 'Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3' --compressed -H 'Referer: http://some.domain.com/' -H 'Content-Type: application/json' -H 'Origin: http://some.domain.com/' -H 'Connection: keep-alive' --data-raw '{"jsonrpc":"2.0","id":64,"method":"eth_call","params":[{"data":"0x805ac6f2000000000000000000000000cd7b2b6157a458a5a8ec58604f0636d9173e5b220000000000000000000000000000000000000000000000000000000000000006","to":"0xb69f8ec5aa25ce07a66e9eb1cd7d5bf4ad545ead"},"latest"]}'
the response should be:
{"jsonrpc":"2.0","id":64,"result":"0x000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000001400000000000000000000000000000000000000000000000000000048c2739500000000000000000000000000000000000000000000000000000000000000000b900000000000000000000000000000000000000000000000000000000000000780000000000000000000000000000000000000000000000000000000000000000000000000000000000000000cd7b2b6157a458a5a8ec58604f0636d9173e5b2200000000000000000000000000000000000000000000000000002755ece2f400000000000000000000000000000000000000000000000000000000000000000d5261666120426172726975736f00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001772626172726975736f4074726962616c7974652e636f6d000000000000000000"}
but sometimes the response is:
{"jsonrpc":"2.0","id":64,"result":"0x"}
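For readers unfamiliar with the `data` field in the request above, the sketch below splits it into the 4-byte function selector and its 32-byte argument words. `split_calldata` is a hypothetical helper. Reading the first word as an address and the second as the integer 6 is an assumption based on the shape of the payload, since the contract's ABI is not shown here.

```python
from typing import List, Tuple


def split_calldata(data: str) -> Tuple[str, List[str]]:
    """Split hex calldata into its selector and 32-byte argument words."""
    body = data[2:] if data.startswith("0x") else data
    selector, args = body[:8], body[8:]
    if len(args) % 64 != 0:
        raise ValueError("argument section is not a multiple of 32 bytes")
    words = [args[i:i + 64] for i in range(0, len(args), 64)]
    return "0x" + selector, words


# Rebuild the payload from the request above: selector, then two padded words.
data = (
    "0x805ac6f2"
    + "cd7b2b6157a458a5a8ec58604f0636d9173e5b22".rjust(64, "0")
    + format(6, "064x")
)
selector, words = split_calldata(data)
print(selector)              # function selector
print("0x" + words[0][-40:]) # last 20 bytes of word 0 (assumed to be an address)
print(int(words[1], 16))     # word 1 as an unsigned integer
```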
We have also tested it with different versions of web3 (1.0.0-beta34 and 1.2.1), and the error given is either "The returned value is not a convertible string" or "Returned values aren't valid, did it run Out of Gas?" These errors are shown when the request response result is "0x" or an empty string.
The error does not always happen, only sometimes. This same node is being used to call other smart contracts and has been working fine for almost 2 years. We started to get this error 1 month ago, with already deployed contracts which were working fine until then.
We are working on Alastria net T which is a free gas permissioned network.
Backtrace
Quorum node is not giving any trace nor error when sending the empty response.