Consensys / quorum

A permissioned implementation of Ethereum supporting data privacy
https://www.goquorum.com/
GNU Lesser General Public License v3.0

Empty responses when calling view method #1039

Closed agomezlyte closed 2 years ago

agomezlyte commented 4 years ago

System information

Geth version: 1.8.12-stable

OS & Version: 16.04.3 LTS (GNU/Linux 4.4.0-112-generic x86_64)

Expected behaviour

We want to call the method getUserSeasonReputation from this smart contract (simplified version):

pragma solidity 0.4.22;

contract Test {

    HashUserMap private hashUserMap;
    address private allowedAddress;

    event UserProfileSetEvent (string name, address hash);

    struct EmailUserMap {
        mapping (bytes32 => address) map;
    }

    struct HashUserMap {
        mapping (address => UserProfile) map;
    }

    struct UserProfile {
        string name;
        string email;
        address hash;
        UserStats globalStats;
        mapping (uint256 => UserSeason) seasonData;
    }

    struct UserStats {
        uint256 reputation;
        uint256 cumulativeComplexity;
        uint256 numberOfTimesReview;
        uint256 agreedPercentage;
        uint256 positeVotes;
        uint256 negativeVotes;
        uint256 reviewsMade;
        uint256 commitsMade;
    }

    struct UserSeason {
        UserStats seasonStats;
        mapping (bytes32 => bool) seasonCommits;
        bytes32[] urlSeasonCommits;
        bytes32[] allReviews;
        bytes32[] finishedReviews;
        bytes32[] pendingReviews;
        bytes32[] toRead;
    }

    modifier onlyDapp() {
        require (msg.sender == allowedAddress || msg.sender == tx.origin);
        _;
    }

    constructor(address allowed) public {
        allowedAddress = allowed;
    }

    function getUserSeasonReputation(address userHash, uint256 seasonIndex) public onlyDapp view returns(string, string, uint256, uint256, uint256, uint256, address, uint256) {
        UserProfile memory user = hashUserMap.map[userHash];
        UserSeason memory season = hashUserMap.map[userHash].seasonData[seasonIndex];
        return (user.name,
            user.email,
            season.seasonStats.reputation,
            season.seasonStats.reviewsMade,
            season.seasonStats.commitsMade,
            season.seasonStats.agreedPercentage,
            user.hash,
            season.seasonStats.cumulativeComplexity
        );
    }
}

The contract's structs are filled with data. The request should always succeed, since we are calling the method with the correct permissions.

Actual behaviour

Sometimes the response to the request is empty. Example: when making the request (which calls getUserSeasonReputation with some parameters):

curl -k 'https://quorum.node/rpc' \
  -H 'User-Agent: Mozilla/5.0 (Browser javascript) node.js/undefined v8/undefined' \
  -H 'Accept: */*' \
  -H 'Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3' \
  --compressed \
  -H 'Referer: http://some.domain.com/' \
  -H 'Content-Type: application/json' \
  -H 'Origin: http://some.domain.com/' \
  -H 'Connection: keep-alive' \
  --data-raw '{"jsonrpc":"2.0","id":64,"method":"eth_call","params":[{"data":"0x805ac6f2000000000000000000000000cd7b2b6157a458a5a8ec58604f0636d9173e5b220000000000000000000000000000000000000000000000000000000000000006","to":"0xb69f8ec5aa25ce07a66e9eb1cd7d5bf4ad545ead"},"latest"]}'

the response should be:

{"jsonrpc":"2.0","id":64,"result":"0x000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000001400000000000000000000000000000000000000000000000000000048c2739500000000000000000000000000000000000000000000000000000000000000000b900000000000000000000000000000000000000000000000000000000000000780000000000000000000000000000000000000000000000000000000000000000000000000000000000000000cd7b2b6157a458a5a8ec58604f0636d9173e5b2200000000000000000000000000000000000000000000000000002755ece2f400000000000000000000000000000000000000000000000000000000000000000d5261666120426172726975736f00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001772626172726975736f4074726962616c7974652e636f6d000000000000000000"}

but sometimes the response is: {"jsonrpc":"2.0","id":64,"result":"0x"}
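For readers who want to inspect such a result by hand, here is a minimal Python sketch of how the return tuple of getUserSeasonReputation (string, string, uint256, uint256, uint256, uint256, address, uint256) is laid out in the hex payload. The function name decode_profile and the dictionary keys are hypothetical, chosen to mirror the Solidity return statement; it assumes standard ABI encoding, where the first two head words are byte offsets to the dynamic strings. An empty "0x" result is rejected explicitly instead of being decoded.

```python
def decode_profile(result_hex):
    """Decode the ABI-encoded tuple returned by getUserSeasonReputation."""
    body = result_hex[2:] if result_hex.startswith("0x") else result_hex
    raw = bytes.fromhex(body)
    if not raw:
        # This is the failure mode described in this issue.
        raise ValueError("empty eth_call result (0x)")

    def word(i):
        # Read the i-th 32-byte head word as an unsigned integer.
        return int.from_bytes(raw[32 * i:32 * i + 32], "big")

    def string_at(offset):
        # A dynamic string is stored as a 32-byte length followed by its bytes.
        length = int.from_bytes(raw[offset:offset + 32], "big")
        return raw[offset + 32:offset + 32 + length].decode("utf-8")

    return {
        "name": string_at(word(0)),
        "email": string_at(word(1)),
        "reputation": word(2),
        "reviewsMade": word(3),
        "commitsMade": word(4),
        "agreedPercentage": word(5),
        # An address occupies the low 20 bytes of its 32-byte word.
        "hash": "0x" + raw[32 * 6 + 12:32 * 7].hex(),
        "cumulativeComplexity": word(7),
    }
```

Calling decode_profile on the "result" field of a successful response returns a dictionary with the eight values; calling it on "0x" raises ValueError, which makes the empty case easy to spot in logs.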

We have also tested it with different versions of web3 (1.0.0-beta34 and 1.2.1), and the error given is either "The returned value is not a convertible string" or "Returned values aren't valid, did it run Out of Gas?". These errors are shown when the result in the response is "0x" or an empty string.
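Since the failure is intermittent, one possible client-side mitigation is to treat an empty result as retryable. The following is only a sketch under assumptions: call_with_retry and its parameters are hypothetical names, and send stands for whatever function posts the JSON-RPC payload and returns the parsed response as a dictionary. Note that "0x" can be a legitimate result for calls that return nothing (for example, a call to an address with no code), so this check only makes sense for methods like getUserSeasonReputation that always return data.

```python
import time


def call_with_retry(send, payload, attempts=3, delay_s=1.0, sleep=time.sleep):
    """Repeat an eth_call while the node answers with an empty result.

    send    -- callable taking the JSON-RPC payload, returning the parsed reply
    sleep   -- injectable so the function can be tested without waiting
    """
    last = None
    for attempt in range(attempts):
        last = send(payload)
        result = last.get("result")
        if "error" not in last and result not in (None, "", "0x"):
            return last
        if attempt < attempts - 1:
            sleep(delay_s)
    raise RuntimeError(
        f"eth_call still empty after {attempts} attempts: {last!r}"
    )
```

This does not address the cause of the empty responses; it only keeps the dapp from surfacing a decoding error to the user while the underlying problem is investigated.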

The error does not always happen, only sometimes. This same node is being used to call other smart contracts and has been working fine for almost 2 years. We started to get this error 1 month ago, with already deployed contracts which were working fine until then.

We are working on Alastria's T network, which is a gas-free permissioned network.

Backtrace

Quorum node is not giving any trace nor error when sending the empty response.

zzy96 commented 4 years ago

Hi @agomezlyte

From your description, I noticed you are using Quorum based on geth 1.8.12. Can you upgrade to Quorum 2.6.0+, which is based on geth 1.9, and test it out? As you have been running on a live network for roughly 2 years, it is possible that execution takes longer as the amount of data on chain grows. Before the geth 1.9 upgrade, eth_call had an automatic timeout that simply returned 0x when it fired. This was changed to return a proper timeout error as part of the 1.9 release upstream: https://github.com/ethereum/go-ethereum/pull/19737

agomezlyte commented 4 years ago

Hi @zzy96 , thanks for the fast response. I will ask if it's possible to upgrade, test again and let you know the results.

rbarriuso commented 4 years ago

Hi @zzy96, I'm experiencing this timeout issue with contracts which aren't that old. I'm trying to understand where this timeout comes from: Is the JSON-RPC API timing out because the corresponding process which reads the blocks takes too long? Is this timeout configurable? What can make geth "sometimes" take longer to read data from the blockchain?

I'm using geth v1.8.12 as well, and I'm afraid I can't upgrade it for compatibility reasons.

Thanks in advance.

zzy96 commented 4 years ago

Hi @rbarriuso, there can be multiple reasons causing the timeout (machine limit? too many requests?)... As for the timeout configuration, you can find there is a 5-second timeout currently hardcoded in PublicBlockChainAPI Call function in internal/ethapi/api.go. The only way to change it is to rebuild it with a different value.
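One way to check whether the empty responses really come from this 5-second limit is to measure how long each eth_call takes on the client. Below is a minimal sketch, assuming a hypothetical send function that posts the payload and returns the parsed JSON response; timed_call and the field names are made up for illustration. If the "0x" results consistently arrive after about 5 seconds, the hardcoded timeout is the likely cause; if they arrive quickly, something else is going on.

```python
import time

CALL_TIMEOUT_S = 5.0  # the hardcoded eth_call timeout mentioned above


def timed_call(send, payload, clock=time.monotonic):
    """Run one eth_call and report its latency and whether it looks timed out.

    clock is injectable so the function can be tested without real waiting.
    """
    start = clock()
    response = send(payload)
    elapsed = clock() - start
    empty = response.get("result") in (None, "", "0x")
    return {
        "elapsed_s": elapsed,
        "empty": empty,
        # Client-side latency includes network time, so it can only be
        # equal to or larger than the time spent inside the node.
        "likely_timeout": empty and elapsed >= CALL_TIMEOUT_S,
    }
```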

agomezlyte commented 4 years ago

@zzy96 Is there any solution for that? For example, would a more "powerful" machine run the queries faster to avoid reaching the timeout?

zzy96 commented 4 years ago

> @zzy96 Is there any solution for that? For example, would a more "powerful" machine run the queries faster to avoid reaching the timeout?

Definitely a more powerful machine will help. Also if some pieces of onchain data are frequently accessed, you may consider storing a copy in an offchain database for better query performance and constantly monitoring the onchain data change to update.
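As a rough illustration of the off-chain copy idea, here is a minimal Python sketch. Everything in it is an assumption: ProfileCache is a hypothetical class, an in-memory dictionary stands in for a real database, and fetch is whatever function performs the actual eth_call. The part that watches the chain (for example, a listener on UserProfileSetEvent) is not shown; it would call invalidate_user whenever a user's data changes.

```python
class ProfileCache:
    """Keep the last known answer per (user, season) outside the chain."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable(user, season) doing the real eth_call
        self._entries = {}       # stand-in for an off-chain database table

    def get(self, user, season):
        key = (user.lower(), season)
        if key not in self._entries:
            # Only hit the node when there is no stored copy.
            self._entries[key] = self._fetch(user, season)
        return self._entries[key]

    def invalidate_user(self, user):
        """Drop stored copies when an on-chain change for this user is seen."""
        user = user.lower()
        for key in [k for k in self._entries if k[0] == user]:
            del self._entries[key]
```

With this shape, repeated reads for the same user and season are served locally, and the node is only queried again after the monitoring side reports a change.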

agomezlyte commented 4 years ago

Hello, we have been researching a little bit, and we want to know if we could solve our timeout problem with this geth version, by increasing the different RPC timeouts.

https://github.com/ethereum/go-ethereum/pull/17240

rbarriuso commented 4 years ago

@zzy96 What do you think about what @agomezlyte proposed?

agomezlyte commented 4 years ago

@zzy96 We have already tried the RPC timeouts implemented in the PR above (https://github.com/ethereum/go-ethereum/pull/17240) and they don't solve our problem. We would like to know how to solve or work around this timeout problem. Our geth version is now v1.8.18, but we cannot upgrade any further because the network is not compatible with newer geth versions.

fixanoid commented 4 years ago

@agomezlyte Just a thought that occurred to me. If I remember correctly, Alastria runs IBFT. Is it possible that the answer you are waiting for hasn't been committed yet? I'm checking because the response for the 'latest' block on IBFT is the same as for 'pending', and this may be the source of the issue.

ghost commented 4 years ago

@fixanoid You are right, Alastria runs IBFT. We are calling a view method, and the state this method returns had not changed beforehand. The error happens when we call the view method after long periods (6-8 hours) of inactivity on the node.

I think we can rule out the "'latest' block on IBFT is the same as 'pending'" explanation, because we don't get any error if we change the state that the view method returns and then call the view method.

rbarriuso commented 4 years ago

@fixanoid , @zzy96 Do you have any more information about this? It would really help us. Thanks!

ghost commented 4 years ago

@fixanoid , @zzy96 Do you think it would be a good idea to test with different versions of Quorum? (We cannot reproduce this issue with a local blockchain network.) Do we have to rule out a possible solution because we are using such an old version?

fixanoid commented 4 years ago

@rbarriuso @rdemera I don't just think it's a good idea, I think that's the only way for us to be able to determine where the issue is. Unfortunately, Alastria's fork is both massively out of date and has diverged significantly enough that we'd prefer a somewhat working sample of the issue on our own codebase -- would you be able to provide that?

ghost commented 4 years ago

@fixanoid We deployed a minimal version of the smart contract on the Alastria network, and the error still occurs sometimes when we call the view method: https://gist.github.com/rdemera/d9a256553f36f6860ed8920d95b81639

ghost commented 4 years ago

How we got the error:

  1. Deploy the smart contract on the Alastria network.
  2. Call the function setProfile(name, email).
  3. Let a few days (2) pass.
  4. Call the view method getUserSeasonData(address, number), passing the address you used in step 2.

After step 4 you should receive a valid answer, but sometimes you get the empty response instead.

SatpalSandhu61 commented 4 years ago

As per the response from @fixanoid earlier - are you able to recreate the issue using vanilla quorum, rather than on the Alastria network?

ghost commented 4 years ago

@fixanoid @SatpalSandhu61 We have tested with local networks (ganache) and the error does not reproduce. We have not tested with vanilla Quorum; I do not know if it makes sense to test on a local network, given that the issue does not reproduce with ganache. Is there a way to deploy the contract on a vanilla Quorum test network? We also tested these contracts on a Besu network and we didn't get the error there either.

I don't know what the next steps have to be and what we can do. Thanks a lot for the help!

fixanoid commented 4 years ago

@rdemera ganache, Besu, and Alastria's geth are not the systems we need to be able to replicate this issue on. Please retest with unmodified Quorum geth and let us know the results -- you can use quorum-wizard to set up a network for yourself quickly: https://github.com/ConsenSys/quorum-wizard.

If you think it's better and faster to do this over real-time chat, please join us on our Slack and we can have a more lively debugging session. Thanks.

ghost commented 4 years ago

Thanks a lot @fixanoid !!! We will try with quorum-wizard and let you know the results, and if necessary we will be happy to do the real-time chat.

We really want to find a solution. We are testing our product in prod and this error is a pain, so thank you very much!!!

antonydenyer commented 2 years ago

Assuming this has been fixed, feel free to reopen if that's not the case.