antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License
10.22k stars 3.71k forks source link

Split and update Solidity grammar #4257

Closed codeFather2 closed 1 month ago

codeFather2 commented 1 month ago

This PR updates the Solidity grammar to match the latest version of the official documentation. Link to official grammar: https://github.com/ethereum/solidity/tree/develop/docs/grammar

kaby76 commented 1 month ago

This PR updates the Solidity grammar to match the latest version of the official documentation. Link to official grammar: https://github.com/ethereum/solidity/tree/develop/docs/grammar

OK, it makes sense to match the grammar here with that at https://github.com/ethereum/solidity/tree/develop/docs/grammar. That is why variableDeclarationList still exists. Perhaps just make a note in the readme.md that this is a copy of the official version?

codeFather2 commented 1 month ago

This PR updates the Solidity grammar to match the latest version of the official documentation. Link to official grammar: https://github.com/ethereum/solidity/tree/develop/docs/grammar

OK, it makes sense to match the grammar here with that at https://github.com/ethereum/solidity/tree/develop/docs/grammar. That is why variableDeclarationList still exists. Perhaps just make a note in the readme.md that this is a copy of the official version?

Yeah, i added a comment that grammar is copied from docs repo

kaby76 commented 1 month ago

As mentioned above, there seems to be ambiguity in expression with this input, which is the first example from https://docs.soliditylang.org/en/v0.8.27/solidity-by-example.html.

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.7.0 <0.9.0;
/// @title Voting with delegation.
contract Ballot {
    // This declares a new complex type which will
    // be used for variables later.
    // It will represent a single voter.
    struct Voter {
        uint weight; // weight is accumulated by delegation
        bool voted;  // if true, that person already voted
        address delegate; // person delegated to
        uint vote;   // index of the voted proposal
    }

    // This is a type for a single proposal.
    struct Proposal {
        bytes32 name;   // short name (up to 32 bytes)
        uint voteCount; // number of accumulated votes
    }

    address public chairperson;

    // This declares a state variable that
    // stores a `Voter` struct for each possible address.
    mapping(address => Voter) public voters;

    // A dynamically-sized array of `Proposal` structs.
    Proposal[] public proposals;

    /// Create a new ballot to choose one of `proposalNames`.
    constructor(bytes32[] memory proposalNames) {
        chairperson = msg.sender;
        voters[chairperson].weight = 1;

        // For each of the provided proposal names,
        // create a new proposal object and add it
        // to the end of the array.
        for (uint i = 0; i < proposalNames.length; i++) {
            // `Proposal({...})` creates a temporary
            // Proposal object and `proposals.push(...)`
            // appends it to the end of `proposals`.
            proposals.push(Proposal({
                name: proposalNames[i],
                voteCount: 0
            }));
        }
    }

    // Give `voter` the right to vote on this ballot.
    // May only be called by `chairperson`.
    function giveRightToVote(address voter) external {
        // If the first argument of `require` evaluates
        // to `false`, execution terminates and all
        // changes to the state and to Ether balances
        // are reverted.
        // This used to consume all gas in old EVM versions, but
        // not anymore.
        // It is often a good idea to use `require` to check if
        // functions are called correctly.
        // As a second argument, you can also provide an
        // explanation about what went wrong.
        require(
            msg.sender == chairperson,
            "Only chairperson can give right to vote."
        );
        require(
            !voters[voter].voted,
            "The voter already voted."
        );
        require(voters[voter].weight == 0);
        voters[voter].weight = 1;
    }

    /// Delegate your vote to the voter `to`.
    function delegate(address to) external {
        // assigns reference
        Voter storage sender = voters[msg.sender];
        require(sender.weight != 0, "You have no right to vote");
        require(!sender.voted, "You already voted.");

        require(to != msg.sender, "Self-delegation is disallowed.");

        // Forward the delegation as long as
        // `to` also delegated.
        // In general, such loops are very dangerous,
        // because if they run too long, they might
        // need more gas than is available in a block.
        // In this case, the delegation will not be executed,
        // but in other situations, such loops might
        // cause a contract to get "stuck" completely.
        while (voters[to].delegate != address(0)) {
            to = voters[to].delegate;

            // We found a loop in the delegation, not allowed.
            require(to != msg.sender, "Found loop in delegation.");
        }

        Voter storage delegate_ = voters[to];

        // Voters cannot delegate to accounts that cannot vote.
        require(delegate_.weight >= 1);

        // Since `sender` is a reference, this
        // modifies `voters[msg.sender]`.
        sender.voted = true;
        sender.delegate = to;

        if (delegate_.voted) {
            // If the delegate already voted,
            // directly add to the number of votes
            proposals[delegate_.vote].voteCount += sender.weight;
        } else {
            // If the delegate did not vote yet,
            // add to her weight.
            delegate_.weight += sender.weight;
        }
    }

    /// Give your vote (including votes delegated to you)
    /// to proposal `proposals[proposal].name`.
    function vote(uint proposal) external {
        Voter storage sender = voters[msg.sender];
        require(sender.weight != 0, "Has no right to vote");
        require(!sender.voted, "Already voted.");
        sender.voted = true;
        sender.vote = proposal;

        // If `proposal` is out of the range of the array,
        // this will throw automatically and revert all
        // changes.
        proposals[proposal].voteCount += sender.weight;
    }

    /// @dev Computes the winning proposal taking all
    /// previous votes into account.
    function winningProposal() public view
            returns (uint winningProposal_)
    {
        uint winningVoteCount = 0;
        for (uint p = 0; p < proposals.length; p++) {
            if (proposals[p].voteCount > winningVoteCount) {
                winningVoteCount = proposals[p].voteCount;
                winningProposal_ = p;
            }
        }
    }

    // Calls winningProposal() function to get the index
    // of the winner contained in the proposals array and then
    // returns the name of the winner
    function winnerName() external view
            returns (bytes32 winnerName_)
    {
        winnerName_ = proposals[winningProposal()].name;
    }
}

On line 91, while (voters[to].delegate != address(0)) {, this is ambiguous because there are two parses, one with the (0) associated with address, the other associated with voters[to].delegate != address. One thing I do notice are the alts | <assoc = right> expression assignOp expression, | expression callArgumentList, | Payable callArgumentList. These are classic mistakes--you cannot "hide" the operators behind a non-terminal! There is some discussion with a PR that fixes this problem here: https://github.com/antlr/antlr4/pull/4480

codeFather2 commented 1 month ago

These are classic mistakes--you cannot "hide" the operators behind a non-terminal!

Let's leave it like this until https://github.com/antlr/antlr4/pull/4480 will not be merged. Also i want to improve the grammar in next iterations, probably we can avoid non-left recursion in expression rule just by splitting the rule. Added this to known issues in readme

kaby76 commented 1 month ago

I'll test the grammar as is with https://github.com/antlr/grammars-v4/pull/4257 to see if, indeed, it fixes the ambiguity.

kaby76 commented 1 month ago

The Antlr PR didn't fix the ambiguity. Will try restructuring the grammar slightly.

kaby76 commented 1 month ago

The Antlr PR didn't fix the ambiguity. Will try restructuring the grammar slightly.

The fix is to simply unfold the symbol callArgumentList in the two alts it is used in expression. (I.e., paste the RHS of the callArgumentList where the symbol occurs in `expression.)

https://github.com/codeFather2/grammars-v4/blob/45afcb5b79380bb5aec4add590328a777b730685/solidity/SolidityParser.g4#L460-L461

With this change, the grammar has no ambiguity with regards to this particular issue.

$ dotnet trperf -- -c a ../examples/ex2.sol | grep -v '^0'
Time to parse: 00:00:00.0795611
1
09/24-19:26:01 ~/issues/g4-4257/solidity/Generated-before
$ cd ../Generated-after/
09/24-19:26:13 ~/issues/g4-4257/solidity/Generated-after
$ dotnet trperf -- -c a ../examples/ex2.sol | grep -v '^0'
Time to parse: 00:00:00.0830984
09/24-19:26:17 ~/issues/g4-4257/solidity/Generated-after

That said, it does not take care of all the ambiguity, which was discovered in examples/test.sol.

$ dotnet trperf -- -c ard ../examples/test.sol | grep -v '^0'
Time to parse: 00:00:00.1058447
1       typeName        71
6       expression      96
1       yulVariableDeclaration  136
09/24-19:39:29 ~/issues/g4-4257/solidity/Generated-after
kaby76 commented 1 month ago

I added an Issue in this repo to note the ambiguities in the grammar. https://github.com/antlr/grammars-v4/issues/4261

This PR looks fine for merging.

teverett commented 1 month ago

@codeFather2 thanks