ethereum / solidity

Solidity, the Smart Contract Programming Language
https://soliditylang.org
GNU General Public License v3.0
23.32k stars 5.77k forks source link

Sort order of methods in generated JSON ABI #2731

Closed MidnightLightning closed 5 years ago

MidnightLightning commented 7 years ago

I first noticed this issue on various sites like Etherscan and Remix, which use the Solidity compiler, and have been pointed here as the source of the issue.

When the JSON ABI is generated, the order of the methods/events in the JSON object seems to be arbitrary. It's not alphabetical, and it's not in the same order as the methods are presented in the contract. For a contract with many functions, that makes it a royal pain to try and find the one you care about. And for sites that just take the ABI and show the methods in order (like the Etherscan website takes the ABI and any of the constant ones it shows in ABI-order on the "Read Contract" tab (example). Scanning through that list for the one method you care about is rather tedious.

I'd suggest ordering the items in the generated JSON ABI blob in the same order they are in the contract's source code. The compiler has all the information it needs to figure out that order, and if a particular website/DApp wants to have it in alphabetical order, they can sort on their own. If it was the other way around, websites using the JSON ABI blob wouldn't be able to sort by the contract source code ordering without parsing the source code itself (which it may or may not have access to).

axic commented 7 years ago

Can you please explain what would be the benefit sorting it in a certain way? Even if that certain way does not suit everyone and thus they will sort it again?

Note, it is sorted by Remix in a certain way already.

MidnightLightning commented 7 years ago

The current ordering appears to be random and not based on the source code nor alphabetical. What is the "certain way" that Remix already uses?

Taking for example, my contract at 0x78FAEA8994EfE7FC448E743B7b342e1A96Ba3807 which I linked to originally: In the source of that contract, I laid out the methods in an order that makes sense for the contract itself:

...
getBookingCount(address _who) constant returns (uint count)
getBooking(address _who, uint _index) constant returns (uint _unicornCount, VisitType _type, uint _startBlock, uint _expiresBlock, VisitState _state, uint _completedBlock, uint _completedCount)
bookSpaVisit(uint _unicornCount) payable
bookAfternoonVisit(uint _unicornCount) payable
bookDayVisit(uint _unicornCount) payabl
bookOvernightVisit(uint _unicornCount) payable
bookWeekVisit(uint _unicornCount) payable
bookExtendedVisit(uint _unicornCount) payable
...

In this chunk of methods, those bookFooVisit functions are increasing length-of-stay order (an "Overnight" visit is longer than a "Day" visit, which is longer than an "Afternoon" visit), which makes sense for this particular contract's usage. The owner-only methods are way at the bottom to keep them out of the way, and the generic/constant methods are further up.

But currently, the generated ABI for this contract has the methods in a different order:

...
bookAfternoonVisit
addBooking
visitLength
birthBlockThreshold
owner
bookDayVisit
bookOvernightVisit
...

In this chunk of the ABI ordering, the "Afternoon", "Day", and "Overnight" methods do happen to be in the right order, but there's other methods between them. And the "Spa" method is way after this chunk, and the "Extended" one is way before it. This is the ordering in the JSON in the generated ABI in Remix, and in Etherscan. In Remix, when the compiled contract is deployed, the form for working with the contract is in a different order:

...
bookDayVisit
withdrawForeignTokens
bookAfternoonVisit
changeRepoSettings
completeBooking
changeGroveAddress
...

"Day" is now before "Afternoon", as well as other changes. Remix's interaction form puts all the constant functions first, but other than that I can't figure out the ordering logic of the rest of the form.

My point is, any contract-interacting client can alphabetically sort these methods given a list of them. However, only the contract parsing/compiling script has access to the author's original ordering, and I believe that should be maintained both for consistency across contract-interacting clients (if they do no re-sorting, they're at least the same as everyone else) and preserving the original author's intent. The current sort ordering out of the Solidity compiler appears to have no ordering and has no benefit at all in finding the method you're interested in. So, sorting it some way would at least help the people who want it that way without any other work, would provide a sensible default, and wouldn't hamper those who still want to re-sort it another way.

axic commented 7 years ago

As seen here (https://github.com/ethereum/browser-solidity/blob/master/src/app/execution/txHelper.js#L30), Remix sorts it by name and puts the constant functions first.

I think a better solutions is to add Natspec annotations to the source code and let those tools (e.g. etherscan) display that.

MidnightLightning commented 7 years ago

Aha, thanks for pointing that out! That code in Remix doesn't actually do what its intended to do. Having two separate sort calls means that the second sort is clobbering the first one. You end up with methods that definitely have constant functions first, but within the "constant" and "non-constant" groups, they're no longer in alphabetical order. For example, here's a screenshot of that contract I was using as an example in Remix:

Remix

But thanks for finding that code section; I submitted a patch to fix that to actually be what's intended there for the Remix app.

But for the actual solidity compiler, possibly adding Natspec (or other style) annotations could be a robust solution down the road, but for what's available right now, with what all authors have to develop (the order of the methods in the source code), preserving that in the compiler output I think would be a decent starting point.

casey commented 7 years ago

See https://reproducible-builds.org for why a fixed order should be implemented.

axic commented 7 years ago

@casey thanks, this is an important topic for Solidity! The order is deterministic in every version of Solidity. It may not be the same across Solidity versions, but a contract can only be verified by the same compiler anyway.

Please see the documentation about "reproducible builds: http://solidity.readthedocs.io/en/develop/miscellaneous.html#contract-metadata

axic commented 7 years ago

Also to clarify: the order in the resulting ABI JSON is determined by the order of elements in the AST.

casey commented 7 years ago

@axic Ah, sorry, I misunderstood the issue and thought that it might be different from run to run.

MidnightLightning commented 7 years ago

@axic

the order in the resulting ABI JSON is determined by the order of elements in the AST.

I don't think that's true; that's the whole issue I'm trying to make. In my example it clearly is not sorted by the order of the elements in the contract source code.

casey commented 7 years ago

Actually, wanted to add an additional reason that you might want to have a defined sorting order (for example, alphabetically) that doesn't change between versions.

If a developer is working on a smart contract and wants to verify that the ABI doesn't change unexpectedly, they might commit the ABI to revision control, so that on changes they see differences in git diffs. If the order of elements in the ABI changes when they upgrade solc, this would make that more difficult.

axic commented 7 years ago

@MidnightLightning to clarify most of them are sorted by a C++ map.

axic commented 7 years ago

There are two options: a) sort by AST order or b) sort lexicographically. Only b) properly satisifes @casey's scenario.

Though not fully convinced the ABI should be checked by diff and not using an actual comparison tool (i.e. annotations in the ABI may change with versions without actually affecting the ABI), but I admit git doesn't make it easy.

axic commented 6 years ago

I suggest that we sort it deterministically as follows: lexicographic sorting by the values of the following keys:

That means the usual order will be something like:

The sorting should be added in the ABI::generate method here

tpmccallum commented 5 years ago

I have been working on creating a deterministic ABI hash in canonical form. I posted on ethresearch and then later found your above suggestion.

Before seeing this I had started both a Python and Javascript implementation of something similar.

I also go inside the values of keys like inputs and outputs and then sort by the values of the keys in each of their internal items.

Eventually I want to get a reliable unique hash per ABI. It is interesting how Python's json.dumps() and Javascript's JSON.stringify() even produce different output formats. They don't make it easy for us do they? :)

Perhaps the output ABI (for hashing) should just be sanitized to have no tabs, spaces, newlines whatsoever. This would be more reliable than hacking around with the nuances of each language's JSON serialization libraries.

tpmccallum commented 5 years ago

Hi @axic and @jacqueswww, I have written this in Python for now (today).

Example ERC20 input - unsorted

[{"constant": true, "inputs": [], "name": "name", "outputs": [{"name": "", "type": "string"}], "payable": false, "stateMutability": "view", "type": "function"}, {"constant": false, "inputs": [{"name": "_spender", "type": "address"}, {"name": "_value", "type": "uint256"}], "name": "approve", "outputs": [{"name": "", "type": "bool"}], "payable": false, "stateMutability": "nonpayable", "type": "function"}, {"constant": true, "inputs": [], "name": "totalSupply", "outputs": [{"name": "", "type": "uint256"}], "payable": false, "stateMutability": "view", "type": "function"}, {"constant": false, "inputs": [{"name": "_from", "type": "address"}, {"name": "_to", "type": "address"}, {"name": "_value", "type": "uint256"}], "name": "transferFrom", "outputs": [{"name": "", "type": "bool"}], "payable": false, "stateMutability": "nonpayable", "type": "function"}, {"constant": true, "inputs": [], "name": "decimals", "outputs": [{"name": "", "type": "uint8"}], "payable": false, "stateMutability": "view", "type": "function"}, {"constant": true, "inputs": [{"name": "_owner", "type": "address"}], "name": "balanceOf", "outputs": [{"name": "balance", "type": "uint256"}], "payable": false, "stateMutability": "view", "type": "function"}, {"constant": true, "inputs": [], "name": "symbol", "outputs": [{"name": "", "type": "string"}], "payable": false, "stateMutability": "view", "type": "function"}, {"constant": false, "inputs": [{"name": "_to", "type": "address"}, {"name": "_value", "type": "uint256"}], "name": "transfer", "outputs": [{"name": "", "type": "bool"}], "payable": false, "stateMutability": "nonpayable", "type": "function"}, {"constant": true, "inputs": [{"name": "_owner", "type": "address"}, {"name": "_spender", "type": "address"}], "name": "allowance", "outputs": [{"name": "", "type": "uint256"}], "payable": false, "stateMutability": "view", "type": "function"}, {"payable": true, "stateMutability": "payable", "type": "fallback"}, {"anonymous": false, "inputs": [{"indexed": true, "name": "owner", "type": "address"}, {"indexed": true, "name": "spender", "type": "address"}, {"indexed": false, "name": "value", "type": "uint256"}], "name": "Approval", "type": "event"}, {"anonymous": false, "inputs": [{"indexed": true, "name": "from", "type": "address"}, {"indexed": true, "name": "to", "type": "address"}, {"indexed": false, "name": "value", "type": "uint256"}], "name": "Transfer", "type": "event"}]

Example ERC20 output - sorted

[{"anonymous": false, "inputs": [{"indexed": true, "name": "owner", "type": "address"}, {"indexed": true, "name": "spender", "type": "address"}, {"indexed": false, "name": "value", "type": "uint256"}], "name": "Approval", "type": "event"}, {"anonymous": false, "inputs": [{"indexed": true, "name": "from", "type": "address"}, {"indexed": true, "name": "to", "type": "address"}, {"indexed": false, "name": "value", "type": "uint256"}], "name": "Transfer", "type": "event"}, {"payable": true, "stateMutability": "payable", "type": "fallback"}, {"constant": true, "inputs": [{"name": "_owner", "type": "address"}, {"name": "_spender", "type": "address"}], "name": "allowance", "outputs": [{"name": "", "type": "uint256"}], "payable": false, "stateMutability": "view", "type": "function"}, {"constant": false, "inputs": [{"name": "_spender", "type": "address"}, {"name": "_value", "type": "uint256"}], "name": "approve", "outputs": [{"name": "", "type": "bool"}], "payable": false, "stateMutability": "nonpayable", "type": "function"}, {"constant": true, "inputs": [{"name": "_owner", "type": "address"}], "name": "balanceOf", "outputs": [{"name": "balance", "type": "uint256"}], "payable": false, "stateMutability": "view", "type": "function"}, {"constant": true, "inputs": [], "name": "decimals", "outputs": [{"name": "", "type": "uint8"}], "payable": false, "stateMutability": "view", "type": "function"}, {"constant": true, "inputs": [], "name": "name", "outputs": [{"name": "", "type": "string"}], "payable": false, "stateMutability": "view", "type": "function"}, {"constant": true, "inputs": [], "name": "symbol", "outputs": [{"name": "", "type": "string"}], "payable": false, "stateMutability": "view", "type": "function"}, {"constant": true, "inputs": [], "name": "totalSupply", "outputs": [{"name": "", "type": "uint256"}], "payable": false, "stateMutability": "view", "type": "function"}, {"constant": false, "inputs": [{"name": "_to", "type": "address"}, {"name": "_value", "type": "uint256"}], "name": "transfer", "outputs": [{"name": "", "type": "bool"}], "payable": false, "stateMutability": "nonpayable", "type": "function"}, {"constant": false, "inputs": [{"name": "_from", "type": "address"}, {"name": "_to", "type": "address"}, {"name": "_value", "type": "uint256"}], "name": "transferFrom", "outputs": [{"name": "", "type": "bool"}], "payable": false, "stateMutability": "nonpayable", "type": "function"}]

Here is a link to the code.

I ended up writing this without help from Libraries. Partly because we can now transpose the logic of my code to C++ and partly because (as it turns out) Python's lambda and Python's itemgetter do not entertain the fact that the fallback item in the ABI has no name .

tpmccallum commented 5 years ago

The overall logic is as follows

The inner workings are as follows

Multilevel comparison

    # Compare two items and return a bool
    def compareItems(self, a, b):
        try:
            #print("Comparing " + str(a['type']) + " and " + str(b['type']))
            if str(a['type']) > str(b['type']) or str(a['type']) == str(b['type']) and str(a['name']) > str(b['name']) :
                #print("Returning True")
                return True
            else:
                #print("Returning False")
                return False
        except:
            # Caters for cases where the name is not present i.e. a fallback function
            #print("Comparing " + str(a['type']) + " and " + str(b['type']))
            if str(a['type']) > str(b['type']):
                #print("Returning True")
                return True
            else:
                #print("Returning False")
                return False

Core sort logic

def sort(_json):
    for passnum in range(len(_json)-1,0,-1):
        for item in range(len(_json) - 1):
            if compareItems(_json[item], _json[item+1]) == True:
                temp = _json[item]
                _json[item] = _json[item+1]
                _json[item+1] = temp
    return _json

Driver to sort the inner inputs and outputs

def sortInternalListsInJsonObject(_json):
    for listItem in _json:
        for k, v in listItem.items():
            print("Processing " + str(v))
            # Qualify the value as a list of JSON objects
            if type(v) not in (str, bool, int):
                # Qualify list as needing sorting (contains more than one item)
                if len(v) > 1:
                    # Qualify the sortable data is JSON
                    if type(v[0]) is dict:
                        print("Processing " + str(v))
                        sort(v)
                    else:
                        print("Won't be able to parse this data, it is not JSON")
                else:
                    print("Not enough items in the list to sort, moving on")
            else:
                print(str(v) + " is not a list, moving on ...")
    return _json
chriseth commented 5 years ago

The current order is by function selector, by the way. I would propose to first sort by kind as suggested by @axic, then by name and then by function selector - it is quite hard to determine a sorting order of the input and output types.

MidnightLightning commented 5 years ago

Glad to see this is getting some attention! Given the difficulties of determining a "best" sorting order based on attributes of the function/signature, might I propose my original suggestion again: have the "canonical" sort be by the order the developer put the methods in the Solidity source code. That order is only known by the compiler (which is what we're talking about here); any other implementations that use the ABI down the road can determine their own alphabetical/function-type/etc.-based sort ordering if they so desire (none of those sort types require additional metadata about the original Solidity source to rearrange into that order). Doing it this way, the compiler would be acting in a more "unbiased" method in my opinion, and letting downstream apps add different logic to it if they so chose?

chriseth commented 5 years ago

@MidnightLightning note that this is also not really well-defined because the functions can come from different source units. We could sort by AST id, but in that case, I would actually prefer to sort by name first.

tpmccallum commented 5 years ago

I completely understand the raw/native source code ordering approach that @MidnightLightning is proposing. However, there are a few things to consider. Please see below.

Solidity Solidity supports multiple inheritance as well as polymorphism, and as such, specific functions can be found in multiple places and in multiple files. Would you agree that once a significant amount of inheritance and polymorphism is implemented, the smart contract kind of looses that linear source code layout anyway.

Solidity has hoisting; this allows functions to be written in any order whatsoever. If source code ordering was implemented, a contract could have several different valid ABIs (depending on how the smart contract developer chose to assemble the smart contract functions and/or assemble external/inherited contracts). I guess some could see this as a feature and other could see this as a concern, if considering reproducible builds.

Let's look at this from another angle.

Vyper

A Vyper contract is saved in a single file, only. The Vyper compiler does not support inheritance or polymorphism. Vyper does not have hoisting; a function definition must appear in the source code before it can be called. This deliberate design organically creates quite an ordered sequence. At present the Vyper compiler does not perform any ordering of the ABI. The layout of the ABI appears in the same order as the layout of the source code. From this perspective, source code ordering makes a lot of sense. However I can confirm that there is still potential to re-order function positions in a contract's Vyper source code (without requiring hoisting) which will create a different ABI.

So at the end of the day, as with Solidity, there can definitely still be many different (valid) ABIs for the exact same smart contract.

Ordering At present, multilevel ordering (primarily) by type, then (secondary) by name does make sense. This can also be said for the internal list of inputs and outputs given that a specific individual function will never have two inputs (or outputs) with exactly the same type and name.

Hashing Give that human readability of the ABI is desirable. Lexicographic sorting (of human readable words) can be employed throughout the entire sorting process. Whilst hashing essentially provides a unique content addressable key, sorting by hashes of already deterministic text detracts from the human readability.

However, from a reproducible builds perspective, once the overall ABI is completely sorted (and sanitized i.e. sub(r"[\n\t\s]*", "", sortedAbi), the end result (the high level ABI) will produce the desired key which could play a part in assisting with source code authenticity work.

In addition to this, a hash of the end result can be used in unit testing. For example, the hash can be used in Python, Rust, Javascript, C++ etc. to confirm the correct behavior of each of the different compiler's ABI sorting implementation. With eWASM around the corner it would be brilliant if the community could come up with an ordering design for keeps.

Hope this helps. Brilliant chatting to you all. Kind regards Tim

tpmccallum commented 5 years ago

I wrote a sorting and hashing API which you can try out here.

You can paste the raw text of any of these unsorted mixed up ERC20 ABIs, or the official ERC20 ABI from the Ethereum Wiki, and the resulting hash (for the official ERC20 ABI) will always be 0xfa9452aa0b9ba0bf6eb59facc534adeb90d977746f96b1c4ab2db01722a2adcb.

tpmccallum commented 5 years ago

Each individual input takes a position in the individual ABI item depending on 1) its type and 2) its name. For example addresses come before uint256 Each individual output takes a position in the individual ABI item depending on 1) its type and 2) its name. For example addresses come before uint256 Each of the individual ABI items (including the inputs and outputs) are sorted by their keys i.e. inputs come before outputs and stateMutability comes after those. Each individual ABI item (which consists of everything we have talked about above) takes a position in the overall single ABI file depending on 1) its type and 2) its name. For example all events come before functions and out of those an event called alpha would come before an event called bravo. Here is an example of the sorted ERC20 ABI.

[{
    "anonymous": false,
    "inputs": [{
        "indexed": true,
        "name": "owner",
        "type": "address"
    }, {
        "indexed": true,
        "name": "spender",
        "type": "address"
    }, {
        "indexed": false,
        "name": "value",
        "type": "uint256"
    }],
    "name": "Approval",
    "type": "event"
}, {
    "anonymous": false,
    "inputs": [{
        "indexed": true,
        "name": "from",
        "type": "address"
    }, {
        "indexed": true,
        "name": "to",
        "type": "address"
    }, {
        "indexed": false,
        "name": "value",
        "type": "uint256"
    }],
    "name": "Transfer",
    "type": "event"
}, {
    "payable": true,
    "stateMutability": "payable",
    "type": "fallback"
}, {
    "constant": true,
    "inputs": [{
        "name": "_owner",
        "type": "address"
    }, {
        "name": "_spender",
        "type": "address"
    }],
    "name": "allowance",
    "outputs": [{
        "name": "",
        "type": "uint256"
    }],
    "payable": false,
    "stateMutability": "view",
    "type": "function"
}, {
    "constant": false,
    "inputs": [{
        "name": "_spender",
        "type": "address"
    }, {
        "name": "_value",
        "type": "uint256"
    }],
    "name": "approve",
    "outputs": [{
        "name": "",
        "type": "bool"
    }],
    "payable": false,
    "stateMutability": "nonpayable",
    "type": "function"
}, {
    "constant": true,
    "inputs": [{
        "name": "_owner",
        "type": "address"
    }],
    "name": "balanceOf",
    "outputs": [{
        "name": "balance",
        "type": "uint256"
    }],
    "payable": false,
    "stateMutability": "view",
    "type": "function"
}, {
    "constant": true,
    "inputs": [],
    "name": "decimals",
    "outputs": [{
        "name": "",
        "type": "uint8"
    }],
    "payable": false,
    "stateMutability": "view",
    "type": "function"
}, {
    "constant": true,
    "inputs": [],
    "name": "name",
    "outputs": [{
        "name": "",
        "type": "string"
    }],
    "payable": false,
    "stateMutability": "view",
    "type": "function"
}, {
    "constant": true,
    "inputs": [],
    "name": "symbol",
    "outputs": [{
        "name": "",
        "type": "string"
    }],
    "payable": false,
    "stateMutability": "view",
    "type": "function"
}, {
    "constant": true,
    "inputs": [],
    "name": "totalSupply",
    "outputs": [{
        "name": "",
        "type": "uint256"
    }],
    "payable": false,
    "stateMutability": "view",
    "type": "function"
}, {
    "constant": false,
    "inputs": [{
        "name": "_to",
        "type": "address"
    }, {
        "name": "_value",
        "type": "uint256"
    }],
    "name": "transfer",
    "outputs": [{
        "name": "",
        "type": "bool"
    }],
    "payable": false,
    "stateMutability": "nonpayable",
    "type": "function"
}, {
    "constant": false,
    "inputs": [{
        "name": "_from",
        "type": "address"
    }, {
        "name": "_to",
        "type": "address"
    }, {
        "name": "_value",
        "type": "uint256"
    }],
    "name": "transferFrom",
    "outputs": [{
        "name": "",
        "type": "bool"
    }],
    "payable": false,
    "stateMutability": "nonpayable",
    "type": "function"
}]
tpmccallum commented 5 years ago

At the very end of the above process. We remove all tabs, spaces, new lines etc. We then Sha3 and get 0xfa9452aa0b9ba0bf6eb59facc534adeb90d977746f96b1c4ab2db01722a2adcb

tpmccallum commented 5 years ago

Please try it out https://etc.search.secondstate.io/shaAnAbi.html