Validate bytecode against another bytecode

davidyuk commented 2 years ago

Currently, /validate-byte-code compares bytecode with source code. We may want to compare on-chain bytecode with bytecode provided by the developer. Should this be implemented by the compiler, or can we simply compare the bytecode as strings? We don't need specific reports like "Functions in the byte code but not in the source code: ..." (https://github.com/aeternity/aesophia/blob/a982f25262763e8f5d0c12014bfbb2c1672dc681/src/aeso_compiler.erl#L639).

marc0olo commented 2 years ago

I don't think https://github.com/aeternity/aepp-sdk-js/issues/1322 depends on this here.

but what you are asking for is interesting. so you wanna be able to compare the functionality of contracts that have been compiled with different compiler versions to check if they provide the same functionality?

do I understand this correctly?

what exactly is the usecase for that? I need to understand :-)

marc0olo commented 2 years ago

We may want to compare on-chain bytecode with bytecode provided by the developer

in what usecase would I want to do this?

davidyuk commented 2 years ago

I don't think aeternity/aepp-sdk-js#1322 depends on this here.

Currently, I'm proposing to compare bytecode as strings (as part of that task, because we have the same feature for instances based on source code). It would be nice to have confirmation that this is an OK approach.
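A minimal sketch of the string comparison proposed here, assuming both values use the same encoding (for example the cb_-prefixed form) so that surrounding whitespace is the only noise. The function name is hypothetical and not part of the SDK or the compiler:

```python
def same_bytecode(on_chain: str, local: str) -> bool:
    """Return True when two encoded bytecode strings are identical."""
    # Assumption: both strings are encoded the same way, so an exact
    # comparison after trimming whitespace is sufficient.
    return on_chain.strip() == local.strip()
```

Note that this only answers "is it byte-for-byte the same?"; it says nothing about contracts compiled with different compiler versions.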

so you wanna be able to compare the functionality of contracts that have been compiled with different compiler versions to check if they provide the same functionality?

Yes, this would be nice; it is discussed in #75.

what exactly is the usecase for that?

An aepp that needs to validate contracts deployed by other parties won't get a false negative if a contract was compiled by a different version of the compiler. Actually, I'm not sure how this works or whether this is the case at all.

in what usecase would I want to do this?

I think the use cases are the same as for validating source code against bytecode. For example, take a voting app where polls are created by users as separate smart contracts: before casting a vote, the app may want to ensure that the on-chain bytecode corresponds to the local one, so that results are counted properly.

nikita-fuchs commented 2 years ago

I think the use cases are the same as for validating source code against bytecode. For example, take a voting app where polls are created by users as separate smart contracts: before casting a vote, the app may want to ensure that the on-chain bytecode corresponds to the local one, so that results are counted properly.

👆

marc0olo commented 2 years ago

I think the use cases are the same as for validating source code against bytecode. For example, take a voting app where polls are created by users as separate smart contracts: before casting a vote, the app may want to ensure that the on-chain bytecode corresponds to the local one, so that results are counted properly.

I agree on that one, but for the rest I am not sure how to achieve this or how to deal with it. I mean, if I as a developer have certain bytecode that I want to check against an on-chain contract that I know should have the same bytecode, then this should work. And if the node currently still strips the init function (https://github.com/aeternity/aeternity/issues/3510), then this should be fixed or changed on the node.

but I don't see a good and convenient way to compare bytecode from different compilers and check whether the source code behind it is the same. do we really need to provide this in the SDK?

@thepiwo I think you had to deal with this topic many times. what's your opinion here?

nikita-fuchs commented 2 years ago

No need to worry or reinvent the wheel, just check how etherscan is doing it 😌

having it in the SDK would be super sleek, as currently, the "contract at address" feature is a total trust thing. You have 0 clue of what you're actually talking to there. Then we could allow providing a strict option with a compiler value, meaning "only accept the contract at address ct_xyz if my contract matches the bytecode".
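A rough sketch of what such a strict option could look like. Every name here is hypothetical: fetch_bytecode stands in for whatever node call returns the encoded bytecode stored at a contract address, and nothing below is an existing SDK API:

```python
from typing import Callable


class BytecodeMismatchError(Exception):
    """Raised when on-chain bytecode differs from the expected bytecode."""


def contract_at(
    address: str,
    expected_bytecode: str,
    fetch_bytecode: Callable[[str], str],
    strict: bool = True,
) -> str:
    """Return the on-chain bytecode, refusing mismatches in strict mode."""
    # fetch_bytecode is an assumed callback, not a real node client method.
    on_chain = fetch_bytecode(address)
    if strict and on_chain != expected_bytecode:
        raise BytecodeMismatchError(
            f"bytecode at {address} does not match the local contract"
        )
    return on_chain
```

With strict=False the caller gets today's "trust" behaviour; with strict=True the call fails before any interaction with an unexpected contract.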

regarding the init's role @hanssv can probably confirm that its bytecode is stored, too, but the init's params are not part of the contract's bytecode? Which would make verifying contracts on ae easier than on eth.

marc0olo commented 2 years ago

No need to worry or reinvent the wheel, just check how etherscan is doing it 😌

sure, but that has nothing to do with this issue. I mean we could have a separate service that compiles the source code with different compiler versions, which would allow verifying source code somehow. here we are talking about integrating such a feature directly into the SDK. I am not sure if this really makes sense.

having it in the SDK would be super sleek, as currently, the "contract at address" feature is a total trust thing. You have 0 clue of what you're actually talking to there. Then we could allow providing a strict option with a compiler value, meaning "only accept the contract at address ct_xyz if my contract matches the bytecode".

again, I don't think the SDK is the right place to handle this IF we don't have the source code AND/OR don't know the compiler version it was compiled with. you should have other services to provide that info, or a dedicated solution to verify bytecode by providing the respective source code.

as we are always comparing with Ethereum here => is there any SDK that provides this functionality out of the box? can you share that info with me? would be good to compare or check how they do it.

regarding the init's role @hanssv can probably confirm that its bytecode is stored, too, but the init's params are not part of the contract's bytecode? Which would make verifying contracts on ae easier than on eth.

I don't understand what you are talking about here. of course bytecode is stored on the chain. but in the past the init function was stripped from the bytecode to save storage, if I remember correctly, and for this reason it was impossible to compare bytecode back then. by now this should be possible. but in order to compare you need to know which compiler version was used to generate the bytecode.

davidyuk commented 2 years ago

if the node currently still strips the init function

As I understand it, this is no longer the case, but at that time bytecode validation was broken because a malicious contract's init function could set a state that is unreachable by the init function of the proper contract.

but I don't see a good and convenient way to compare bytecode from different compilers and check whether the source code behind it is the same. do we really need to provide this in the SDK?

Ideally, it shouldn't require any changes on the SDK side. I would solve it by preserving the compiler version in the bytecode; then the compiler would be able to choose the proper version for validation without iterating over the available compilers 🙃

Then we could allow providing a strict option with a compiler value

It is called validateByteCode: https://github.com/aeternity/aepp-sdk-js/blob/48d36f9be30805a476590282bffae9944134eb41/src/contract/aci/index.js#L65

marc0olo commented 2 years ago

As I understand it, this is no longer the case, but at that time bytecode validation was broken because a malicious contract's init function could set a state that is unreachable by the init function of the proper contract.

can you elaborate a bit more on that? I think I didn't really understand it. maybe some example?

Ideally, it shouldn't require any changes on the SDK side. I would solve it by preserving the compiler version in the bytecode; then the compiler would be able to choose the proper version for validation without iterating over the available compilers 🙃

I agree it would be cool to have such a service. the question is what's the best way to do this. as far as I understood this probably won't be covered by the http compiler. @hanssv what's your opinion about that? do you have any suggestion/idea how to deal with this?

It is called validateByteCode: https://github.com/aeternity/aepp-sdk-js/blob/48d36f9be30805a476590282bffae9944134eb41/src/contract/aci/index.js#L65

if source code is provided I definitely second this. I mean, as you mentioned, we already have this. I am just not sure how we should deal with bytecode verification in case the SDK cannot provide the source code (which this issue aims to address, right?). wouldn't that require decompiling it and checking bytecode generation with other compiler versions?

hanssv commented 2 years ago

The node does not strip the init function since the last (IRIS) protocol upgrade.

Also, unless the user explicitly removes it, the compiler will include the compiler version and the hash of the source code in the contract information on-chain (this is the contract create in https://www.aeknow.org/block/transaction/th_2Asgy4DisLGSar63kmxdaJjQhw7utsWR4nAWy167Zv8yuoDWs4):

(aeternity_ct@localhost)6> aect_sophia:deserialize(SerBin).
#{byte_code =>
      <<185,2,230,254,8,59,226,96,0,55,1,7,55,0,2,3,17,240,234,
        120,157,38,0,7,12,6,251,3,...>>,
  compiler_version => <<"6.1.0">>,contract_vsn => 3,
  payable => false,
  source_hash =>
      <<39,147,224,0,198,24,136,107,35,174,172,69,160,206,181,
        148,51,48,73,63,88,250,86,187,40,45,166,140,...>>,
  type_info => []}

hanssv commented 2 years ago

So this is how it was supposed to work; it was described to me quite some time ago (I think it was a late night back in Rome when we published mainnet, so my memory is a bit foggy):

When you create a contract - that you want to be verifiable and trusted - you include the hash of the source-code, and the compiler version on-chain. You also publish the source code somewhere (like Etherscan I guess?!). Now the one that wants to trust the contract can grab the source code and produce a hash and compare it to the hash on-chain. He/She can also grab the correct compiler, and compile said source code and compare the byte code with what is on-chain...

Remains to be standardized what you include in the source code that is hashed I guess - currently I guess it is only the main contract file...
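A sketch of that verification flow, under clearly stated assumptions: on_chain mirrors the deserialized map shown above (source_hash, compiler_version, byte_code), compile_source is a hypothetical stand-in for running the matching compiler release, and the hash is assumed to be BLAKE2b-256 over the main contract file only. The hash algorithm in particular should be checked against the compiler before relying on it:

```python
import hashlib
from typing import Callable, Mapping


def verify_published_source(
    on_chain: Mapping[str, object],
    source: str,
    compile_source: Callable[[str, str], bytes],
) -> bool:
    """Check published source code against on-chain contract information."""
    # Assumption: the source hash is BLAKE2b-256 over the UTF-8 bytes of
    # the main contract file (what gets hashed is not yet standardized).
    digest = hashlib.blake2b(source.encode("utf-8"), digest_size=32).digest()
    if digest != on_chain["source_hash"]:
        return False
    # Recompile with the compiler version recorded on-chain, then compare
    # the result with the stored bytecode.
    compiled = compile_source(str(on_chain["compiler_version"]), source)
    return compiled == on_chain["byte_code"]
```

The hash check comes first because it is cheap and needs no compiler; the recompile step is what actually ties the source to the deployed bytecode.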

marc0olo commented 2 years ago

Thanks for your input Hans! So I guess that everything we want is doable but just needs to be defined. I personally don't think the http compiler should take care of a version switch.

@davidyuk are we able to fetch the info Hans mentioned? (bytecode, compiler_version and source_hash)

If so, I think the most straightforward way is to have validateByteCode execute the following steps:

1. get the info mentioned above from the node
2. check the compiler version and call the correct http compiler (we need to be able to configure different compiler versions in the SDK)
   - maybe here we can introduce another service that takes compiler_version and source_code as input, proxies to the correct compiler and returns the bytecode from that compiler (as I understand you want this functionality to be covered by the http compiler itself, right?)
3. compare bytecode
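
The steps above can be sketched roughly like this. It is only an illustration: contract_info is assumed to already hold what the node returned, and compilers is a hypothetical registry mapping a compiler version to a function that compiles source code with that version (for instance by calling a differently configured http compiler):

```python
from typing import Callable, Mapping


def validate_byte_code(
    contract_info: Mapping[str, object],
    source: str,
    compilers: Mapping[str, Callable[[str], bytes]],
) -> bool:
    """Recompile source with the recorded compiler and compare bytecode."""
    # Step 1 is assumed done: contract_info carries the node's answer.
    version = str(contract_info["compiler_version"])
    # Step 2: select the compiler configured for that exact version.
    compile_fn = compilers.get(version)
    if compile_fn is None:
        raise LookupError(f"no compiler configured for version {version}")
    # Step 3: compare freshly compiled bytecode with the on-chain bytecode.
    return compile_fn(source) == contract_info["byte_code"]
```

Raising on an unknown version, rather than returning False, keeps "cannot check" distinguishable from "checked and different".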

and this can only be done if the developer knows the source code of course. I am still struggling to understand what this specific issue is about. for me it's kind of strange to validate bytecode against bytecode not knowing the respective sourcecode. what we are talking about here is how to deal with #75, right? the discussion somehow moved away from the issue's title 😅

Remains to be standardized what you include in the source code that is hashed I guess - currently I guess it is only the main contract file...

this sounds like sth. that should be defined in an AEX proposal (we should aim to revive this again for such kind of things)

nikita-fuchs commented 2 years ago

I am still struggling to understand what this specific issue is about. for me it's kind of strange to validate bytecode against bytecode not knowing the respective sourcecode

no worries, I don't think that at any point some bytecode is just compared to some other bytecode. It's always bytecode resulting from a contract that one has, which is compared to the on-chain bytecode.

Again, please take a look at how etherscan is doing it and apply accordingly to ae :)

marc0olo commented 2 years ago

Again, please take a look at how etherscan is doing it and apply accordingly to ae :)

this is independent of the SDK functionality in that regard. I think we need to put this on hold for now and clarify if the other (more important) stuff is finished.

thepiwo commented 2 years ago

I think the use cases are the same as for validating source code against bytecode. For example, take a voting app where polls are created by users as separate smart contracts: before casting a vote, the app may want to ensure that the on-chain bytecode corresponds to the local one, so that results are counted properly.

this is already being done by the frontend, it will warn you if the bytecode of the local contract is not matching the deployed one.

thepiwo commented 2 years ago

As I understand it, this is no longer the case, but at that time bytecode validation was broken because a malicious contract's init function could set a state that is unreachable by the init function of the proper contract.

of course this still applies; the current comparison only works with bytecode after init has been stripped. I can't think of big attack vectors, as init has already been called. If the aepp/user of the contract doesn't like the existing state, it can still be warned on a case-by-case basis.

thepiwo commented 2 years ago

Also, unless the user explicitly removes it, the compiler will include the compiler version and the hash of the source code in the contract information on-chain

awesome, I didn't know. That will help a lot

thepiwo commented 2 years ago

To check this in the SDK, I think we currently need to rely on the mdw to get the contract create tx. The mdw can then also extract that information, or even provide an endpoint to check the correctness of the source when a hash or source is provided.

aeternity / aesophia_http

Validate bytecode against another bytecode #81