ethpm / ethpm-spec

Ethereum Package Manager http://ethpm.github.io/ethpm-spec/
165 stars 30 forks source link

Overhaul lockfile specification contract-type/contract-instance #74

Closed cbdotguru closed 6 years ago

cbdotguru commented 6 years ago

I'd like to take a look at how we're defining these portions of the lockfile.

Problem

There seems to be a lot of replication over the whole system that can be avoided. Specifically:

contract-type

Contract Name: Necessary
Bytecode: Necessary .
Runtime Bytecode: Not necessary. Retrieved from instance address.
ABI: Necessary .
Natspec: Don't inlcude. The natspec parser itself needs to be overhauled for reasons outside of this scope and also derived from source.
Compiler: Combine in instance object. Link Dependencies: Necessary as part of contract-type.

contract-instance

Contract Type: Redundant.
Address: Necessary.
Transaction: Necessary.
Block: Necessary.
Runtime Bytecode: Not necessary. Is derived from address.
Compiler: Necessary.

Solution

I propose we combine these two fields into a single contract-definition field as follows:

contract-definition

Contract Name: Provides the name of this contract.
Bytecode: Unlinked bytecode with offsets for dependencies put in link dependencies object.
ABI: For non-source packages.
Link Dependencies: Should provide either on-chain address or uri if not deployed for whatever reason. If an on-chain address is provided, the on-chain index contract should provide necessary information to auto-build the lockfile for this dependency.

If deployed, Compiler
Address
Transaction
Block
Source Code Hash: See #67, that issue should be resolved prior to determining this.

Reason

Package files generally provide all necessary information that would enable a tool to search the ecosystem, grab the resources, and build the package. Including too much information makes the spec too heavy and replicates unnecessarily.

VoR0220 commented 6 years ago

Contract definition might be cleaner in terms of the spec, but the problem is you have to be careful for accounting for instances of contracts on multiple chains, and that's part of why this split exists. Do you have a better idea for solving this issue?

tcoulter commented 6 years ago

This is a drive by - on a phone and will follow up more later. But much of what you're writing here assumes source code is present. The current spec allows for packages that only specify a deployed contract on a blockchain, without the need to attach source files. In which case bytecode (and everything derived from it) is important on its own.

On Sat, Nov 11, 2017, 11:27 AM RJ Catalano notifications@github.com wrote:

Contract definition might be cleaner in terms of the spec, but the problem is you have to be careful for accounting for instances of contracts on multiple chains, and that's part of why this split exists. Do you have a better idea for solving this issue?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ethpm/ethpm-spec/issues/74#issuecomment-343688267, or mute the thread https://github.com/notifications/unsubscribe-auth/AAFp1WdyiNCifK_89wsE5E23C4Z5ujqXks5s1fUrgaJpZM4Qajl- .

cbdotguru commented 6 years ago

@VoR0220 That’s a good point. Since the chain uri is defined here, I guess we could allow for multiple instances defined within a network key much like we have inside of a truffle articfact except replace network id with the chain uri itself

@tcoulter Yes, I did make that assumption. I felt that assumption was necessary because we can’t really develop on top of a package without source. I’m not sure we should encourage any development process that assumes you’ll just test on a live testnet, we should always develop on a local net like with testrpc. If we can assume development always occurs on a local network, then it would be safe to assume the source will always have a uri in the contract definition. So then it comes down to if we want to force that assumption and my vote would be yes in that case. Not sure it is the most correct but it seems so to me.

cbdotguru commented 6 years ago

One more thing to add is it COULD make sense to store the bytecode and abi within a lockfile and the tool could autodeploy the contract by pulling the bytecode as the data sent to the local test net. With that said, just looking at packages such as with npm, which is the king of open source package models, source code is generally found and assumed to be in the package and the package file itself is light. This leads me back to the same conclusion that we could go the route that forces original source location to be defined.

Edit: Also, this is a big assumption that we're looking at a change reference the source code hash mentioned above and linked to in #67 . If this is looked at seriously, then source code would also be required. I feel this is an important change to look at first b/c it has wider implications.

pipermerriam commented 6 years ago

@Hackdom we were intentional to support packages which do not contain source code. The reason for this is to support use cases beyond open source and public blockchains. In a private blockchain it is entirely reasonable for a company to use source-less packages with pre-compiled assets because they are working within a trusted environment.

Note that these also have a valid use case for public chains as well. Think about this:

In populus, I will likely keep a cache around somewhere that maps source_hash -> compiled_bytecode_hash. Any package which ships pre-compiled bytecode will still have that bytecode verified, but once I've performed that verification once, I don't have to do it again since I can just see that it's present in my cache.

cbdotguru commented 6 years ago

@pipermerriam ah, ok that makes sense. So should the on-chain index contract contain these hashes to test against then? Also, we should probably nail down source_hash specifically to what’s included in the hash, ie are we tightly packing the source? Hashing 0x prefixed bytecode? Unlinked bytecode? And if so how can we future proof hashing with the unlinked spaces?

cbdotguru commented 6 years ago

One more thing, just to put back out there for attention. When it comes to ‘signing’ any code I think the only relevant signature is from a reputable auditor’s on-chain address. It may be good to link a signed tx to a specific source/bytecode hash. This would give us the exact code they signed off on.

Let me tumble through my brain and see if I want to amend this at all. Is there anything in the initial post that is worth continuing here or no, just close it unless I think of something else?

pipermerriam commented 6 years ago

@Hackdom the code hash piece is not something I'm proposing we put into the spec but rather a tooling specific caching mechanism. I only mentioned it to highlight the use case of including compilation assets (ABI/bytecode/etc) in lockfiles even when they are going to be independently verified.

pipermerriam commented 6 years ago

RE: signing

This feels like a different issue that either belongs under #67 or a new issue if it's a different kind of signing that you're planning to use.

cbdotguru commented 6 years ago

Ok, I’ll continue that under #67 then. So one more thing regarding this @pipermerriam. The bytecode and runtime bytecode. Is this in reference to linked vs unlinked? If so, should runtime bytecode be included in both the contract-type and contract-instance? I updated the proposal up top. If my assumptions are correct, runtime bytecode can be and should be derived from on-chain directly. I feel tools should be expected to obtain on-chain information from the networks they support.

cbdotguru commented 6 years ago

@pipermerriam one question before closing this. If a library contains internal functions, does the package need source code to properly link the dependent contract when compiling?

pipermerriam commented 6 years ago

package needs source code to verify the on-chain contract is the library in question. This would be considered the verification step which is technically independent of linking. Most tooling will implement verification as a part of their linking process but it's not required since you can link code without actually verifying what you are linking to.

Does that answer your question?

cbdotguru commented 6 years ago

Yes it answers that part.