Alternative structure and behavior for contract definitions

tcoulter commented 7 years ago

Overview

I think #38 doesn't provide much utility and may contribute many edge cases. Similarly, I think the contracts keyword is getting overloaded and needs refactoring.

This issue represents a response to #38 and more recent PRs, and is an attempt to open a discussion about restructuring some of the lockfile's data.

Background

It was hard for me to reason about this, so I want to start by introducing some definitions that help describe my thinking.

contract: A contract is source code that describes an object that can be deployed to a blockchain.
instance: An instance is a deployed version of that contract with an address on a specific chain.
type: All instances have a type; a type is the specific contract used during deployment to create that specific instance. For simplicity, let's assume that if an instance is said to have type A, then the source code used to deploy it was defined by a contract called A, using the code contract A {}.
contract artifacts: Results of compilation and deployment, including abi, binary, deployed address, etc.

Given the structure proposed in #38, using all the names available to us would look like the JSON blob below. I've defined my understanding of these names and given each name an identifier.

ContractName|Alias: "contract name or alias". When ContractName, this will equal the type name. When Alias, this will equal an arbitrary name we want to call this contract.
TypeName: "type name", same as ContractName. Not arbitrary (i.e., not Alias or InstanceName).
InstanceName: "instance name", an arbitrary name we can give to a specific instance.

{
  // ...
  contracts: {
    "ContractName|Alias": {
      // ...
      "contract_name": "TypeName"
      "deployments": {
        "blockchain://...": {
           "InstanceName": {...}
        }
      }
    }    
  }
}

Some assumptions I'm making:

TypeName is intended to be used to reference a specific contract within the process of code verification.
A dependency may declare a contract that has the same TypeName as a contract within the root package, or two dependencies may declare contracts with the same TypeName.
InstanceNames are required to be unique per contract/alias per blockchain, but will not necessarily be unique across contract objects.
The contracts item in the lockfile exists for two (and only two) reasons:
- To describe contracts that are deployed as part of this package, including their associated artifacts.
- To describe contract types and their associated artifacts when the package does not include source code.

Given this structure and my assumptions, here's some observations:

ContractName|Alias will likely (should? must? will always?) equal TypeName in all cases except when there are multiple contracts in a package with the same TypeName. If multiple dependencies declare a contract with the same TypeName, it's up to the package manager / development environment to reconcile the two.
TypeName won't help in distinguishing between two different contracts during code verification if they have the same contract name, even if their Alias != TypeName.
InstanceName supersedes Alias in utility in the case where we need to interact with a specific instance of a contract. In that case, we're better off using the InstanceName to reference the instance, though keeping the TypeName intact for "kind of" expressions (i.e., "is instance with InstanceName kind of TypeName?").
In cases where there are no instances declared and no conflicting names, ContractName is used instead of Alias, which is equivalent to TypeName.

My impressions

I think having three names is confusing, and doesn't leave indication to package managers and dev environment creators how these artifacts should be handled. Additionally, I think the contracts object is trying to do too much.

What I think we should do

I think we should (could) do the following things, all centered around leveraging dependencies to describe their own code.

Find a way to describe contract types in a way that is not ambiguous. Perhaps matching how link dependencies are defined in #37 via a pointer, but without the instance name since types represent contracts and not specific instances.
Separate out contract artifacts from deployments, as they serve different descriptive purposes and uses, and do so in a way that lets us point to contract types from other packages when referencing deployments. In the example below, contract_types can only be used to describe contracts that are defined by this package only. Contracts that are external to this package (i.e., dependencies) can be referenced via a pointer in the deployments section.
```
{
  contract_types: {
    "LocalTypeName": { // e.g., Token (not a pointer; see below)
      "abi": [...],
      "bytecode": "...",
      "runtime_bytecode": "...",
      "natspec": {...},
      "compiler": {...}
    }
  }, 
  deployments: {
    "blockchain://...": {
      "InstanceName": { // e.g., GoldToken
        type: "TypeNamePointer" // e.g., #Token or ipfs://Qm...#OtherTokenLib
      }
    }
  }
}
```
Note that this structure for deployments reduces naming conflicts per blockchain across contract types.
Make contract_types required only if there are active local deployments -- i.e., deployments that are a result of new code within this package, and not a deployment of a specific dependency (i.e., "external deployment"). See the following truth table. Y means that we would need the contract_types data, N means we wouldn't.
```
                                | Source exists | Source !exists |
Local deployments exist         |       Y       |       Y        |
Only External deployments exist |       N       |       N        |
No deployments exist            |       N       |       N(!)     |
```
Reasoning:
- Row 1, Col 1: If there are local deployments and source code: We need the contract_types object to verify the deployments against our compilation of the source code -- specifically the compiler data held within the contract object. Note that we don't necessarily need the rest of the information as we can simply compile the source code ourselves and generate that data (i.e., we can generate the ABI to interact with the contract and the binary to deploy it).
- Row 1, Col 2: If there are local deployments but no source: We'd need the contract_types object to interact with the deployed contracts. Verification would be impossible here, but that's the package maintainer's choice.
- All of Row 2: If there are only external deployments, our package's source code wouldn't matter as that information would be described in the dependency. It wouldn't be this package's job to describe the contract deployed.
- Row 3, Col 1: If there are no deployments at all but there is source: It's simply a source package, so none of the contract_types information matters.
- Row 3, Col 2: If there are no deployments and no sources: Why does this package exist?

Conclusion

As far as I understand, this structure and behavior should cover all edge cases where we might have naming conflicts, and it should provide clearer implications on what can (and should) be done with the information required. Thoughts welcome.

pipermerriam commented 7 years ago

This is an excellent writeup and I think it does a way better job of describing what the current spec doesn't do very well. I sort of set today aside to focus on the spec so I'd be happy to codify this into a pull request.

tcoulter commented 7 years ago

That'd be awesome @pipermerriam.

I want to bring in your question from gitter as well as my answer:

@tcoulter If I understand the proposal in ethpm/epm-spec#40 correctly, this would disallow the source code for a project containing two contracts with the same name (as defined by the name of the contract in the source code). Is that correct? As in, the set of all names used for contracts/libraries in a package’s local source code may not contain duplicates.

I thought about that case. You’re right in that it would. If this is an improper restriction, we can beef up local typename pointers to somehow include the disambiguation. That said, if you were to send all the source of a package to the solidity compiler, would you be able to disambiguate the two from its output?

tcoulter commented 7 years ago

I am closing this in favor of #41 (thank github that this will stay around for descriptive purposes).

ethpm / ethpm-spec