ethpm / ethpm-spec

Ethereum Package Manager http://ethpm.github.io/ethpm-spec/
165 stars 30 forks source link

Alternative structure and behavior for contract definitions #40

Closed tcoulter closed 7 years ago

tcoulter commented 7 years ago

Overview

I think #38 doesn't provide much utility and may contribute many edge cases. Similarly, I think the contracts keyword is getting overloaded and needs refactoring.

This issue represents a response to #38 and more recent PRs, and is an attempt to open a discussion about restructuring some of the lockfile's data.

Background

It was hard for me to reason about this, so I want to start by introducing some definitions that help describe my thinking.

Given the structure proposed in #38, using all the names available to us would look like the JSON blob below. I've defined my understanding of these names and given each name an identifier.

{
  // ...
  contracts: {
    "ContractName|Alias": {
      // ...
      "contract_name": "TypeName"
      "deployments": {
        "blockchain://...": {
           "InstanceName": {...}
        }
      }
    }    
  }
}

Some assumptions I'm making:

Given this structure and my assumptions, here's some observations:

  1. ContractName|Alias will likely (should? must? will always?) equal TypeName in all cases except when there are multiple contracts in a package with the same TypeName. If multiple dependencies declare a contract with the same TypeName, it's up to the package manager / development environment to reconcile the two.
  2. TypeName won't help in distinguishing between two different contracts during code verification if they have the same contract name, even if their Alias != TypeName.
  3. InstanceName supersedes Alias in utility in the case where we need to interact with a specific instance of a contract. In that case, we're better off using the InstanceName to reference the instance, though keeping the TypeName intact for "kind of" expressions (i.e., "is instance with InstanceName kind of TypeName?").
  4. In cases where there are no instances declared and no conflicting names, ContractName is used instead of Alias, which is equivalent to TypeName.

My impressions

I think having three names is confusing, and doesn't leave indication to package managers and dev environment creators how these artifacts should be handled. Additionally, I think the contracts object is trying to do too much.

What I think we should do

I think we should (could) do the following things, all centered around leveraging dependencies to describe their own code.

  1. Find a way to describe contract types in a way that is not ambiguous. Perhaps matching how link dependencies are defined in #37 via a pointer, but without the instance name since types represent contracts and not specific instances.

  2. Separate out contract artifacts from deployments, as they serve different descriptive purposes and uses, and do so in a way that lets us point to contract types from other packages when referencing deployments. In the example below, contract_types can only be used to describe contracts that are defined by this package only. Contracts that are external to this package (i.e., dependencies) can be referenced via a pointer in the deployments section.

    {
      contract_types: {
        "LocalTypeName": { // e.g., Token (not a pointer; see below)
          "abi": [...],
          "bytecode": "...",
          "runtime_bytecode": "...",
          "natspec": {...},
          "compiler": {...}
        }
      }, 
      deployments: {
        "blockchain://...": {
          "InstanceName": { // e.g., GoldToken
            type: "TypeNamePointer" // e.g., #Token or ipfs://Qm...#OtherTokenLib
          }
        }
      }
    }

    Note that this structure for deployments reduces naming conflicts per blockchain across contract types.

  3. Make contract_types required only if there are active local deployments -- i.e., deployments that are a result of new code within this package, and not a deployment of a specific dependency (i.e., "external deployment"). See the following truth table. Y means that we would need the contract_types data, N means we wouldn't.

                                    | Source exists | Source !exists |
    Local deployments exist         |       Y       |       Y        |
    Only External deployments exist |       N       |       N        |
    No deployments exist            |       N       |       N(!)     |

    Reasoning:

    • Row 1, Col 1: If there are local deployments and source code: We need the contract_types object to verify the deployments against our compilation of the source code -- specifically the compiler data held within the contract object. Note that we don't necessarily need the rest of the information as we can simply compile the source code ourselves and generate that data (i.e., we can generate the ABI to interact with the contract and the binary to deploy it).
    • Row 1, Col 2: If there are local deployments but no source: We'd need the contract_types object to interact with the deployed contracts. Verification would be impossible here, but that's the package maintainer's choice.
    • All of Row 2: If there are only external deployments, our package's source code wouldn't matter as that information would be described in the dependency. It wouldn't be this package's job to describe the contract deployed.
    • Row 3, Col 1: If there are no deployments at all but there is source: It's simply a source package, so none of the contract_types information matters.
    • Row 3, Col 2: If there are no deployments and no sources: Why does this package exist?

Conclusion

As far as I understand, this structure and behavior should cover all edge cases where we might have naming conflicts, and it should provide clearer implications on what can (and should) be done with the information required. Thoughts welcome.

pipermerriam commented 7 years ago

This is an excellent writeup and I think it does a way better job of describing what the current spec doesn't do very well. I sort of set today aside to focus on the spec so I'd be happy to codify this into a pull request.

tcoulter commented 7 years ago

That'd be awesome @pipermerriam.

I want to bring in your question from gitter as well as my answer:

@tcoulter If I understand the proposal in ethpm/epm-spec#40 correctly, this would disallow the source code for a project containing two contracts with the same name (as defined by the name of the contract in the source code). Is that correct? As in, the set of all names used for contracts/libraries in a package’s local source code may not contain duplicates.

I thought about that case. You’re right in that it would. If this is an improper restriction, we can beef up local typename pointers to somehow include the disambiguation. That said, if you were to send all the source of a package to the solidity compiler, would you be able to disambiguate the two from its output?

tcoulter commented 7 years ago

I am closing this in favor of #41 (thank github that this will stay around for descriptive purposes).