ethpm / ethpm-spec

Ethereum Package Manager http://ethpm.github.io/ethpm-spec/
165 stars 30 forks source link

Define JSON strict serialization format so that lockfiles with the *same* content always have the same hash. #77

Closed pipermerriam closed 6 years ago

pipermerriam commented 6 years ago

Currently the spec allows for any serialization of the JSON.

{
  "key-a": "value-a",
  "key-b": 2
}

While all three of these have identical contents, they would not hash to the same value.

I propose we specify that lockfiles must be serialized in the tightly packed format above meaning no extra whitespace or newlines.

carver commented 6 years ago

Maybe this should be a normalization step required just prior to generating a standardized hash, rather than a requirement for serializing the json to disk at any time? Manually inspecting the files with whitespace could be convenient, at times.

Does the spec have anything to say about the ordering of keys? I didn't find anything in my quick scan of the doc.

pipermerriam commented 6 years ago

Does the spec have anything to say about the ordering of keys? I didn't find anything in my quick scan of the doc.

No but I'd like to specify that they should be sorted and that it is invalid if any key is duplicated.

Maybe this should be a normalization step required just prior to generating a standardized hash, rather than a requirement for serializing the json to disk at any time? Manually inspecting the files with whitespace could be convenient, at times.

I think that rather than this being a normalization step, that tooling could denormalize package files when they are rendered for display.

ryan-rowland commented 6 years ago

I think that rather than this being a normalization step, that tooling could denormalize package files when they are rendered for display.

I disagree here -- Tooling should not be required to view / modify the contents of a file.

Maybe this should be a normalization step required just prior to generating a standardized hash, rather than a requirement for serializing the json to disk at any time?

I agree with @carver. It's not an unreasonable ask of a package manager to perform the steps you've listed in your PR (#80) as a pre-serialization step automatically. This removes burden from the package creator and assures consistency at the last point of failure: the serialization.

For consistency and familiarity, I think it would be beneficial for us to stick as close as is reasonable to npm's package.json.

carver commented 6 years ago

It's not an unreasonable ask of a package manager to perform the steps you've listed in your PR (#80) as a pre-serialization step automatically. This removes burden from the package creator and assures consistency at the last point of failure: the serialization.

I think this is actually agreeing with Piper.

I was originally suggesting that the spec could be loose about the format of the file, and require that the file be packed/ordered just before taking a hash. He convinced me that strictly specifying the file format is the better option here.

This works fine with a package manager like you describe. Say the tool wants to give people the option of manually editing part or all of the lockfile. In that case, it could create an editable "lockfile source" with whitespace and non-ordered keys that is not a valid final lockfile. The package manager's job would be to normalize it before creating the lockfile.