git-consensus / contracts

Converts the informal ownership structure of an open-source git project to a formal DAO, with token distribution mechanisms for contributors.
GNU General Public License v3.0

📜 Proposal: Git Consensus Interface V2 #53

Open mattstam opened 1 year ago

mattstam commented 1 year ago

Overview

IGitConsensus is too specific to be used as an EIP. Implementations should be capable of handling all forms of Git objects, not just commits and tags as they do currently.

Additionally, common ownership problems that occur in open-source are not just related to the global distribution of the project (overall token ownership %), but also the specific file and directory ownership in repositories.

Example - Pull Request Approvals

When a Pull Request is created that changes a file, it is desirable to require approval from owners before merging. On platforms such as GitHub, you can require Pull Requests to be approved by people with write/admin permissions.

With Git Consensus currently, you could require Pull Request merges to need a certain % of project token ownership. However, just because a person or group of people has a high % of ownership in the project does not mean that they are knowledgeable about the particular file(s) being changed.

To resolve this issue, IGitConsensus needs to be augmented to handle file and directory ownership.

Goals

Proposal

Not only can we assign an owner to a commit or annotated tag, but also to a blob or tree. This allows every Git object to have an owner, in the same way commits and tags do today.

The same method of securely creating an object->address mapping will be used as in the original design doc. In this scheme, Git Consensus accepts the raw contents of the object and notarizes it by running SHA-1 on-chain.

During execution, it also parses out any address found in the raw content of the object. If no address is found, it reverts.
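As a rough off-chain illustration of the scheme above, the following Python sketch hashes an object the way Git does (SHA-1 over `<type> <size>\0<content>`) and rejects content that has no address in it. The function names and the "`0x` plus 40 hex digits" pattern are assumptions made for this example; they are not taken from the contracts.

```python
import hashlib
import re

# Assumption: an address is "0x" followed by exactly 40 hex digits.
ADDRESS_RE = re.compile(rb"0x[0-9a-fA-F]{40}")


def git_object_hash(obj_type: str, content: bytes) -> str:
    """Return the Git object ID: SHA-1 over "<type> <size>\\0<content>"."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def notarize(obj_type: str, content: bytes) -> tuple[str, str]:
    """Return (object ID, address); raise if no address, mirroring the revert."""
    match = ADDRESS_RE.search(content)  # first occurrence in the raw content
    if match is None:
        raise ValueError("no address found in object content")
    return git_object_hash(obj_type, content), match.group().decode()
```

For a sanity check, `git_object_hash("blob", data)` should agree with what `git hash-object --stdin` prints for the same bytes.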

Git Object Interactions

Git Consensus accepts the raw git object content, and finds the first occurrence of an address. Because there are different ways to assign arbitrary strings to each object, each will have its own different method for embedding an owner address.

Commit: No change required. Adding the address in the commit message still works.

Tag: No change required. Adding the address in the annotated tag message still works.

Blob: A blob will have the address in the content of the file itself. If a text file, this could be at the bottom of the file in a new line.

If the file is for code (e.g. *.js JavaScript files), then this could be a comment so that it does not affect functionality.

Tree: A tree will have the address as a file name in the directory it is referencing.
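To make the tree case concrete, here is a small Python sketch of how a tree's raw content is laid out when one entry is named after an address. Git serializes each entry as `<mode> <name>\0<20-byte object ID>` and sorts entries by name bytes. The sketch only handles regular files (mode `100644`), since directories follow a slightly different sort rule, and the file names are made up for illustration.

```python
# Object ID of the empty blob, used as a stand-in for every file here.
EMPTY_BLOB_ID = bytes.fromhex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
ADDRESS = "0x852FAe62f68C87D8829c2b0A29739C9Eb92dad94"


def tree_body(entries: dict[str, bytes]) -> bytes:
    """Serialize regular-file entries in Git's tree format, sorted by name bytes."""
    body = b""
    for name in sorted(entries, key=lambda n: n.encode()):
        body += b"100644 " + name.encode() + b"\x00" + entries[name]
    return body


# Hypothetical directory: three ordinary files plus the address-named file.
names = [".gitignore", "README.md", "index.js", ADDRESS]
body = tree_body({name: EMPTY_BLOB_ID for name in names})

# Byte order puts "." before "0", and "0" before letters, so the address
# entry lands second: near the top of the tree, not at the bottom.
order = sorted(names, key=lambda n: n.encode())
print(order.index(ADDRESS))
```

With this layout a parser that scans from the last byte backward has to walk past almost every entry before it reaches the address.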

New Design

We will now have just one method for adding objects (as opposed to addCommit(), addTag(), etc.). This new method will accept raw bytes instead of structs that we define, such as the current CommitData and TagData. This is what the interface could look like:

interface IGitConsensus {
    function gitObject(bytes[] calldata gitObject) external returns (bytes20 gitHash);

    function hashExists(bytes20 gitHash) external view returns (bool exists);
}

We leave out hashAddr() from the main interface since some implementations might not care about address parsing.

The exact function name is not confirmed (gitObject() is used here). We probably want something that specifies that it accepts a Git object that needs to have an address in it, or something specific to SHA-1.

If we assume clients take care of putting the objects together correctly, implementations can be very simple. gitObject() will just pack the object together and run SHA-1 on it, instead of parsing out the fields:

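// Assumes _gitObject holds the raw object content as bytes. Git's full header
// is "<type> <size>\0", so a complete implementation also prepends the type.
gitHash_ = Utils.sha1(
    abi.encodePacked(Strings.toString(_gitObject.length), bytes1(0), _gitObject)
);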

Our implementation will still do address parsing on the raw data, so we create an object->address mapping.
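The object->address mapping can be modelled off-chain in a few lines of Python. This is only a sketch of the behaviour described here, not the contract: the class and method names are hypothetical stand-ins for gitObject(), hashExists() and hashAddr(), the address pattern is an assumption, and the last address in the content is taken as the owner to match the backward parsing mentioned elsewhere in this proposal.

```python
import hashlib
import re

# Assumption: an address is "0x" followed by exactly 40 hex digits.
ADDRESS_RE = re.compile(rb"0x[0-9a-fA-F]{40}")


class GitConsensusModel:
    """Off-chain model of the proposed interface (hypothetical names)."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}  # object ID -> owner address

    def add_object(self, obj_type: str, content: bytes) -> str:
        matches = ADDRESS_RE.findall(content)
        if not matches:
            raise ValueError("no address found")  # the contract would revert
        header = f"{obj_type} {len(content)}".encode() + b"\x00"
        git_hash = hashlib.sha1(header + content).hexdigest()
        self._owners[git_hash] = matches[-1].decode()  # last occurrence wins
        return git_hash

    def hash_exists(self, git_hash: str) -> bool:
        return git_hash in self._owners

    def hash_addr(self, git_hash: str) -> str:
        return self._owners[git_hash]
```

Because the address is part of the hashed content, changing the owner changes the object ID, so an existing entry can never be silently reassigned.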

Open Questions

  1. Should we allow multiple addresses in the raw content, such that each object can have a split ownership? Example:

    git commit -m "my commit message 0x852FAe62f68C87D8829c2b0A29739C9Eb92dad94 75 0xA1B130A491a8c9635AC7Dd30952b3EbBa3e5D319 25"

    This would give 75% ownership of the commit to 0x852FAe62f68C87D8829c2b0A29739C9Eb92dad94 and 25% ownership to 0xA1B130A491a8c9635AC7Dd30952b3EbBa3e5D319.

  2. Are these patterns too ugly or inconvenient for adoption? With ignore file patterns (maybe as a VSCode extension, or GitHub integration) these could be hidden away by default.

  3. For efficiency reasons, Git Consensus reads addresses from the last byte backward. In the commit, tag, and blob case, this works fine. In the tree case, '0x' is alphabetically first in the majority of situations. Could a pattern that puts the address blob at the bottom be adopted?
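For the split-ownership idea in question 1, one possible reading of the format is "address, whitespace, integer percentage", repeated. The Python sketch below parses that reading and rejects splits that do not add up to 100. Both the format and the sum-to-100 rule are assumptions for illustration; the proposal leaves them open.

```python
import re

# Assumed format: "0x" + 40 hex digits, whitespace, then a 1-3 digit percentage.
SPLIT_RE = re.compile(r"(0x[0-9a-fA-F]{40})\s+(\d{1,3})\b")


def parse_splits(message: str) -> dict[str, int]:
    """Map each address in the message to its ownership percentage."""
    splits: dict[str, int] = {}
    for address, percent in SPLIT_RE.findall(message):
        # Accumulate in case the same address is listed more than once.
        splits[address] = splits.get(address, 0) + int(percent)
    if not splits:
        raise ValueError("no address/percentage pairs found")
    if sum(splits.values()) != 100:
        raise ValueError("ownership percentages must sum to 100")
    return splits
```

A bare address with no percentage is not matched by this pattern, so single-owner messages would still need the existing parsing path.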

Other

Git Notes exist, which wrap metadata around these objects. In this case, though, they aren't very useful, because the whole point of embedding the address in the contents is that the SHA-1 hash changes if the address is modified.

References

https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
https://raw.githubusercontent.com/pluralsight/git-internals-pdf/master/drafts/peepcode-git.pdf
https://medium.com/@pawan_rawal/demystifying-git-internals-a004f0425a70
https://www.freecodecamp.org/news/git-internals-objects-branches-create-repo/

DmitriyShepelev commented 1 year ago

> Blob: A blob will have the address in the content of the file itself. If a text file, this could be at the bottom of the file in a new line. If the file is for code (e.g. *.js JavaScript files), then this could be a comment so that it does not affect functionality.

How would you differentiate the actual file contents from the Ethereum address? Also, not all files support comments (e.g., JSON files).

> A tree will have the address as a file name in the directory it is referencing.

An Ethereum address could be a file name, and vice versa. So, we would need some mechanism to differentiate them.

> In the tree case, '0x' is alphabetically first in the majority of situations. Could a pattern that puts the address blob at the bottom be adopted?

If/when GitConsensus adopts ENS, this wouldn't be possible, so I'm not sure if this is something we should (or can) strive for.

> Git Notes exist which wrap metadata around these objects.

Git Notes are mutable, which would be problematic to adopt.

mattstam commented 1 year ago

> How would you differentiate the actual file contents from the Ethereum address? Also, not all files support comments (e.g., JSON files).

🙏 that none of the file content contains our parsing keywords (0x or .eth), or at least that it is the last occurrence of them, as we parse backward.

I agree it is pretty hacky though, and as mentioned some files don't even support comments so can be quite restrictive.

> An Ethereum address could be a file name, and vice versa. So, we would need some mechanism to differentiate them.

A file named 0x* in a real-world repository that wasn't added for compatibility with Git Consensus? I've certainly never seen one, so it doesn't seem worth worrying about. But let me know if there is some circumstance in which you have.

*.eth may be a bit more problematic though, as this is a file extension for Ether source code apparently.

> If/when GitConsensus adopts ENS, this wouldn't be possible, so I'm not sure if this is something we should (or can) strive for.

Good point, that optimization can't really happen as we can't predict what a *.eth would begin with.


Due to the lack of a simple way to append metadata to trees and blobs (like you can with the message in commits and annotated tags), the cases there seem a bit contrived.

The plus side of doing a generalized input parameter (e.g. gitObject(bytes[] calldata gitObject)) is that it still works with commits and annotated tags just fine and is actually more flexible for parsing of those (e.g. if your address / ENS name was your author). Then, if the usage of trees and blobs comes up, we can say it's supported but with some notable limitations.

I'm open to entirely different routes here. Maybe a different standard could be adopted, such as a giant owners.txt file with each directory / file owner's Ethereum address in it.

Definitely looking for other approaches from everybody. Per directory / file ownership is definitely a non-trivial challenge but is an important use-case to cover.