Open mitchmindtree opened 1 year ago
I think it would be worth adding a section on how you will verify the integrity of files. There would need to be some way to know that the hashes of the files (it seems IPFS hash != content hash) are correct before you fetch them. Then you would need to verify that each file matches its content hash before running any code, and delete it if it doesn't match.
There would need to be some way to know hashes of the files (It seems IPFS hash != content hash) are correct before you fetch the files
As far as I understand, an IPFS content address is a hash of the content and IPFS does verify the integrity of the fetched content by checking the hash.
I guess it would depend on the client you use?
True, it'll be good to keep in mind that whatever approach we take, we should ensure that the integrity of fetched content is actually being checked. If it turns out our client of choice does not, I don't imagine it should be too tricky to add an extra step that uses ipfs to check the content address ourselves.
I would like to hope all clients would do this by default considering the P2P nature of IPFS, but you're right we shouldn't assume :smiling_face_with_tear:
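The extra verification step mentioned above could be sketched as follows; a minimal model where the actual digest function is passed in as a closure, since the real hash depends on the chosen IPFS client and CID version (the toy digest below is illustrative only, not a real hash):

```rust
/// Verify fetched content against its expected digest before use,
/// discarding it on mismatch. The `digest` closure stands in for
/// whatever hash the chosen IPFS client / CID version actually uses.
fn verify_or_discard(
    content: Vec<u8>,
    expected: &[u8],
    digest: impl Fn(&[u8]) -> Vec<u8>,
) -> Option<Vec<u8>> {
    if digest(&content) == expected {
        Some(content)
    } else {
        None // caller deletes the fetched bytes
    }
}

fn main() {
    // Toy digest for demonstration only: a wrapping byte sum, NOT a real hash.
    let toy = |b: &[u8]| vec![b.iter().fold(0u8, |a, x| a.wrapping_add(*x))];
    let data = b"package bytes".to_vec();
    let expected = toy(&data);
    assert!(verify_or_discard(data.clone(), &expected, toy).is_some());
    assert!(verify_or_discard(data, &[0xFF], toy).is_none());
}
```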
First of all the proposal looks incredibly cool, very very excited for this 🎩
Generally, NFT-based name services also allow the transfer of ownership over names, and it’s not clear how we’d handle the case where the original publisher’s name has been transferred in a manner that doesn't hurt forc's reproducibility.
I am curious about whether the transferability of names would create an attack vector in this case. In a scenario where there are ultra-popular libraries published by publisher_a
and the name gets transferred to a malicious 3rd party, they can inject any code they want and publish a patch. So a malicious dependency can be introduced into lots of packages. Since removing packages is not possible, in the case of malicious activity it would require some kind of social coordination so that any contract that depends on the malicious patch is not deployed. I wonder if this is something to consider, and if so, what we should do about it 🤔
I am curious about whether the transferability of names would create an attack vector in this case.
It's worth keeping in mind we already face this issue with cargo today! E.g. cargo allows for transferring ownership of a crate, and it's really up to downstream maintainers (or auditing tools) to keep track of changes between updates.
As you've pointed out, the key difference is that the crates.io team can and do occasionally remove crates once identified as malicious, whereas we would be unable to remove malicious packages. That said, it's unlikely the crates.io team are able to keep track of all of the malicious packages that are out there, and their approach to removing malicious packages is more of a best-effort attempt.
We can possibly improve the experience in `forc` by highlighting transfers of ownership during updates to the set of pinned packages (i.e. when the `Forc.lock` is updated during `forc update`). Rather than only showing the version change for each dependency, perhaps we could also show the number of new authors that have gained publish rights since the previous versions? This might help to highlight when close attention should be paid to a version change.
In the long run we could also consider hosting some security advisory automation, similar to rustsec, that tracks known vulnerabilities or issues in the ecosystem. We could potentially integrate something like this into `forc` itself, where warnings are raised during the pinning of known-bad packages.
Ultimately, I think it's up to Sway devs and publishers to be responsible for vetting their own dependencies and dependency updates before deployment, but we should do all we can to make this easier :+1:
Maybe not needed, but this issue reminded me of https://ipld.io/ (a hash-based cross-protocol data format from the same team as IPFS). Lots to digest here, but I love the overall idea. I need to consider the whole NFT idea a bit more carefully than my initial read-through, but I like that it leverages blockchain-native tech.
Couple of initial thoughts on the implementation:
I created the initial design diagram attached here; a summary follows below.
While publishing:
While fetching:
The block size N used for partitioning can be tuned depending on the overhead. If the overhead turns out to be significant, we can increase it, which would effectively mean publishing most files as a single block, since they are probably bytes rather than kilobytes in size.
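The partitioning step above can be sketched as follows; a minimal example where `partition` and the fixed block size are illustrative stand-ins for whatever chunking the IPFS layer actually performs:

```rust
/// Split a file's bytes into fixed-size blocks, each of which would be
/// published to IPFS separately and referenced by its own CID.
/// `block_size` stands in for the tunable partition size discussed above.
fn partition(bytes: &[u8], block_size: usize) -> Vec<Vec<u8>> {
    bytes
        .chunks(block_size)
        .map(|chunk| chunk.to_vec())
        .collect()
}

fn main() {
    // A small source file fits in a single block...
    let small = vec![0u8; 300];
    assert_eq!(partition(&small, 1024).len(), 1);

    // ...while a larger one is split into N blocks (ids 0..N in the metadata).
    let large = vec![0u8; 2500];
    let blocks = partition(&large, 1024);
    assert_eq!(blocks.len(), 3);
    assert_eq!(blocks[2].len(), 452);
    println!("blocks: {}", blocks.len());
}
```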
And an example metadata file can be like:
```json
{
  "name": "package_name",
  "type": "folder",
  "child": [
    {
      "name": "src",
      "type": "folder",
      "child": [
        { "name": "main.sw", "type": "file", "id": 0, "CID": "XXX" },
        { "name": "main.sw", "type": "file", "id": 1, "CID": "XXX" },
        { "name": "main.sw", "type": "file", "id": "N", "CID": "XXX" }
      ]
    },
    { "name": "forc.toml", "type": "file", "id": 0, "CID": "XXX" },
    { "name": "forc.toml", "type": "file", "id": 1, "CID": "XXX" },
    { "name": "forc.toml", "type": "file", "id": "N", "CID": "XXX" },
    { "name": "forc.lock", "type": "file", "id": 0, "CID": "XXX" },
    { "name": "forc.lock", "type": "file", "id": 1, "CID": "XXX" },
    { "name": "forc.lock", "type": "file", "id": "N", "CID": "XXX" }
  ]
}
```
In the last meeting we had about this design, we mentioned it might be good to be compatible with the go-ipfs implementation. I played around with `ipfs-embed` today, trying to get it to fetch a single file from a `go-ipfs` node with its `compat` feature; the related issue is here. I haven't succeeded yet. From what I understand, the main design philosophy of `ipfs-embed` is a little different from its Go counterpart: `ipfs-embed` seems more geared towards networks of devices with a specific purpose rather than a general network like the Go implementation. This does not conflict with our design, as it is entirely possible for our package hosting network to be a separate network. I am still learning the details of IPFS, but from what I understand so far, we might need a bootstrap node: a node that every node can visit at startup to learn about possible peers in the network. If we go this way, we may need to host the bootstrap node ourselves. If we can get `ipfs-embed` to work nicely with the go-ipfs implementation, we could use the default bootstrap nodes they ship with.
I also looked around the space to see if there is an out-of-the-box compatible implementation that we can use as a drop-in replacement. As we also discussed in the previous meeting, iroh was a contender, but they decided to take a different path and break compatibility with the Go implementation. Before starting work on their new implementation, they left the old version as a renamed library, beetle, which they state is compatible with the go-ipfs implementation to a degree.
I think we first need to be able to fetch and publish a single file in a basic setup, and on top of that we should settle on the folder structure. We could push to use unix-fs etc. as well. I started the implementation by designing that, but it looks like I need to get the basics done first and actually achieve publishing a single piece of data and getting it back on another instance 😄
Leaving a note here that the registry design could be made more stateless using a log-based approach if we more heavily leverage indexing.
The following is a proposed design around decentralised package hosting and registration for `forc`.

Goal
The aim is to provide a user experience akin to Cargo’s package registry in a manner that avoids some of the issues associated with its centralised approach to hosting, registering and indexing packages.
Rust’s Approach
In Rust, users can easily publish packages with `cargo publish`, and depend on packages with simple semver declarations, e.g. `serde = "1.0"`.

By default, packages are published to and fetched from crates.io. Crates.io acts as a canonical registry (the official record of all packages), host (src code and manifest for each version) and index (allows searching / exploring packages and discoverability).
The caveat with this design is that when crates.io is down for whatever reason, projects cannot be published or fetched by Rust users. It also means that crates.io is responsible for the content that it hosts, and must respond to licensing challenges, moderation requests, name-sitters, and so on. Cargo users must trust whoever happens to have control over crates.io and their ability to secure it from malicious actors.
Proposed Approach
This design aims to improve upon Rust’s approach by avoiding the use of a central authority to handle package registration and hosting. Instead, we opt for delegating the registry and hosting roles to suitable decentralised tooling.
At a high level, the intent is to:

- host package content (metadata and source) on IPFS, and
- use contracts on the Fuel network as the package registry.
Package indexing could be built upon these two components, but in order to narrow the scope of this work, implementation details around package indexing (a website for searchability, discoverability, etc.) are omitted.
Package Hosting - IPFS
The core appeal of using IPFS is that hosted content is content addressed. This means that, only knowing the content address of a package (i.e. the SHA256 of the package contents), we can reliably fetch exactly that package, as long as that package is hosted by an IPFS node.
To ensure that at least one node always hosts all packages, we should host an IPFS node under Fuel Labs. This node would simply query/monitor the registry for all known packages, and ensure that all are pinned and available from at least one location on the network. This ensures that, at a minimum, we always offer at least the level of persistence offered by crates.io.
With IPFS as a foundation, there are many options for reducing the bus factor of running one node. E.g. we could operate a second node on a separate platform, use Filecoin to incentivise 3rd-party hosting, clearly document how users can run their own nodes and pin all packages to support the network, and so on. Organisations with a large investment in Fuel would be naturally incentivised to run their own package-pinning node(s) to help improve the robustness of the ecosystem.
What about Arweave?
Arweave similarly offers content-addressed hosting with greater guarantees around permanent content availability. However, using Arweave would require a little extra indirection for package publishers, namely that publishing would require having an Arweave wallet and some AR to pay for the hosting. The IPFS network, on the other hand, is free to participate in and “publish” to, allowing us to offer a smoother publishing experience, with the trade-off that we’ll need to guarantee content availability on our own, as mentioned above.
The registry design below supports different kinds of content addressing in case we decide to take advantage of an alternative approach to content addressed hosting like Arweave in the future.
Package Registry - Fuel Contracts
The role of the package registry is to act as a source of truth w.r.t. what packages are available, what versions are available, who owns packages, and who is allowed to publish new versions. The Fuel network’s immutability and persistence make it well suited to hosting this kind of info.
On-Chain Package Data
Data published to the Fuel network for a single version of a single package should roughly include the following:
- `[u32; 3]`
- `u32`
- `b256`
- `b256`
The idea behind separating the metadata and source is to allow indexers to trivially fetch high-level metadata for a package without needing to fetch the entire source code. We can impose conservative upper limits on the size of the metadata so that package indexers can maintain predictable performance, and safely ignore potentially malicious packages with sizes that exceed this limit.
Our Package Metadata standard should roughly include the package name, semver, license, source CA, description, homepage and repository. Package Source should include the manifest, lock file and all source code.
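As a rough illustration of that standard (field names and values here are assumptions, not a settled spec), a metadata record might look like:

```json
{
  "name": "foo",
  "version": "1.2.3",
  "license": "Apache-2.0",
  "description": "An example package.",
  "homepage": "https://example.com/foo",
  "repository": "https://example.com/foo.git",
  "source": "<content address of the package source>"
}
```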
Packages as NFTs
In order to allow for transfer of ownership and the approval of multiple authors for a single package, we can adopt the NFT standard.
Specifically, we could provide an `abi Package: NFT { ... }`, where “publishing” the first version of a new package is akin to implementing `Package` for a contract, deploying it and minting the first version. The owner (and approved authors) can “mint” subsequent versions of the package as unique tokens. The `Package` abi would abstract over minting to associate the on-chain package data mentioned above with each version, and to impose guarantees such as ensuring newly published versions are greater than old versions. The `Package` abi would expose a method for retrieving the package name, as well as a set of “yanked” versions so that authors can signal when certain versions were mistakes or should be ignored.

Adopting the NFT standard allows us to take advantage of NFT-aware wallet functionality, so that Sway devs can view and transfer their packages using a familiar interface without the need for us to develop a custom solution for these actions. For better or worse, it also enables packages to show up in NFT marketplaces, directly exposing packages and their owners to the markets.
Deploying a unique contract per package also reflects a parallelism-friendly design for the VM, as users would be able to publish and read different packages simultaneously without interference (which would not be possible using a single contract to manage all versions of all packages).
Package Namespacing?
Cargo’s flat package namespace is simple, inspires creative package names, and allows for succinct dependency declarations (e.g. `foo = "1"`). However, it opens up crates.io to the problem of name-squatting.

Crates.io’s policy on name squatting is that crates will never be removed or transferred due to squatting. However, they reserve the right to remove packages for legal reasons, or if a package violates the Rust code of conduct.
In our case, we have no ability to remove or transfer names, even if we would like to. To avoid issues associated with a flat namespace altogether, we can consider requiring that registry dependency declarations also specify the original publisher. However, when we only know the original publisher's pubkey, declarations can appear unwieldy, e.g.
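One illustrative shape for such a declaration (the `publisher` field name and the address are made up for illustration, not a settled format):

```toml
foo = { version = "1", publisher = "0x3f9a…" }
```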
This could be improved with support for an ENS-like name service for Fuel (like fuel nomen). E.g.
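With a registered name standing in for the raw pubkey, the same declaration could read (again purely illustrative, reusing the `publisher_a` name from the discussion above):

```toml
foo = { version = "1", publisher = "publisher_a" }
```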
However, registering a name adds an extra barrier to entry for publishers. Generally, NFT-based name services also allow the transfer of ownership over names, and it’s not clear how we’d handle the case where the original publisher’s name has been transferred in a manner that doesn't hurt forc's reproducibility. The result is also still more verbose than the original declaration that can be achieved when omitting namespacing:
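For comparison, the un-namespaced, flat-namespace form is simply:

```toml
foo = "1"
```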
Seeing as deploying and registering package contracts will cost gas, this at least partially disincentivises the arbitrary harvesting and squatting of names compared to crates.io. However, we may also want to consider some approach to “rate-limiting” package registration to further disincentivise mass name harvesting in a single block.
Whether we go for a flat or publisher-namespaced approach is left as an open question; however, for simplicity the remainder of this design assumes a flat namespace.
Deterministic Package Contracts
Ideally, it would be possible to deterministically generate the package contract solely from the SHA256 of the package’s name. If we could deterministically produce the contract, then we could theoretically also deterministically produce the contract ID.
Being able to consistently determine the package contract ID solely from the package name would enable `forc publish` to trivially determine whether or not a contract has already been deployed for a package, or if it needs to deploy one before publishing the current version.

The generated package contract might look something like the following:
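A rough Sway-flavoured sketch, where the method names and types are assumptions rather than a settled abi:

```sway
contract;

// Hypothetical sketch: one generated contract per package, exposing
// publish/lookup for versioned content addresses.
abi Package {
    #[storage(read, write)]
    fn publish(version: [u32; 3], metadata: b256, source: b256);

    #[storage(read)]
    fn name() -> str[32];

    #[storage(read)]
    fn yanked(version: [u32; 3]) -> bool;
}
```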
The major caveat here is that the bytecode generated by `forc` for this same contract may change between `forc` versions, in the case that some optimisation is enabled, certain fields are re-ordered, etc. This would result in a different contract ID, meaning we cannot safely rely on this approach when publishing packages or fetching dependencies.

Instead, we’ll likely need a single, central registry contract solely to act as a map from name to contract ID.
The Forc Package Registry Contract
While individual packages would be published under unique package contracts, we may still benefit from a single, central registry contract that records the mapping from name to contract ID for all known packages. This may also remove the need to index the entirety of the fuel blockchain to know what packages are available.
This registry contract would maintain a mapping to assist directly looking up packages:
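For illustration, a Sway-flavoured sketch of such a mapping (names and key derivation are assumptions):

```sway
storage {
    // sha256(package_name) -> the package's deployed contract ID.
    packages: StorageMap<b256, ContractId> = StorageMap {},
}
```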
Maintaining this contract requires that when the first version of a new package is published and its package contract is deployed, its contract ID is registered within this registry contract.
An alternative to maintaining a central registry contract might be to rely on fuel blockchain indexers to track the existence of all forc packages. This would mean that during publishing and fetching, forc would require access to one of these indexers, kind of defeating our decentralised approach. Forc itself could maintain an index, but this could be expensive, particularly on first use as forc would need to fetch or build the index from scratch.
Publishing a Package
Upon `forc publish`, we first query the registry contract to determine whether or not a contract exists for our package. If not, we deploy and register the package contract.

While querying whether or not the package contract is deployed, we host the package’s metadata and source on IPFS. Once the package contract is deployed and registered, we publish the version and content addresses to the package contract.
Forc should log each of these steps as they begin, providing useful feedback in the case that any steps fail. Clean CLI wallet integration will be essential for enabling a smooth UX.
Publishing requirements would match those of cargo, namely that all required metadata is present, and all dependencies are also registry dependencies. Notably, cargo does allow for git dependencies pinned to specific commits. We could potentially offer similar support by implicitly publishing these dependencies somewhere under the package contract, but this can be left for future design work.
Fetching a Package
Upon the first build (or `forc update`) for a registry dependency `foo = "1.2"`, `forc` first queries the registry contract for `foo`'s package contract ID. Next, `foo`'s package contract is queried with the given semver for the latest semver-adhering version of `foo`, along with the content addresses for its metadata and source. Finally, we use the content address to fetch the source, cache it locally and pin `foo` to the full version (e.g. `1.2.3`).

Subsequent calls to `forc build` would be near instantaneous, as `foo` would be pinned and its source cached.
would be pinned and its source cached.If necessary, we could speed up the registry dependency fetching by using a centralized package index as future work. We can fall back to the slower, multi-step node client interaction in the case that the index is unreachable.
Side Benefits
Implementation
Here is a rough overview of the anticipated steps involved:
- `forc-wallet` can show balance, UTXOs: https://github.com/FuelLabs/forc-wallet/issues/68.
- Registry dependency declarations in `forc` (as an alternative to `git` or `path`). E.g. 3926.
- A `Package` abi adhering to the design above, with associated package metadata and source specs.
- A `Package` contract implementation template that can be used to generate package contracts.
- `PackageContract` + IPFS workflow.
- The `RegistryContract`, integrated into the `publish` command flow, removing the need for specifying the contract ID as it would now be fetched from the `RegistryContract`.
.Questions
- `forc` should default to the latest network, but allow individual dependencies to be fetched from different networks?