Closed legowerewolf closed 2 months ago
How would you be able to identify which package you are importing if it's just a hash in the import statement?
Doc comments on your deps.ts file? Ideally, the imported module would also have a doc comment at the top identifying itself.
The other option is sticking to a convention: you can reference linked data within an IPFS IPLD DAG structure (the stuff behind the hash) by name. This requires 'wrapping' the IPFS content in an external folder, but that isn't particularly burdensome.
import { Something } from "http://127.0.0.1:5001/ipfs/{hash}/some_package/mod.ts"
That also works, and preserves the core immutability features we're looking for. Although, I think that import string would be better as
import { Something } from "ipfs://{hash}/some_package/mod.ts"
to preserve the fallback options I proposed above.
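For what it's worth, the fallback translation is pure string manipulation. A minimal sketch, assuming the local node's default API address of 127.0.0.1:5001 (the function name is made up for illustration):

```typescript
// Sketch: rewrite an ipfs:// specifier into the local IPFS node's HTTP
// form. Assumes the default API address 127.0.0.1:5001; ipfsToLocalHttp
// is a hypothetical helper name, not an existing API.
function ipfsToLocalHttp(specifier: string): string {
  const url = new URL(specifier);
  // For ipfs://{hash}/path, the WHATWG URL parser puts the hash in
  // url.hostname (an opaque host, so case is preserved) and the rest
  // of the path in url.pathname.
  return `http://127.0.0.1:5001/ipfs/${url.hostname}${url.pathname}`;
}
```

So ipfs://{hash}/some_package/mod.ts becomes http://127.0.0.1:5001/ipfs/{hash}/some_package/mod.ts.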
Was going to create the exact same issue :sweat_smile:
Another protocol that might be suitable as well is p2p://. I think this feature would be used a lot in tandem with import maps. Later, the community could create CLI tools to manage the JSON file more easily, making use of some registry that maps package names to hashes, so you could do module-management-tool add library and it would automatically add library with the right hash.
{
"imports": {
"library/": "p2p://<hash>"
}
}
import foo from 'library/foo.js'
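To make the tool idea concrete, here is a minimal sketch of the bookkeeping such a CLI would do. The registry is mocked as a plain lookup table, and addPackage is a made-up name; a real tool would query whatever name-to-hash registry the community settles on.

```typescript
type ImportMap = { imports: Record<string, string> };

// Hypothetical registry mapping package names to content hashes.
const registry: Record<string, string> = {
  library: "<hash>",
};

// Sketch of `module-management-tool add library`: look the name up and
// record a prefix mapping in the import map.
function addPackage(map: ImportMap, name: string): ImportMap {
  const hash = registry[name];
  if (!hash) throw new Error(`${name} not found in registry`);
  // Per the import-maps spec, a key ending in "/" must map to a value
  // ending in "/" so that sub-paths like library/foo.js resolve.
  return { imports: { ...map.imports, [`${name}/`]: `p2p://${hash}/` } };
}
```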
And to go the extra mile, that registry could just be another JSON file on IPFS, managed by a decentralized, blockchain-based organization that is economically incentivized to keep a high-quality registry, possibly with different channels like testing and stable, where packages are reviewed carefully to make sure no funny scripts make their way into the stable registry.
IPFS is built on libp2p - it has both Node.js and in-browser support. So Deno just needs to support that lib, and it can then do a bunch of p2p interactions.
From a security perspective, IPFS is not completely necessary, nor does it actually solve the issue of content-defined library imports - since it provides no guarantee that the requested file would internally use a secure hash scheme to import its own dependencies, and so would their own dependencies' dependencies, and so on.
I think it would also be productive to consider a URI scheme that is independent of any specific protocol (e.g. https/ipfs/p2p/magnet) and instead:
protocol://path/some_package@version/module.{hash}.ts
A strict secure mode in which every imported file must include a hash (or, at the very least, files that are fetched from external sources). For compatibility and future-proofing, it seems reasonable to use IPFS's multiformats scheme, which allows support for various kinds of hash algorithms.
Edit: The filename doesn't necessarily need to embed the hash. This scheme may be as secure (not completely sure, need to consider it further):
protocol://path/some_package@version/module.ts#cid={hash}
(Using a key=value scheme, the # part can be made extensible to allow for other metadata such as digital signatures, etc.)
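Since the fragment is purely local, that key=value metadata can be parsed with standard URL machinery. A sketch (fragmentMetadata is a made-up helper name):

```typescript
// Sketch: read key=value metadata (cid, signatures, ...) out of a URI's
// fragment. The fragment is never sent to the server, so this is purely
// local bookkeeping on the importer's side.
function fragmentMetadata(specifier: string): Record<string, string> {
  const url = new URL(specifier);
  // url.hash includes the leading "#"; URLSearchParams handles the rest.
  return Object.fromEntries(new URLSearchParams(url.hash.slice(1)));
}
```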
After thinking about this further I've found a simple alternative solution that guarantees 100% content addressed safety and reproducibility for the entire dependency tree:
It would look like this:
protocol://domain/path/name@version/module.ts#lock_cid=QmcRD4wkPPi6dig81r5sLj9Zm1gDCL4zgpEj9CfuRrGbzF
Where lock_cid is a content identifier (basically a hash in a flexible format) for a lock file that would include the hash of the target file (here module.ts) as well as the hashes of all imported files in the entire dependency tree (possibly including imports from the standard library):
The file might look something like this (or might use JSON, etc.; this is only shown for simplicity):
https://my.website/path/name@version/module.ts Qmf8obm7bxrQS1JnjUniJdibcN2kUJy9zz732sr7o3dxtn
https://my.website/path/name@version/utils.ts Qmeg1Hqu2Dxf35TxDg18b7StQTMwjCqhWigm8ANgm8wA3p
https://my.website/path/name@version/methods.ts QmZfSNpHVzTNi9gezLcgq64Wbj1xhwi9wk4AxYyxMZgtCG
https://someother.website/path/name@version/othermodule.ts QmbKxNNCxBox7Cmv3jiUZbiG3zpzmtnYzVUuKHxfAjvpyH
https://someother.website/path/name@version/othermoduleutils.ts QmPwwoytFU3gZYk5tSppumxaGbHymMUgHsSvrBdQH69XRx
https://deno.land/std@v0.3.0/async/delay.ts QmaLRet8qeYqNaq8xJeiqwjNnukSo3uEA8oWsDLoxxBv4Q
https://deno.land/std@v0.3.0/async/deferred.ts QmWZtn3ahqqpGBBRZqPdthcWz2n1rxc1UuiDoWXrgrHKzZ
...
Since the lock file is content addressed, it can be fetched from anywhere, either from IPFS or from the web server itself (say, at https://my.website/path/name@version/module.ts.lock).
The lock file can also be effectively used as an IPFS-like merkledag (though unlike in IPFS, it doesn't represent a directory structure, but a collection of references to various sources), but since all the references use cids, they can all potentially be fetched from IPFS (and in parallel, which may also improve performance).
Technically, if some of the imports in the dependency tree already refer to their own lock file, then it may not be strictly necessary to include them in the lock file. However, since the storage requirements of a structure like this are relatively minimal by modern standards, it may be better not to rely on multiple sources and to include everything in one file (even if technically redundant).
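As a sketch of the verification step, with plain hex sha256 standing in for real multihash-encoded cids so the idea runs without an IPFS library (parseLock and verify are made-up names, and fetched file contents are passed in as a map rather than fetched over the network):

```typescript
import { createHash } from "node:crypto";

// Parse the two-column "<url> <digest>" format shown in the example above.
function parseLock(lock: string): Map<string, string> {
  const entries = new Map<string, string>();
  for (const line of lock.trim().split("\n")) {
    const [url = "", digest = ""] = line.trim().split(/\s+/);
    entries.set(url, digest);
  }
  return entries;
}

// Every file in the dependency tree must hash to the recorded value.
function verify(
  lock: Map<string, string>,
  fetched: Map<string, string>,
): boolean {
  for (const [url, digest] of lock) {
    const body = fetched.get(url);
    if (body === undefined) return false;
    if (createHash("sha256").update(body).digest("hex") !== digest) {
      return false;
    }
  }
  return true;
}
```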
ha! I had similar thoughts here: https://github.com/denoland/deno_website2/issues/406#issuecomment-633298685
@srdjan Interesting you were going the same way.
I've modified this approach through several iterations and eventually came to a solution that doesn't actually require library authors to know anything about the existence of a lock file or even the hashing scheme. I'll try to clarify some aspects of the design:
The hash part (#) is never actually sent to the server; it is only for local use:
https://domain/path/name@version/module.ts#lock_cid=QmcRD4wkPPi6dig81r5sLj9Zm1gDCL4zgpEj9CfuRrGbzF
The lock file is individual to the dependency tree of a particular ts (or js) file (there is no consideration of a directory structure here), but by convention, for the above URI, one of its search locations might be:
https://domain/path/name@version/module.ts.lock
However, it doesn't have to be there. It can be stored locally or fetched from a p2p network like IPFS.
As an importer of a third-party ts or js file, you would be able to produce, by yourself, a lock file for that particular import and put it practically wherever you want.
Alright, after considering it even more, I've realized that the lock file isn't even strictly necessary to be stored anywhere. Instead, it can be regenerated purely based on the content and structure of the dependency tree for verification purposes.
It's actually pretty simple:
Say I want to import this URI, but I also want a strong proof that ensures that what I get is always the same:
import * as mod from "https://example.com/path/name@version/module.ts"
I use a tool that walks the dependency tree of module.ts in some deterministic order, records the URIs and hashes of all the files it finds, and puts the result in a file called module.ts.lock. I then annotate the hash of that file onto the link like this:
import * as mod from "https://example.com/path/name@version/module.ts#lock_cid=QmaLRet8qeYqNaq8xJeiqwjNnukSo3uEA8oWsDLoxxBv4Q"
Now whenever someone encounters this annotated link, they have two options:
That's pretty much it.
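A sketch of the deterministic walk, with module sources mocked in-memory (a real tool would fetch each URL and use a proper TypeScript parser rather than a regex, and would hash each file along the way):

```typescript
// Mocked sources for illustration; keys and contents are made up.
const sources: Record<string, string> = {
  "https://example.com/mod.ts": `import "./a.ts"; import "./b.ts";`,
  "https://example.com/a.ts": `import "./b.ts";`,
  "https://example.com/b.ts": `export const b = 1;`,
};

// Visit each module once, resolving and sorting its import specifiers so
// the traversal order (and hence the lock file) is deterministic.
function walk(url: string, seen: string[] = []): string[] {
  if (seen.includes(url)) return seen;
  seen.push(url);
  const src = sources[url] ?? "";
  // Naive specifier extraction, good enough for the sketch.
  const specs = [...src.matchAll(/import\s+"([^"]+)"/g)]
    .map((m) => new URL(m[1], url).href)
    .sort();
  for (const s of specs) walk(s, seen);
  return seen;
}
```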
As an update, this happened. Chrome extended support for custom protocol handlers to include, among others, support for ipfs:// and ipns://.
Going off of some comments that have been posted in this thread, would it make sense for there to be a convention that utilizes IPNS as part of its base path to host modules? Since the base ID for an IPNS directory is just the multihash of a public key, this could be used to create signatures for the modules to verify the signing authority.
So someone responsible for creating a module could host the directory at ipns/<foo_pubkey_hash> with the following structure:
pubkey.txt
modules/
  foo_module/
    index.ts
    index.ts.sig
  bar_module/
    index.ts
    index.ts.sig
Then the directory could be declared from within a package.json, and the necessary import logic could be handled by the runtime:
{
"imports": {
"alias_name": {
"cid": "ipns/<pubkey_hash>",
"modules": ["foo_module", "bar_module"]
}
}
}
The files themselves could then be imported as such:
import React from "ipfs:alias_name/foo_module"
The runtime could check the signatures of the modules being requested for import and error if they don't line up, or if the pubkey doesn't match its IPNS hash.
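The pubkey-matches-name half of that check can be sketched simply. An IPNS name is, in essence, a hash of the publisher's public key, so the runtime can verify that the fetched pubkey.txt corresponds to the name it was resolved under. Plain hex sha256 stands in here for the real multihash encoding, and pubkeyMatchesName is a made-up helper name:

```typescript
import { createHash } from "node:crypto";

// Sketch: derive a hash from the public key and compare it against the
// IPNS name the directory was fetched under. Real IPNS names use a
// multihash of the key, not bare hex sha256.
function pubkeyMatchesName(pubkey: string, ipnsName: string): boolean {
  const derived = createHash("sha256").update(pubkey).digest("hex");
  return derived === ipnsName;
}
```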
For me, it seems like IPNS would kind of defeat the purpose of IPFS for Deno.
I'd like the hash to be in the import statement, and the URL to always point to the same content, instead of needing a separate package.json file to map names to hashes. At that point, it just seems like a package lock file to a regular website whose content can change.
IPFS uses multiaddr, which is incompatible with URL. /ipfs/Qmf8obm7bxrQS1JnjUniJdibcN2kUJy9zz732sr7o3dxtn is an example multiaddr.
Just wanted to chime in that it would be nice if we could have a way to add new protocol schemes for imports (or maybe for fetch as well from within code).
I'm the main developer of Agregore, a p2p web browser that supports stuff like IPFS. I'd love it if we could reuse modules published for p2p web browsers like Agregore within Deno.
In Agregore, we've extended the browser's protocol handlers to support ipfs/ipns/etc not just for reading with GET, but also for writing with PUT.
On desktop we're doing this via the protocol API provided by Electron, and via some C++ changes on mobile.
Similarly, the Node.js VM API enables us to provide custom "linkers" which can enable customizing module resolution for custom protocols (something I've played around with in webrun).
Having a way to dynamically add protocol scheme handlers in Deno would make it easier to integrate stuff like IPFS in user-land without needing large dependencies.
Actually, yeah, I think that would be a good way to go about it. If you could register new protocol schemes for Deno (probably with a Deno-namespaced function), you could support any protocol you wanted. If this registration also added support for those protocols to the fetch function, it'd be even more useful.
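None of this exists in Deno today, but as a user-land sketch, the registration surface could be as small as a map from scheme to handler. Everything here (registerProtocol, resolveWithHandlers) is a made-up name, not a real Deno API:

```typescript
// A handler turns a URL under its scheme into module source text.
type ProtocolHandler = (url: URL) => Promise<string>;

const handlers = new Map<string, ProtocolHandler>();

// Hypothetical registration call, e.g. registerProtocol("ipfs", ...).
function registerProtocol(scheme: string, handler: ProtocolHandler): void {
  handlers.set(scheme, handler);
}

// Hypothetical resolution step the runtime (or fetch) would consult.
async function resolveWithHandlers(specifier: string): Promise<string> {
  const url = new URL(specifier);
  const handler = handlers.get(url.protocol.replace(":", ""));
  if (!handler) throw new Error(`no handler registered for ${url.protocol}`);
  return handler(url);
}
```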
Having a way to dynamically add protocol scheme handlers in Deno
While that would be a good start, and not something I'm opposed to, I think it would be a mistake for IPFS to not have built-in support.
For example:
IPFS solves that problem (anyone could host the color package source code and the scripts could find it).
But,
If the only IPFS support is through custom protocols, my understanding is we're going to need a non-IPFS import before we can use IPFS imports. E.g. it's still centralized with a possible single point of failure, unless we go with the primitive solution of pasting the entire custom-protocol library at the top every time.
TL;DR: decentralization/stability only works if IPFS is natively supported.
Yeah. It'd be good if it was built-in. It might not even be that hard - has anyone tried running js-ipfs inside Deno? It might just work. Sure, implementing support for talking to a user's existing node would be good, but I'm pretty sure the JS version is stable enough for our needs here.
EDIT: Looks like the esm.sh transpilation CDN is having issues at the moment. The following should work right, but it isn't.
import * as IPFS from "https://esm.sh/ipfs-core@0.15.4";
const node = await IPFS.create();
const stream = node.cat("QmPChd2hVbrJ6bfo3WBcTW4iZnpHm8TEzWkLHmLpXhF68A");
const decoder = new TextDecoder();
let data = "";
for await (const chunk of stream) {
// chunks of data are returned as a Uint8Array, convert it back to a string
data += decoder.decode(chunk, { stream: true });
}
console.log(data); // should output "Hello, <YOUR NAME HERE>"
but I'm pretty sure the JS version is stable enough for our needs here.
I've been getting familiar with running Deno in Rust, so I was thinking about exploring implementing it on the backend. The IPFS client for Rust still needs some work, though. The JavaScript client is probably the faster way. I might attempt to bundle it through Parcel to ESM.
As a start, Deno could just check for files under $HOME/.ipfs (or whatever the path to the store is).
Since Deno doesn't have the infrastructure for custom protocols yet, I've started sketching up a tool based on Node.js in the meantime. 😁
https://github.com/AgregoreWeb/agregore-cli/blob/main/test.js#L34
With this it should be possible to run modules from IPFS and to customize the APIs available to them. (at least to start experimenting).
Since it's only providing web APIs, any scripts written for it should be portable to Deno down the line.
In Python when installing dependencies you can specify hashes:
requests==2.31.0 ; python_full_version >= "3.12.0" and python_full_version < "3.13.0" \
--hash=sha256:58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f \
--hash=sha256:942c5a758f98d790eaed1a29cb6eefc7ffb0d1cf7af05c3d2791656dbd6ad1e1
Poetry by default generates lock file with hashes and if you export dependencies to a requirements.txt file it will include hashes as well.
That way you can be sure nobody tampered with the content of your dependency module.
We are not going to do this.
We are not going to do this.
Sad, but understandable
We are not going to do this.
I'm also sad, but I appreciate the clear communication.
The problem
Files at URLs are mutable - they can be changed or deleted at any time, by anyone with access - whether that access is legitimate or not. If the host gets hacked, bad code could be injected for anyone who fetches it.
The solution
Imports from IPFS. IPFS is a content-addressed, globally-distributed filesystem. Files are identified by a hash of their contents, so they can never be modified without changing the file identifier. A given hash will always point to the exact same file, forever.
Additionally, as it's globally distributed, the chance of files disappearing forever when they're depended on is nearly zero. No more of this. And none of this.
Implementation options
Right now, the IPFS community seems to be standardizing on ipfs:// as a URI scheme for files on IPFS. We can use that to identify when an import is from IPFS, as opposed to an HTTP(S) import, and act accordingly. Now that we have the file identifier, there are a few options for fetching it:
1. Use the local IPFS node. Chances are, if a user wants to import from IPFS, they're running a node that we can talk to to resolve files. IPFS nodes have an HTTP API that typically runs (for localhost) on 127.0.0.1:5001 and allows you to get files that way. An import from ipfs://{hash} can be fetched from http://127.0.0.1:5001/ipfs/{hash} - we can also optionally pin files to the user's node so they're reprovided to others on the IPFS network. Full local HTTP API docs are here.
2. Use a public IPFS gateway. There are a fair number of them, and known gateways and their status are tracked here. File resolution is always at {gateway}/ipfs/{hash}.
3. Run a local IPFS node. This would be the most difficult to handle, as it's not just a string manipulation to translate an ipfs:// import into an http(s):// import.
Personally? I recommend using the installed local node and falling back to a select list of public gateways.
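That recommended resolution order can be sketched as pure URL construction, shown here without any I/O. The gateway list is illustrative only (ipfs.io and dweb.link are well-known public gateways), and candidateUrls is a made-up name:

```typescript
// Example public gateways; a real implementation would use a curated,
// configurable list.
const gateways = ["https://ipfs.io", "https://dweb.link"];

// Build the URLs to try in order: the local node's HTTP API first, then
// each public gateway as a fallback.
function candidateUrls(hash: string, path = ""): string[] {
  return [
    `http://127.0.0.1:5001/ipfs/${hash}${path}`, // local node
    ...gateways.map((g) => `${g}/ipfs/${hash}${path}`),
  ];
}
```

A resolver would then attempt each candidate in turn and use the first one that responds; since the content is addressed by hash, any source that answers is equally trustworthy.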