Closed: nknapp closed this issue 4 years ago.
I think that when a resource is first fetched (and cached), it should be checksummed, and the checksum should be written to a file (similar to `yarn.lock` or `package-lock.json`). Maybe there should also be a way to specify required/expected checksums ahead of time. For example, in `deno.lock`:
```yaml
checksums:
  required:
    - https://unpkg.com/deno_testing@0.0.5: ["sha384", "oqVuAfXRKa..."]
  computed:
    - https://unpkg.com/foo@1.2.3: ["sha384", "32e66f3809..."]
```
Yes, this is definitely something to be dealt with eventually. I'll leave this issue open, but I don't plan to address it for a while. (There are bigger fish to fry.)

@ianp Yeah, I agree. It would also be nice to be able to specify the checksum in the import itself, though. Are there any subresource integrity proposals for import? Another idea that's been thrown around is supporting IPFS for imports; that also addresses the integrity problem.
I was originally worried about compromised domains and subresource integrity. To expand on @nknapp's idea: what if, instead of a hash of the file, it was the hash of a public key, and the `.ts` file began with a comment that contained the public key and a hash of the file (sans comment) encrypted with the private key? E.g.
In Deno code:

```ts
import { test } from "https://unpkg.com/deno_testing@0.0.5/testing.ts#sha384-X3QnUIQNoQbIr3Nk2dmIFkkIdUwCl2WrX5RAPKa1TLYmHod/p9t/1c1fEnNJWUbI"
```

In testing.ts, the first few lines:

```ts
/* SRI-V0.0.1
{
  "key": "the public key",
  "type": "RSA-1024",
  "hash": "the hash of the file (sans comment), encrypted with the private key"
}
*/
```
Obviously, the build process would have to take care of adding the subresource metadata.

Also, on a bit of a tangent, this could be combined with some server-side magic for semver-compatible packages:

```ts
import { test } from "https://unpkg.com/deno_testing/^0.0.5/testing.ts#sha384-X3QnUIQNoQbIr3Nk2dmIFkkIdUwCl2WrX5RAPKa1TLYmHod/p9t/1c1fEnNJWUbI"
```

The server would then see the `/^0.0.5/` portion and deliver the appropriate, still-signed, semver-matching version.
Supporting IPFS and SRI hashes would be very useful; those are two things that would make Deno superior in terms of security and decentralization.
I guess https://github.com/ry/deno/issues/170 overlaps with this.
It's outside of the scope of this project, but I guess I'm suggesting publisher integrity with subresource integrity. Basically we trust a publisher and probably don't want to constantly update hashes in our project. What I'm suggesting above is a way to have subresource integrity with a hash that changes, but that can be verified to have come from the same publisher. Updates can be handled without massive changes to a project.
Definitely. Also jsDelivr, unpkg and others already provide the needed information afaik.
The way Go handles this with `go.sum` files seems relevant. See the FAQs in their wiki.
Added to 1.0 blockers.
discussion permalink (gitter.im)
Ryan Dahl @ry yes, we need lock files...
Jed Fox @j-f1 Lockfiles only work if you can get a permanent link to a given file. Deno’s method of pulling data from arbitrary URLs means that there is no way to get a permanent link, so the only ways to implement a lockfile would be to store hashes and prevent the user from running the code if the contents of the URL have changed (like SRI) or to make the lockfile store a copy of every dependency (maybe in a compressed form so it doesn’t waste space?). Unless there’s something I’m not thinking of.
Ryan Dahl @ry @j-f1 storing a hash of every dependency isn't a problem - and, yes, if people link to a master branch they will not have stable code. a permanent link is only permanent until it isn't :P that is, just because "npm.org" or "crates.io" says something is permanent doesn't mean it actually is. Similarly for https://deno.land/std@v0.21.0/examples/cat.ts ...
Jed Fox @j-f1 That’s true, but those registries have a stronger commitment to immutability than we do, since it’s possible to modify a Git tag.
Ryan Dahl @ry deno.land isn't special - unpkg.org or pika.dev can maybe provide those sorts of promises - so lockfiles would be useful. The lockfile just says - "last time i ran this, i had this exact code, and next time i run it i want to error out unless it's this exact code" seems very reasonable feature. the fact that git supports force pushing has nothing to do with that.
Bartek Iwańczuk @bartlomieju seems relatively easy to implement with the current infrastructure. `deno --reload` would overwrite the lock file. I'm not sure how it should interact with dynamic imports, though.
Nayeem Rahman @nayeemrmn Where exactly would the lock file go? DENO_DIR?
Ryan Dahl @ry @nayeemrmn hmm - yea, I think that would be reasonable. Probably we would just ignore the lockfile unless `--locked` was present? Actually, the lockfile needs to be checked into the project, so it shouldn't be in the DENO_DIR.
Nayeem Rahman @nayeemrmn Exactly
Bartek Iwańczuk @bartlomieju `deno --lock`, `deno --lock=.my.deno.lock`?
Ryan Dahl @ry `deno generate-lockfile https://deno.land/std/examples/gist.ts`? Kinda verbose. https://doc.rust-lang.org/cargo/commands/cargo-generate-lockfile.html <--- that's what cargo does tho. Or `deno fetch --generate-lockfile https://deno.land/std/examples/gist.ts`. idk, something like that.
Nayeem Rahman @nayeemrmn Maybe `DENO_DIR` could instead locate itself where the lock file is... like how npm uses `package.json` to locate the project root and puts `node_modules` there? 😅 Then you could implicitly declare a project with `touch deno-lock` or something.
Ryan Dahl @ry I think just explicitly specifying where the lockfile is is fine. It's not something you do interactively, usually, so it doesn't need to be ergonomic; you just have it in CI.
Nayeem Rahman @nayeemrmn It needs to be done for any commit with a new dependency, and that's if you only care about CI.
Andy Fleming @andyfleming What's the behavior on a mismatch with the lock? It seems like the lock file should automatically be used, and a flag `--ignore-lock` and/or `--update-lock` should be offered.
Andy Fleming @andyfleming Also, @ry, I disagree about the lockfile only being needed for CI. Even locally when developing, dependencies should be deterministic by default. I should be aware if I'm pulling a different version/content than expected. Dev teams commonly use `npm ci` for this reason now. I could be won over on it being a warning by default; that's reasonable. Then we could have another flag like `--strict` or `--strict-lock` that would cause the script to exit immediately on mismatch (which would be used for CI/production).
Andy Hayden @hayd strict with a descriptive error message seems a better default.
Nayeem Rahman @nayeemrmn

> It seems like the lock file should automatically be used and a flag `--ignore-lock` and/or `--update-lock` should be offered.

The problem with this was that there isn't an obvious answer to where the lock file should go by default and whose dependencies populate it; that's what spurred the discussion. I agree that using lock files shouldn't require a flag every time... this should be thought through more.
Andy Fleming @andyfleming The SRI approach (denoland/deno#200) would address that. Locks would effectively be inline. Then it could fail with a descriptive error message (like you are saying, @hayd) by default unless `--ignore-sri` is on. The only thing that would get a little uglier is generating/updating "locks". Do we modify the import lines for users with a command?
Nayeem Rahman @nayeemrmn To be clear, I still like lock files being optional. My suggestion was to have deno scan your directory tree for a lock file and use it if and only if it exists... somehow build a solution around that.
Yeah I haven't looked much into it but SRI seems like it would work too. Lock files are probably more convenient.
Andy Fleming @andyfleming A challenge of the SRI approach is if you wanted to lock someone else's files. You might be able to lock the contents of the first file that you are importing from a URL, but if it imports another URL without SRI, we don't really have a guarantee about that 2nd file's contents.
Nayeem Rahman @nayeemrmn Oh... yeah, that's definitely a deal breaker. SRI isn't a substitute for a lock file.
Andy Hayden @hayd yeah, I guess my point was IF you have a lock file THEN you should error if there's a mismatch.
other questions seem very much open...
My current suggestion is to walk up the directory tree looking for a `deno.lock` and use it as a lock file if one is found.

By 'use' I mean it should be checked for all fetches made from within its scope by default, but not written to unless explicitly specified with `--lock` or something. Writing to it by default would mean it gets populated when you're experimenting or using some remote utility that's not a dependency of the project. I would want the lock file to only be updated when I run `deno fetch --lock src/main.ts` or something, but I still want it to be checked for everything by default.
The result of this is that lock files are opt in, but as a project author you can still force the checking part onto collaborators/CI.
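The walk-up lookup proposed above could be sketched like this (POSIX-style absolute paths only; the filename `deno.lock` is the one suggested in this thread, and the `exists` predicate stands in for real filesystem access):

```typescript
// Walk up from `startDir` toward the root, returning the path of the first
// "deno.lock" for which `exists` returns true, or null if none is found.
function findLockFile(
  startDir: string,
  exists: (p: string) => boolean,
): string | null {
  // Normalize: strip trailing slashes ("/a/b/" -> "/a/b"), keep "/" as "/".
  let dir = startDir.replace(/\/+$/, "") || "/";
  while (true) {
    const candidate = dir === "/" ? "/deno.lock" : `${dir}/deno.lock`;
    if (exists(candidate)) return candidate;
    if (dir === "/") return null; // reached the root without finding one
    // Step up one directory level.
    dir = dir.slice(0, dir.lastIndexOf("/")) || "/";
  }
}
```

For example, `findLockFile("/repo/src/app", p => p === "/repo/deno.lock")` returns `"/repo/deno.lock"`, so every fetch from within `/repo` would be checked against the same lock.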
> My current suggestion is to walk up the directory tree looking for a `deno.lock` and use it as a lock file if one is found.
I do not want to get into the business of trying to guess the location of things. This seems innocent but can be very complex for other tooling to interact with. I don't mind if we check the current directory for a fixed filename - but I think it would be better if we always explicitly specified the lockfile filename.
I proposed in #3179 that we have a comprehensive metadata file, which would be opt-in, work in a very similar way to a main module, and be the explicit location for the locks for a given workload.

I agree, guessing and walking isn't good. We should have the ability to utilise a remote set of lock information just as we do with other code (and if we do it for lock information, why not co-locate all metadata into a single file?).
> I do not want to get into the business of trying to guess the location of things. This seems innocent but can be very complex for other tooling to interact with. I don't mind if we check the current directory for a fixed filename - but I think it would be better if we always explicitly specified the lockfile filename.
Makes sense. The intention behind that was to somehow make the checking part of it implicit. I think that's important. I hope there'll be a way to achieve this either way.
Is it true that by default Deno doesn't lock the dependencies?
IMO locking should be the default behavior, and a flag should allow you to "unlock", i.e. update the dependencies. Otherwise you would have non-deterministic builds except on your local machine, where you have the dependencies cached. I think deterministic builds are important.
This would also match the current behavior of NPM, which creates `package-lock.json` on the first install and uses it over `package.json` on any subsequent installs.
It would however mean that there needs to be a default lockfile. Just locking the dependency with a hash in the URL won't do it, since any transitive dependency can still change if your dependency doesn't have them locked in its import statements, which you can't rely on.
It would need to be some file and it would also need to be in the current project directory such that you can commit it to your VCS. Probably best putting it in the current folder like NPM does.
A lengthy discussion about versioning and deterministic builds can be found at joelparkerhenderson.
While it's a neat feature to load scripts as if they were in a web browser, in the examples I've come across I didn't see anything similar to HTML's `integrity` attribute for ensuring that the loaded remote code matches the expected remote source code.
Does Deno have such a feature? If so, disregard the rest - if not, my question is as such:
Let's say some Deno code (Package A) exists in the wild that depends on Deno scripts from:

https://ahungry.com/deno-package.ts (this could be example.com, but I'm not sure that one expires)
Unlike with a central repository similar to npm (which has its own problems), domain names are not permanent; they expire after some time.
If ahungry.com was to expire, and a malicious actor then registered it, and hosted a malicious deno-package.ts script on their remote end, a problem arises - how does Deno ensure the user doesn't run this altered code?
If this is not the first time the user has executed Package A, perhaps it's not going to be an issue until the user runs the deno upgrade (remote packages) command, at which point Deno could detect that the TLS fingerprint of the remote is now different.
But, for a first time user of Package A, that has never run 'deno' before, and therefore has no hidden cache of deno packages - wouldn't it eagerly pull in and execute this code?
NPM is not susceptible to this same problem, unless it expired npm packages yearly and allowed anyone to claim others' slots after this time period (which would seem ridiculous in the context of package hosting).
@ahungry What you mention is the whole point of this issue. As long as the dependency is cached, it won't load the malicious one. But if it isn't, it will load whatever is behind that URL.
If you, however, locked the dependencies on the first run by passing the `--lock` flag, then it won't load the changed dependency, because the hash doesn't match the locked one.
A concern, however, is that Deno by default doesn't lock the dependencies, unlike NPM, which by default creates a `package-lock.json`. This is also the reason for my previous comment above yours.
Lock or not is a side concern - in the scenario I outline above, if a person does the first run, even with the lock flag, they would simply be locking themselves into the compromised URL.
Where this would actually be problematic is if they were remotely installing Package A itself, which depends on the compromised Package B URL, and the Package A maintainer is unaware there was a shift in ownership of that Package B URL with the malicious code from the expired domain.
Well, it's your responsibility to make sure you know what the URL points to when you first execute the script. The same applies for NPM, when you first install your packages, as NPM could be compromised and you get a bad download.
What you mention as a "side concern" is actually what would prevent you from having the problem you described. If Deno locked by default on the first execution, then any change in the URL, even deeply buried in a dependency chain, wouldn't affect you. As of now, you have to run the script the first time with `--lock` manually.
> Well, it's your responsibility to make sure you know what the URL points to when you first execute the script.
We all know that, in reality, this will often not be the case once more people start using Deno.
> The same applies for NPM, when you first install your packages, as NPM could be compromised and you get a bad download.
You can't change already published releases there. Also they have some internal integrity hashes in their APIs.
> What you mention as a "side concern" is actually what would prevent you from having the problem you described. If Deno locked by default on the first execution, then any change in the URL, even deeply buried in a dependency chain, wouldn't affect you. As of now, you have to run the script the first time with `--lock` manually.
If so, we should warn when `--lock` is not used, and optionally provide some SRI-hash solution, since we also know that most people will not use lockfiles or will not commit them. Deno advertises "secure" and "security" by default. So why not cover these cases in some way, even if it is just a warning or some sort of list of SRI hashes?
Go for example has go.sum which includes cryptographic hashes - see https://github.com/golang/go/wiki/Modules#is-gosum-a-lock-file-why-does-gosum-include-information-for-module-versions-i-am-no-longer-using and https://golang.org/cmd/go/#hdr-Module_downloading_and_verification.
I think the current discussion is continued at https://github.com/denoland/deno/issues/3179
So far I will wait until Deno has the needed implementation to check the files based on cryptographic hashes.
I'd prefer the "hash in the hash" SRI concept over lockfiles because it helps to ensure integrity through the entire dependency chain.
If library A uses library B, it can lock the specific version of library B for its consumers. As far as I know, this is not possible with lockfiles which are only produced when the library consumer is fetching the library (and its dependencies).
```ts
// in https://a.net/lib.ts
import { B } from "https://b.net/lib.ts#$algo-$hash"
// ...
export function A() { /*...*/ }
```

```ts
// in my code
import { A } from "https://a.net/lib.ts#$algo-$hash"
// do something with the locked version of lib A (locked by me),
// which itself uses a locked version of lib B (locked by the author of A)
```
I agree that the hash-in-hash concept is most ideal for reproducibility, but when one wants to introduce features like `npm audit` and upgrade only the packages that have security issues, the hash-in-hash concept becomes a problem.
@bobvanderlinden Can you elaborate on how upgrading insecure packages becomes more of a problem with SRI via hash? I'm not sure I understand.
@MarkTiedemann If package A is using package B and package B is using package C, then whenever C requires a patch, the hash that is used in B to refer to C needs an update. Then the hash that is used to refer to B in A also needs an update. All packages (A, B and C) need an update to patch just the issue in C.
If we're talking about multiple versions of package B (version embedded in the URL), then every one of those versions needs to be patched (bump the patch part of the version) to resolve a security issue in package C.
With a lock file it would be possible to change the contents of C without altering references to C from B (and indirectly A).
I do want to make clear that I'm not against this idea, but it's just different and more strict than what people might be used to. There is a case to be made that when C has a security update, that B first needs to run its release cycle (run tests, etc) before it can make use of C. Same would go for A. Since the security update could in theory alter the behaviour of C, all tests need to be run again.
Thanks for the detailed explanation, @bobvanderlinden!
My assumption is that, just like altering the lockfile to fix security issues in dependencies of dependencies, a tool similar to `npm audit` could also directly alter the import hashes of the dependencies in the code to fix these issues. Since the third-party code is downloaded to disk, it should be possible to modify the imports, just like the lockfile.
@bobvanderlinden @MarkTiedemann Under Deno's current import model, I don't think this is possible:
> With a lock file it would be possible to change the contents of C without altering references to C from B (and indirectly A).
Libraries that depend on other libraries at a patch version level already need to be updated to incorporate any updates, and lockfiles as they are implemented currently in Deno don't change that. It's not possible for an upstream user (whether an application, or another library in a stack of libraries) to change the version that is depended upon.
(If you're talking about some hypothetical improvement to Deno which would allow for this, like in node/NPM, then this could theoretically happen; but I don't think that'll happen, at least in Deno core, because the team has been explicit about Deno adhering to ESM and HTML import standards.)
Either B depends on a precise version of C (for example, under semver, `v1.2.3`), which would require rewriting B's source in order to change. Or B depends on a mutable version of C (e.g. `v1.2.*`), which means that the end user will get whatever code is served by that URL at the time when they download the file.

Lockfiles in NPM actually modify which packages are resolved for a particular mutable import (e.g. `require('something')` is mutable, and depends on package.json and the lockfile to determine its meaning).
Deno's lock files do not change the semantics of import statements; they just check that after all the code has been downloaded, it matches the hashes specified in the lockfile. This means you've downloaded the code the author of the lockfile intended you to download. It prevents loading new code from a URL which mutated unexpectedly.
EDIT: Import maps provide a way to modify the import resolution algorithm, but they're unstable. They can let you either pin a mutable URL to a specific version (e.g. rewrite a path containing version `1.2.*` to `1.2.3`) or just flat-out replace one URL with another.
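For example, a hypothetical import map that pins the mutable URL from the example above to an exact version (the URLs are illustrative):

```json
{
  "imports": {
    "https://b.net/lib@1.2.*/": "https://b.net/lib@1.2.3/"
  }
}
```

Because the remapping happens at resolution time, an application author could redirect a transitive dependency to a patched version without touching the intermediate library's source.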
Ah I wasn't aware of import maps. That seems like a viable solution for this issue as well :+1:.
There's some interesting discussion about this topic here https://github.com/WICG/import-maps#supplying-out-of-band-metadata-for-each-module and especially in this presentation https://docs.google.com/presentation/d/1qfoLTniLUVJ5YNFrha7BaVumAnW0ZgcCfUU8UbyyuYY.
I feel like a combination of "hash in hash" for SRI and import maps for upgrading vulnerable dependencies of dependencies is a good idea. So "hash in hash" is secure by default but can be overridden if necessary.
I don't know if this already came up, but if security is a major concern and referencing modules via http(s) should be possible, then maybe there should be a way to use subresource integrity to ensure that the downloaded module isn't tampered with.
It could be part of the URL fragment (the hash part of the URL), like the `#sha384-...` examples earlier in this thread.
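Presumably something along these lines (an illustrative snippet; the URL is hypothetical and the hash value is elided):

```ts
import { mod } from "https://example.com/mod.ts#sha384-<base64-encoded-hash>"
```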
That would be very explicit, although not compatible with semver ranges.