Open Merovius opened 7 months ago
a variation may be: the proxy and pkgsite uses a clean tagged version, but the source repo later moves the tag. pkgsite source links use tags when available. this may only become visible if someone tries to build with GOPROXY=direct.
Alternatively, should pkgsite use services like https://go-mod-viewer.appspot.com for source link, instead of attempting to link to the repository? go-mod-viewer.appspot.com does not offer rich UIs like VCS hosting sites, but it shows the source code as included in the module zip file.
cc @jba @rsc @golang/tools-team
@hyangah Who owns the service you propose?
@Merovius Could you not achieve the same by discriminating using pkgsite idiosyncrasies and serving everyone else but the pkgsite a malicious Git repository? Without an authoritative ground truth you trust, you can never be sure two requests are semantically compatible.
Thus, I think the indication you request is fundamentally impossible unless you install a browser plugin which is attached to sum.golang.org.
The real problem is that the malicious package is already the ground truth, since curl -fsSL sum.golang.org/lookup/...
and local cat go.sum | grep -i ...
do agree. What you are doing is in a sense a distraction campaign since you leverage GitHub's name to give credit to your non-malicious shadow package.
Could you not achieve the same by discriminating using pkgsite idiosyncrasies and serving everyone else but the pkgsite a malicious Git repository?
pkgsite
could link to a revision hash, instead of a reference. That way, it first verifies that a given commit ID refers to a tree that generates the same zip as served by the module mirror and then ties the link to that verified commit ID, guaranteeing that all links correspond to the code actually creating the cached module.
I'll also note that I'm not sure if and how pkgsite
links to source, if the source is not on GitHub or another well-known hoster. For those, I think the risk of them doing those kinds of exploits is relatively low.
But yes, it's always going to be imperfect. That does not mean we can't improve it.
FWIW another interesting response to this might be for go build
, if it happens in a git repository checked out at a semantic versioned tag, to do a sumdb lookup against the contents of the repo and warn, if there is a mismatch. It's pretty limited in what it can catch, but it at least means that someone who cloned the repo to work on it (e.g. to do contribute) would potentially detect tampering.
But yes, it's always going to be imperfect. That does not mean we can't improve it.
Pertaining to your original post "What did you expect to see? ; Some kind of indication, that the code in the repository does not accurately reflect the cached version.", I think such an indication would create a false sense of security due to what I wrote above.
I'll also note that I'm not sure if and how
pkgsite
links to source, if the source is not on GitHub or another well-known hoster. For those, I think the risk of them doing those kinds of exploits is relatively low.
Cf. https://github.com/golang/go/issues/40477#issuecomment-764963287 [2024-04-06] Cf. https://github.com/golang/gddo/wiki/Source-Code-Links [2024-04-06] Cf. https://go.googlesource.com/pkgsite/+/master/internal/source/source.go#800 [2024-04-06]
For those, I think the risk of them doing those kinds of exploits is relatively low.
They are not performing an exploit. Your domain "gonih.org" tries to spread confusion about which source is trusted by the Go build system (based upon sum.golang.org).
Pertaining to your original post "What did you expect to see? […]", I think such an indication would create a false sense of security due to what I wrote above.
I'll note that I was bound by the issue-template, which required me to suggest a solution. Further, when reporting this privately, I was told that this is considered an issue of pkg.go.dev
, further restricting the space from which I could suggest those.
I was really just reporting an issue, not trying to prescribe how to address that. I also don't have any particular limit in mind, of what is considered the root cause of that issue. I don't really care how or where it is addressed. Discussing that is the point of this issue.
So I acknowledge that you don't like my original suggestion.
For those, I think the risk of them doing those kinds of exploits is relatively low.
They are not performing an exploit.
No offense, but please do not ignore the context of the thing you are quoting. It was a response to
Could you not achieve the same by discriminating using pkgsite idiosyncrasies and serving everyone else but the pkgsite a malicious Git repository?
which is something only the code hosting site (in this case GitHub) can do and which they'd have to do maliciously.
Ah, I formulated sloppily: What I was thinking of was your domain serving the pkgsite matching malicious module zip and malicious Git repository (so that they match) but everyone else malicious module zip and harmless Git repository. But I neglected to notice that the browser would only ever see the exact same Git repository as did the pkgsite.
But even when you trust GitHub, the module's domain is allowed to serve custom source links. If gonih.org/xzpoc decided to serve
<meta name="go-import" content="gonih.org/xzpoc mod https://gonih.org/xzpoc">
<meta name="go-import" content="gonih.org/xzpoc git https://github.com/Merovius/xzpoc">
<meta name="go-source" content="gonih.org/xzpoc _ https://github.com/Merovius/xzpoc_harmless/tree/v1.0.0{/dir} https://github.com/Merovius/xzpoc_harmless/blob/v1.0.0{/dir}/{file}#L{line}">
the pkgsite would have a hard time verifying the rendered HTML against sum.golang.org.
@jfrech I'm sorry, but I'm genuinely not sure how to engage with what you are saying.
The basic nature of what I'm reporting here is that it is possible for a module to contain malicious code, while everyone importing and using it will come to the reasonable conclusion that they get served the verified and vetted content of a well-known git repository. Given that we just experienced a supply chain attack that was (in part) based on exactly such a mechanism, it seems a reasonable question to pose whether we can do anything to make it more likely that such an attempt gets discovered. Especially given that we theoretically have the ability to determine automatically whether a given source tree was used to create a given module zip.
You are correct that there are other vectors - besides the one I demonstrated - for a malicious module to create such an illusion. But how does that change the fact, that pkg.go.dev
currently lends credibility with its links? You are correct that the sense of credibility may ultimately be false (in some cases, at least). But when I talked to Go programmers about this, the most common response was "but doesn't the checksum database catch this?", so would you argue that the sumdb also gives a "false sense of security" so we should abandon it? Of course not, that's ridiculous.
I think there are mechanisms we can come up with to improve the situation here. For example
mod
and a git
(or whatever) tag and whether the served contents match. Which would increase the difficulty of doing this fake - and if you find some community members who would run it from domestic IP addresses, you could increase it even further.But also, perhaps the Go team just feels there is no action needed here. That's their prerogative, of course. It still seems like a reasonable thing to ask.
FTR I think Go's situation here is vastly better than most other language's, because
The thing I'm proposing is to consider a way to put these pieces together. The way open source is usually vetted is by following the commit log. Currently, there exists no link from that to the checksum database. I'm proposing to consider whether we can perhaps build such a link - even if it is only limited to a probabilistic one.
What is the URL of the page with the issue?
https://pkg.go.dev/gonih.org/xzpoc
What is your user agent?
Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0
Screenshot
(not super relevant)
What did you do?
I served a
go-import
meta-tag with vcsmod
pointing at a personal proxy fromgonih.org/xzpoc
. The proxy is serving a valid, but "malicious" version of a package that contains aGreeting
function returning the string "gotcha pwned".It also contains a
go-import
meta-tag with vcsgit
pointing at a GitHub repo containing a module with the same package and API, but returning the stringHello world!
from that function. Thev1.0.0.info
file points at the "clean"v1.0.0
tag and commit, even though the code contained in thev1.0.0.zip
file does not match the content of that repository.Here is a playground link to demonstrate what happens when that package is imported.
For your convenience, this is what is served by my "infected" proxy:
xzpoc.tar.gz
What did you see happen?
pkg.go.dev
links to the "clean" version in the repository, both from the side-bar and in the individual links for the exported API, giving the impression that the module, when imported, will contain the "clean" code. But if anyone actually depends on that module, they will download and use the "malicious" version (regardless of whether they use the public mirror, or bypass it withGOPROXY=direct
, as both will download the module from my "infected" proxy). It is only by inspecting the downloaded code, that they can detect the "malware".What did you expect to see?
Some kind of indication, that the code in the repository does not accurately reflect the cached version.