x/pkgsite: links can point at source code that may not match what is served by the module proxy

Merovius commented 7 months ago

What is the URL of the page with the issue?

https://pkg.go.dev/gonih.org/xzpoc

What is your user agent?

Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0

Screenshot

(not super relevant)

Screenshot 2024-04-02 at 23-06-42 xzpoc package - gonih org_xzpoc - Go Packages

What did you do?

I served a go-import meta-tag with vcs mod pointing at a personal proxy from gonih.org/xzpoc. The proxy is serving a valid, but "malicious" version of a package that contains a Greeting function returning the string "gotcha pwned".

It also contains a go-import meta-tag with vcs git pointing at a GitHub repo containing a module with the same package and API, but returning the string Hello world! from that function. The v1.0.0.info file points at the "clean" v1.0.0 tag and commit, even though the code contained in the v1.0.0.zip file does not match the content of that repository.

Here is a playground link to demonstrate what happens when that package is imported.

For your convenience, this is what is served by my "infected" proxy:

xzpoc.tar.gz

What did you see happen?

pkg.go.dev links to the "clean" version in the repository, both from the side-bar and in the individual links for the exported API, giving the impression that the module, when imported, will contain the "clean" code. But if anyone actually depends on that module, they will download and use the "malicious" version (regardless of whether they use the public mirror, or bypass it with GOPROXY=direct, as both will download the module from my "infected" proxy). It is only by inspecting the downloaded code, that they can detect the "malware".

What did you expect to see?

Some kind of indication, that the code in the repository does not accurately reflect the cached version.

seankhliao commented 7 months ago

a variation may be: the proxy and pkgsite uses a clean tagged version, but the source repo later moves the tag. pkgsite source links use tags when available. this may only become visible if someone tries to build with GOPROXY=direct.

hyangah commented 7 months ago

Alternatively, should pkgsite use services like https://go-mod-viewer.appspot.com for source link, instead of attempting to link to the repository? go-mod-viewer.appspot.com does not offer rich UIs like VCS hosting sites, but it shows the source code as included in the module zip file.

cc @jba @rsc @golang/tools-team

jfrech commented 7 months ago

@hyangah Who owns the service you propose?

jfrech commented 7 months ago

@Merovius Could you not achieve the same by discriminating using pkgsite idiosyncrasies and serving everyone else but the pkgsite a malicious Git repository? Without an authoritative ground truth you trust, you can never be sure two requests are semantically compatible.

Thus, I think the indication you request is fundamentally impossible unless you install a browser plugin which is attached to sum.golang.org.

The real problem is that the malicious package is already the ground truth, since curl -fsSL sum.golang.org/lookup/... and local cat go.sum | grep -i ... do agree. What you are doing is in a sense a distraction campaign since you leverage GitHub's name to give credit to your non-malicious shadow package.

Merovius commented 7 months ago

Could you not achieve the same by discriminating using pkgsite idiosyncrasies and serving everyone else but the pkgsite a malicious Git repository?

pkgsite could link to a revision hash, instead of a reference. That way, it first verifies that a given commit ID refers to a tree that generates the same zip as served by the module mirror and then ties the link to that verified commit ID, guaranteeing that all links correspond to the code actually creating the cached module.

I'll also note that I'm not sure if and how pkgsite links to source, if the source is not on GitHub or another well-known hoster. For those, I think the risk of them doing those kinds of exploits is relatively low.

But yes, it's always going to be imperfect. That does not mean we can't improve it.

Merovius commented 7 months ago

FWIW another interesting response to this might be for go build, if it happens in a git repository checked out at a semantic versioned tag, to do a sumdb lookup against the contents of the repo and warn, if there is a mismatch. It's pretty limited in what it can catch, but it at least means that someone who cloned the repo to work on it (e.g. to do contribute) would potentially detect tampering.

jfrech commented 7 months ago

But yes, it's always going to be imperfect. That does not mean we can't improve it.

Pertaining to your original post "What did you expect to see? ; Some kind of indication, that the code in the repository does not accurately reflect the cached version.", I think such an indication would create a false sense of security due to what I wrote above.

jfrech commented 7 months ago

I'll also note that I'm not sure if and how pkgsite links to source, if the source is not on GitHub or another well-known hoster. For those, I think the risk of them doing those kinds of exploits is relatively low.

Cf. https://github.com/golang/go/issues/40477#issuecomment-764963287 [2024-04-06] Cf. https://github.com/golang/gddo/wiki/Source-Code-Links [2024-04-06] Cf. https://go.googlesource.com/pkgsite/+/master/internal/source/source.go#800 [2024-04-06]

jfrech commented 7 months ago

For those, I think the risk of them doing those kinds of exploits is relatively low.

They are not performing an exploit. Your domain "gonih.org" tries to spread confusion about which source is trusted by the Go build system (based upon sum.golang.org).

Merovius commented 7 months ago

Pertaining to your original post "What did you expect to see? […]", I think such an indication would create a false sense of security due to what I wrote above.

I'll note that I was bound by the issue-template, which required me to suggest a solution. Further, when reporting this privately, I was told that this is considered an issue of pkg.go.dev, further restricting the space from which I could suggest those.

I was really just reporting an issue, not trying to prescribe how to address that. I also don't have any particular limit in mind, of what is considered the root cause of that issue. I don't really care how or where it is addressed. Discussing that is the point of this issue.

So I acknowledge that you don't like my original suggestion.

For those, I think the risk of them doing those kinds of exploits is relatively low.

They are not performing an exploit.

No offense, but please do not ignore the context of the thing you are quoting. It was a response to

Could you not achieve the same by discriminating using pkgsite idiosyncrasies and serving everyone else but the pkgsite a malicious Git repository?

which is something only the code hosting site (in this case GitHub) can do and which they'd have to do maliciously.

jfrech commented 7 months ago

Ah, I formulated sloppily: What I was thinking of was your domain serving the pkgsite matching malicious module zip and malicious Git repository (so that they match) but everyone else malicious module zip and harmless Git repository. But I neglected to notice that the browser would only ever see the exact same Git repository as did the pkgsite.

jfrech commented 7 months ago

But even when you trust GitHub, the module's domain is allowed to serve custom source links. If gonih.org/xzpoc decided to serve

<meta name="go-import" content="gonih.org/xzpoc mod https://gonih.org/xzpoc">
<meta name="go-import" content="gonih.org/xzpoc git https://github.com/Merovius/xzpoc">
<meta name="go-source" content="gonih.org/xzpoc _ https://github.com/Merovius/xzpoc_harmless/tree/v1.0.0{/dir} https://github.com/Merovius/xzpoc_harmless/blob/v1.0.0{/dir}/{file}#L{line}">

the pkgsite would have a hard time verifying the rendered HTML against sum.golang.org.

Merovius commented 7 months ago

@jfrech I'm sorry, but I'm genuinely not sure how to engage with what you are saying.

The basic nature of what I'm reporting here is that it is possible for a module to contain malicious code, while everyone importing and using it will come to the reasonable conclusion that they get served the verified and vetted content of a well-known git repository. Given that we just experienced a supply chain attack that was (in part) based on exactly such a mechanism, it seems a reasonable question to pose whether we can do anything to make it more likely that such an attempt gets discovered. Especially given that we theoretically have the ability to determine automatically whether a given source tree was used to create a given module zip.

You are correct that there are other vectors - besides the one I demonstrated - for a malicious module to create such an illusion. But how does that change the fact, that pkg.go.dev currently lends credibility with its links? You are correct that the sense of credibility may ultimately be false (in some cases, at least). But when I talked to Go programmers about this, the most common response was "but doesn't the checksum database catch this?", so would you argue that the sumdb also gives a "false sense of security" so we should abandon it? Of course not, that's ridiculous.

I think there are mechanisms we can come up with to improve the situation here. For example

Including the hash in the source links would at least mean that for the most commonly used code hosters, there is a clear chain of trust from the sumdb to the shown source code. Other code hosters might not have that, but it's still a benefit if it only happens for some.
Calculating the hash on build and looking it up in the sumdb would increase the likelihood in at least some situations - which notably can't be circumvented with IP-based or other client-detection.
You could run a prober, that periodically checks the meta-tags served by vanity-domains for containing both a mod and a git (or whatever) tag and whether the served contents match. Which would increase the difficulty of doing this fake - and if you find some community members who would run it from domestic IP addresses, you could increase it even further.

But also, perhaps the Go team just feels there is no action needed here. That's their prerogative, of course. It still seems like a reasonable thing to ask.

Merovius commented 7 months ago

FTR I think Go's situation here is vastly better than most other language's, because

We have a temper-evident ledger of all canonical version hashes, meaning for malware to be included in a built, it actually has to be announced as the canonical version to the world, in a way that can be traced.
We have the Go vulnerability database so that when we discover malicious code in a public Go module, we have a way to accurately track its usage and for victims to automatically check if they are affected.
We have a byte-by-byte reproducible mapping from a source tree to a published module, so we can actually automatically check that a published version is identical to a vetted version.

The thing I'm proposing is to consider a way to put these pieces together. The way open source is usually vetted is by following the commit log. Currently, there exists no link from that to the checksum database. I'm proposing to consider whether we can perhaps build such a link - even if it is only limited to a probabilistic one.

golang / go