Open maciej-gol opened 1 year ago
To clarify my understanding:
gitlab-ci-token
shouldn't matter... dependabot-script
? Because on GitHub for additional security we run within a custom proxy that prevent dependabot-core
from ever seeing the secrets, so I'd like to eliminate that as a potential culprit.This sure sounds like a legit bug, but I suspect this only impacts terraform repos hosted on Gitlab, so given the relative impact here this may not get prioritized for a bit. However, if you want to dig into it further using the dry-run script (or even your current entrypoint) and then put together a PR I'd be happy to guide you / answer any questions. You can drop binding.b
anywhere in the dependabot-core
code to start irb
.
@jeffwidman
this should manifest against public repo's as well... the inclusion of the gitlab-ci-token shouldn't matter... Yes, it doesn't matter. What matters is the
.git
part at the end of the clone URL.this should manifest whenever the dependency's repo is on gitlab / a service that doesn't auto-redirect... it would happen regardless of whether Dependabot running in gitlab vs github vs Azure DO... No. Git's HTTP protocol handles redirections well. This issue should manifest whenever your GitLab (or anything else) installation is behind a firewall that enforces authorization on every url (before the traffic reaches GitLab).
In our case, we have allowed traffic to urls containing .git
in their path. I have just double-checked what's happening in case of GitHub and GitLab and here are the results:
GitHub serves the content directly from both endpoints:
$ curl https://github.com/dependabot/dependabot-core/info/refs?service=git-upload-pack
001e# service=git-upload-pack
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
0000
$ curl https://github.com/dependabot/dependabot-core.git/info/refs?service=git-upload-pack
001e# service=git-upload-pack
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
0000
In both cases, a proper git pack is being sent back.
In case of GitLab, if you clone using the url without .git
, you will get a redirection to the url with .git
.
$ curl https://gitlab.com/maciej.gol/op-askpass.git/info/refs?service=git-upload-pack
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
$ curl https://gitlab.com/maciej.gol/op-askpass/info/refs?service=git-upload-pack
<html><body>You are being <a href="https://gitlab.com/maciej.gol/op-askpass.git/info/refs?service=git-upload-pack">redirected</a>.</body></html>
On both GitHub and GitLab, the UI shows URLs with .git
as valid clone urls, thus it might also be suprising ones without .git
work.
Now, our GitLab instance is self-hosted, and we whitelist urls containing .git
from our SSO authorization, which allows cloning repositories via HTTPS. Unfortunately, links without .git
will be harder to whitelist, because there is no distinct pattern there.
You're running this on Gitlab, via an entrypoint like dependabot-script? Because on GitHub for additional security we run within a custom proxy that prevent dependabot-core from ever seeing the secrets, so I'd like to eliminate that as a potential culprit.
I've been using dependabot-standalone which is a stateless version of dependabot, that can be run in CI.
The culprit is in the DependabotSource
and FileFetcher
classes, as the former's .url
emits one without .git
, and the latter uses source.url
to perform the clone command.
Mainly here:
url
is being constructedsource.url
is being used to construct the git commandDependabotSource
is being createdThus, I believe this issue/improvement belongs to dependabot-core
, unless you tell me I'm wrong :D
The Dependabot::Source
object is being created like this:
Dependabot::Source.new(
provider: "gitlab",
hostname: AppConfig.gitlab_url.host,
api_endpoint: "#{AppConfig.gitlab_url}/api/v4",
repo: repo,
directory: directory,
branch: branch
)
And I don't really see any place here that could be adapted to use .git
, thus I believe this issue suits more in dependabot-core
(as it's tied to the core object/functionality)
Did you have any chance to look into this?
@maciej-gol It seems that you digged into quite a lot, I think you're close to find the proper fix for us to include in dependabot-core. Did you try to see if anything breaks if we started using .git
-ended url's as the canonical Source#url
?
Is there an existing issue for this?
Package ecosystem
terraform
Package manager version
1.2.9
Language version
1.2.9
Manifest location and content before the Dependabot update
No response
dependabot.yml content
version: 2 updates:
registries: terraform-gitlab: type: terraform-registry url: ... token: ${{CI_JOB_TOKEN}}
Updated dependency
No response
What you expected to see, versus what you actually saw
What I've seen:
After modifying the script to print the error's message in
clone_repo_contents
, I get:This is because our GitLab instance is configured to require authorization. It allows traffic to
*.git/*
, though.Expected behavior:
The repository's clone url should be used, as presented both by GitHub and GitLab. The clone url ends with
.git
. Instead, dependabot depends on a redirection fromgitlab.something.net/team/service
togitlab.something.net/team/service.git
.In our case, it won't easily work, because we can't easily craft a pattern that would detect whether a request is legitimate clone or just somebody random accessing our GitLab.
Attempting to clone both urls from our CI:
Native package manager behavior
No response
Images of the diff or a link to the PR, issue, or logs
No response
Smallest manifest that reproduces the issue
No response