dependabot / dependabot-core

🤖 Dependabot's core logic for creating update PRs.
https://docs.github.com/en/code-security/dependabot
MIT License
4.55k stars 950 forks source link

Repositories fetched over HTTPS rely on redirection instead of using direct clone link #6232

Open maciej-gol opened 1 year ago

maciej-gol commented 1 year ago

Is there an existing issue for this?

Package ecosystem

terraform

Package manager version

1.2.9

Language version

1.2.9

Manifest location and content before the Dependabot update

No response

dependabot.yml content

version: 2 updates:

registries: terraform-gitlab: type: terraform-registry url: ... token: ${{CI_JOB_TOKEN}}

Updated dependency

No response

What you expected to see, versus what you actually saw

What I've seen:

[2022-11-29 13:57:17 +0000] ERROR -- [dep-update: team/service=>terraform=>/infrastructure/shared_modules/cicd] Dependency update execution failed with error:
[2022-11-29 13:57:17 +0000] ERROR -- [dep-update: team/service=>terraform=>/infrastructure/shared_modules/cicd] Dependabot::RepoNotFound
[2022-11-29 13:57:17 +0000] DEBUG -- [dep-update: team/service=>terraform=>/infrastructure/shared_modules/cicd] /home/dependabot/app/vendor/bundle/ruby/3.1.0/gems/dependabot-common-0.213.0/lib/dependabot/file_fetchers/base.rb:95:in `rescue in clone_repo_contents'

After modifying the script to print the error's message in clone_repo_contents, I get:

fatal: unable to update url base from redirection:
  asked for: https://x-access-token:xxx@host/team/service/info/refs?service=git-upload-pack
   redirect: https://console.jumpcloud.com/login?...

This is because our GitLab instance is configured to require authorization. It allows traffic to *.git/*, though.

Expected behavior:

The repository's clone url should be used, as presented both by GitHub and GitLab. The clone url ends with .git. Instead, dependabot depends on a redirection from gitlab.something.net/team/service to gitlab.something.net/team/service.git.

In our case, it won't easily work, because we can't easily craft a pattern that would detect whether a request is legitimate clone or just somebody random accessing our GitLab.

Attempting to clone both urls from our CI:

$ git clone https://gitlab-ci-token:$CI_JOB_TOKEN@gitlab.host.net/team/service.git service
Cloning into 'service'...
$ git clone https://gitlab-ci-token:$CI_JOB_TOKEN@gitlab.host.net/team/service service2
Cloning into 'service2'...
fatal: unable to update url base from redirection:
  asked for: https://gitlab-ci-token:[MASKED]@gitlab.host.net/team/service/info/refs?service=git-upload-pack
   redirect: https://console.jumpcloud.com/login?...

Native package manager behavior

No response

Images of the diff or a link to the PR, issue, or logs

No response

Smallest manifest that reproduces the issue

No response

jeffwidman commented 1 year ago

To clarify my understanding:

  1. this should manifest against public repo's as well... the inclusion of the gitlab-ci-token shouldn't matter...
  2. this should manifest whenever the dependency's repo is on gitlab / a service that doesn't auto-redirect... it would happen regardless of whether Dependabot running in gitlab vs github vs Azure DO... Is both of ☝️ correct?
  3. You're running this on Gitlab, via an entrypoint like dependabot-script? Because on GitHub for additional security we run within a custom proxy that prevent dependabot-core from ever seeing the secrets, so I'd like to eliminate that as a potential culprit.

This sure sounds like a legit bug, but I suspect this only impacts terraform repos hosted on Gitlab, so given the relative impact here this may not get prioritized for a bit. However, if you want to dig into it further using the dry-run script (or even your current entrypoint) and then put together a PR I'd be happy to guide you / answer any questions. You can drop binding.b anywhere in the dependabot-core code to start irb.

maciej-gol commented 1 year ago

@jeffwidman

this should manifest against public repo's as well... the inclusion of the gitlab-ci-token shouldn't matter... Yes, it doesn't matter. What matters is the .git part at the end of the clone URL.

this should manifest whenever the dependency's repo is on gitlab / a service that doesn't auto-redirect... it would happen regardless of whether Dependabot running in gitlab vs github vs Azure DO... No. Git's HTTP protocol handles redirections well. This issue should manifest whenever your GitLab (or anything else) installation is behind a firewall that enforces authorization on every url (before the traffic reaches GitLab).

In our case, we have allowed traffic to urls containing .git in their path. I have just double-checked what's happening in case of GitHub and GitLab and here are the results:

GitHub serves the content directly from both endpoints:

$ curl https://github.com/dependabot/dependabot-core/info/refs?service=git-upload-pack
001e# service=git-upload-pack
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
0000

$ curl https://github.com/dependabot/dependabot-core.git/info/refs?service=git-upload-pack
001e# service=git-upload-pack
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
0000

In both cases, a proper git pack is being sent back.

In case of GitLab, if you clone using the url without .git, you will get a redirection to the url with .git.

$ curl https://gitlab.com/maciej.gol/op-askpass.git/info/refs?service=git-upload-pack
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.

$ curl https://gitlab.com/maciej.gol/op-askpass/info/refs?service=git-upload-pack
<html><body>You are being <a href="https://gitlab.com/maciej.gol/op-askpass.git/info/refs?service=git-upload-pack">redirected</a>.</body></html>

On both GitHub and GitLab, the UI shows URLs with .git as valid clone urls, thus it might also be suprising ones without .git work.

Now, our GitLab instance is self-hosted, and we whitelist urls containing .git from our SSO authorization, which allows cloning repositories via HTTPS. Unfortunately, links without .git will be harder to whitelist, because there is no distinct pattern there.

You're running this on Gitlab, via an entrypoint like dependabot-script? Because on GitHub for additional security we run within a custom proxy that prevent dependabot-core from ever seeing the secrets, so I'd like to eliminate that as a potential culprit.

I've been using dependabot-standalone which is a stateless version of dependabot, that can be run in CI. The culprit is in the DependabotSource and FileFetcher classes, as the former's .url emits one without .git, and the latter uses source.url to perform the clone command.

Mainly here:

Thus, I believe this issue/improvement belongs to dependabot-core, unless you tell me I'm wrong :D

The Dependabot::Source object is being created like this:

      Dependabot::Source.new(
        provider: "gitlab",
        hostname: AppConfig.gitlab_url.host,
        api_endpoint: "#{AppConfig.gitlab_url}/api/v4",
        repo: repo,
        directory: directory,
        branch: branch
      )

And I don't really see any place here that could be adapted to use .git, thus I believe this issue suits more in dependabot-core (as it's tied to the core object/functionality)

maciej-gol commented 1 year ago

Did you have any chance to look into this?

deivid-rodriguez commented 1 year ago

@maciej-gol It seems that you digged into quite a lot, I think you're close to find the proper fix for us to include in dependabot-core. Did you try to see if anything breaks if we started using .git-ended url's as the canonical Source#url?