actions / attest-build-provenance

Action for generating build provenance attestations for workflow artifacts
MIT License

Failed to get ID token: error in secret or public key callback: socket hang up #156

Closed: pealtrufo closed this issue 1 month ago

pealtrufo commented 2 months ago

Hi,

I am trying to use this action in one of my self-hosted ARC runners, but I get this error: Error: Failed to get ID token: error in secret or public key callback: socket hang up

I have been checking the action code to see if I could figure out what could be causing the issue, but I am a little bit lost. It seems to be related to the action not being able to get an ID Token from GitHub, but I struggle to see why.

These are the permissions given to the workflow:

permissions:
      id-token: write
      contents: read
      attestations: write

This is how I am using the action in the workflow:

- name: Attest
  uses: actions/attest-build-provenance@v1
  id: attest
  with:
    subject-name: ${{ inputs.REGISTRY_SERVER }}/${{ inputs.APP_NAME }}
    subject-digest: ${{ steps.get-image-digest.outputs.imageDigest }}
    push-to-registry: true

I can also see that my runner has the ACTIONS_ID_TOKEN_REQUEST_URL and ACTIONS_ID_TOKEN_REQUEST_TOKEN environment variables set, and that the runner can reach the URL specified in the first one:

sh-4.4$ curl -v https://pipelinesghubeus13.actions.githubusercontent.com/HF......
....
< HTTP/1.1 401 Unauthorized

401 response code is expected as I am not using the token in the curl request.
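For comparison with the unauthenticated curl call, here is a minimal sketch of how an authenticated ID token request could be assembled from the two environment variables mentioned above. The function name, the interface, and the audience value are hypothetical; this is not the action's actual code, only an illustration of the bearer token and optional audience parameter.

```typescript
// Hypothetical helper: describes the HTTP request that fetches an OIDC ID token.
interface IdTokenRequest {
  url: string;
  headers: Record<string, string>;
}

function buildIdTokenRequest(
  env: Record<string, string | undefined>,
  audience?: string
): IdTokenRequest {
  const requestUrl = env["ACTIONS_ID_TOKEN_REQUEST_URL"];
  const requestToken = env["ACTIONS_ID_TOKEN_REQUEST_TOKEN"];
  if (!requestUrl || !requestToken) {
    // Both variables are only injected when the job has `id-token: write`.
    throw new Error("ID token request variables are not set");
  }

  // The URL provided by the runner already carries query parameters,
  // so the audience is added as one more parameter.
  const url = new URL(requestUrl);
  if (audience) {
    url.searchParams.set("audience", audience);
  }

  return {
    url: url.toString(),
    // Without this header the endpoint answers 401, as in the curl output above.
    headers: { Authorization: `Bearer ${requestToken}` },
  };
}
```

In a workflow step this would be called as `buildIdTokenRequest(process.env, "example-audience")`, and the result passed to whichever HTTP client is in use.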

Any idea about what could be causing this issue?

PS: Other actions that make use of GH ID token work as expected. In the same workflow I am using aws-actions/configure-aws-credentials and that works perfectly fine.

Thanks in advance!

pealtrufo commented 2 months ago

Update: I have been investigating this issue a little more and have identified two other endpoints that the action hits to decode the ID token it receives:

https://token.actions.githubusercontent.com/.well-known/openid-configuration
https://token.actions.githubusercontent.com/.well-known/jwks

The first one returns the OIDC configuration, which contains the jwks endpoint. The second one returns the public keys, from which the action picks the one matching the kid value in the ID token header.
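The two-step lookup described above can be sketched as follows. This is an illustration under assumptions, not the action's implementation: the type and function names are hypothetical, the network calls are left out, and the token header is decoded without verifying the signature.

```typescript
// Minimal shapes for the two JSON documents involved.
interface DiscoveryDocument {
  jwks_uri?: string;
}
interface Jwk {
  kid: string;
  kty: string;
  [prop: string]: unknown;
}
interface Jwks {
  keys: Jwk[];
}

// Step 1: the openid-configuration document names the JWKS endpoint.
function jwksUriFromDiscovery(doc: DiscoveryDocument): string {
  if (!doc.jwks_uri) {
    throw new Error("discovery document has no jwks_uri");
  }
  return doc.jwks_uri;
}

// Decode the (unverified) header of a compact JWT: header.payload.signature
function decodeJwtHeader(jwt: string): { kid?: string; alg?: string } {
  const parts = jwt.split(".");
  if (parts.length !== 3) {
    throw new Error("not a compact JWT");
  }
  // Convert base64url to standard base64 and restore the padding.
  const b64 = parts[0].replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
  return JSON.parse(atob(padded));
}

// Step 2: pick the key whose kid matches the one in the token header.
function selectSigningKey(jwt: string, jwks: Jwks): Jwk {
  const { kid } = decodeJwtHeader(jwt);
  if (!kid) {
    throw new Error("JWT header has no kid");
  }
  const key = jwks.keys.find((k) => k.kid === kid);
  if (!key) {
    throw new Error(`no JWKS key with kid ${kid}`);
  }
  return key;
}
```

Both HTTP requests (for the discovery document and for the JWKS) have to reach token.actions.githubusercontent.com, so both depend on the client's network and proxy configuration.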

I have checked if my ARC runners have connectivity with those endpoints, and they do have it, as response code is 200 and I can see the response contents. But I have noticed this message when hitting https://token.actions.githubusercontent.com/.well-known/openid-configuration endpoint: The user 'System:PublicAccess;aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa' is not authorized to access this resource.

See below an extract of the response to the curl command:

...
* TLSv1.3 (IN), TLS app data, [no content] (0):
{ [1 bytes data]
< HTTP/2 200 
< content-type: application/json
< date: Thu, 18 Jul 2024 13:38:30 GMT
< content-length: 1193
< x-github-backend: Kubernetes
< x-github-request-id: B62A:335912:361B30:452318:66991AD6
< server: github.com
< 
{ [1193 bytes data]

100  1193  100  1193    0     0   4696      0 --:--:-- --:--:-- --:--:--  4678
* Connection #0 to host proxy.org.corp left intact
The user 'System:PublicAccess;aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa' is not authorized to access this resource.{
    "issuer": "https://token.actions.githubusercontent.com/",
    "jwks_uri": "https://token.actions.githubusercontent.com/.well-known/jwks",
    "subject_types_supported": [
        "public",
        "pairwise"
    ],
    "response_types_supported": [
        "id_token"
    ],
    "claims_supported": [
        "sub",
...

I can't make sense of that, so I would appreciate any help understanding what is going on here. I suppose there's some configuration missing in my self-hosted runner, as I got this same thing working on an ubuntu runner, but I don't know what it is. The error message definitely suggests there's an issue retrieving the public key to decode the token.

Thanks

bdehamer commented 2 months ago

Are these runners accessing the GH endpoints through a proxy? I noticed "proxy.org.corp" in your curl output. I'm wondering if this is a case of some of these requests not properly detecting/using the proxy to make these requests.

pealtrufo commented 2 months ago

Yes, there is a proxy configured in the runner. When using curl to send the request, it goes through the proxy by default and I can see the response as in my comment above. I assumed that the action also goes through the proxy by default.

bdehamer commented 2 months ago

At the moment, there are two different clients used to retrieve the JWKS key -- one which gets the initial openid-configuration endpoint and another which then pulls the jwks endpoint:

https://github.com/actions/toolkit/blob/1db73622df8a649a01cb12b8a217c903419009fa/packages/attest/src/oidc.ts#L71-L85

I know that the client used for the first request is set up to properly read proxy info from the environment, but I have a suspicion that the client in the jwks library is NOT 🤦 .

I'm going to do some testing and see if I can get to the bottom of this.
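To illustrate what "reading proxy info from the environment" typically involves, here is a simplified sketch of the common https_proxy / http_proxy / no_proxy convention. The function name is hypothetical and this is not the toolkit's code; real clients handle more cases (ports and CIDR ranges in no_proxy, for example). A client that skips this step connects directly and fails on a network where only the proxy can reach the outside.

```typescript
// Hypothetical helper: returns the proxy URL a request to `target` should use,
// or undefined when the request should go direct.
function resolveProxy(
  target: string,
  env: Record<string, string | undefined>
): string | undefined {
  const url = new URL(target);
  const host = url.hostname.toLowerCase();

  // no_proxy is a comma-separated list of hosts or domain suffixes to bypass.
  const noProxy = (env["no_proxy"] || env["NO_PROXY"] || "")
    .split(",")
    .map((entry) => entry.trim().toLowerCase().replace(/^\./, ""))
    .filter((entry) => entry.length > 0);

  const bypass = noProxy.some(
    (entry) => entry === "*" || host === entry || host.endsWith("." + entry)
  );
  if (bypass) {
    return undefined;
  }

  // Pick the variable that matches the target's scheme.
  if (url.protocol === "https:") {
    return env["https_proxy"] || env["HTTPS_PROXY"] || undefined;
  }
  return env["http_proxy"] || env["HTTP_PROXY"] || undefined;
}
```

With `https_proxy` set, a call such as `resolveProxy("https://token.actions.githubusercontent.com/.well-known/jwks", process.env)` returns the proxy URL, which the HTTP client then has to be configured to use.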

rasad4468 commented 2 months ago

The error you're encountering typically indicates that there's an issue with the action trying to retrieve an ID token from GitHub, possibly due to network issues or misconfiguration. Here are some steps to help you diagnose and resolve the issue:

  1. Check GitHub Actions Configuration: Ensure your GitHub Actions workflow is correctly set up to request an ID token. Your workflow should include the id-token permission. Here’s an example of how to configure it in your workflow file:

name: Example Workflow

on:
  push:
    branches:

jobs:
  example-job:
    runs-on: self-hosted
    permissions:
      id-token: write # Ensure this permission is set

    steps:
    - name: Checkout code
      uses: actions/checkout@v2

    - name: Use the action
      uses: <your-action>
      with:
        # your action inputs

  2. Verify Runner Configuration: Ensure your self-hosted runner is correctly configured and has the necessary permissions to communicate with GitHub's servers. Check the runner's status in the GitHub repository settings to ensure it's online and available.

  3. Network Configuration: Network issues could cause the "socket hang up" error. Ensure your runner has internet access and can reach GitHub’s servers. You might need to check firewall settings or proxy configurations that could be blocking the request.

  4. Action Logs: Inspect the logs for the specific action to get more details about where the failure occurs. This can provide more context on whether the issue is with network connectivity, authentication, or another aspect.

  5. Update Runner Software: Ensure your self-hosted runner is using the latest version of the runner software. GitHub frequently updates the runner software, and using an outdated version might cause compatibility issues.

  6. Secrets Configuration: Double-check the secrets configuration. Ensure any required secrets are correctly configured in the GitHub repository settings and accessible to the workflow.

  7. Debugging Steps: Add debugging steps to your workflow to gather more information. For example, you can add steps to output environment variables or run network diagnostics.

Example Debugging Steps:

steps:
- name: Checkout code
  uses: actions/checkout@v2

- name: Debug environment variables
  run: env

- name: Network diagnostics
  run: |
    curl -v https://github.com
    ping -c 4 github.com

- name: Use the action
  uses: <your-action>
  with:
    # your action inputs

Additional Resources: GitHub Documentation on ID Token, GitHub Actions Runner Documentation

Thanks rasad4468

pealtrufo commented 2 months ago

Thanks @rasad4468

As per my comment above, the workflow has the right permission applied to retrieve the ID Token, and I have also checked that there's connectivity between the runner and GitHub endpoints to retrieve the ID Token and OIDC configuration.

@bdehamer's comment above also suggests that one of the clients the action uses internally to retrieve the JWKS key might not properly read proxy info from the environment, which could be causing this issue. Hopefully @bdehamer will confirm whether this is the case so we can figure out the way forward.

bdehamer commented 1 month ago

@pealtrufo all of the logic in the attest-build-provenance action runs under Node.js, which I believe uses its own store of trusted certificates. If your proxy is handling HTTPS traffic, it may be using a self-signed certificate (or one which chains to a root not in Node's trusted list).

Unfortunately, using curl to troubleshoot in this scenario isn't particularly helpful since it's likely drawing on a different store of trusted certificates.

I do NOT recommend this as a long-term fix, but it may be useful to set the following environment variable in your runner just to see if it allows the job to progress further:

NODE_TLS_REJECT_UNAUTHORIZED=0

If the issue truly is nodejs not trusting your proxy server's certificate, this option will cause it to ignore the error.
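As a rough aid for reading the result of that experiment, the sketch below separates certificate verification failures from connection-level failures by their Node.js error code. The function and type names are hypothetical, and the code list is a small, non-exhaustive selection. The variable above only affects the first category; a "socket hang up" is normally reported with code ECONNRESET and falls into the second.

```typescript
// Error codes Node.js reports when certificate verification fails
// (non-exhaustive selection).
const TLS_VERIFICATION_CODES: string[] = [
  "SELF_SIGNED_CERT_IN_CHAIN",
  "DEPTH_ZERO_SELF_SIGNED_CERT",
  "UNABLE_TO_VERIFY_LEAF_SIGNATURE",
  "UNABLE_TO_GET_ISSUER_CERT_LOCALLY",
];

// Error codes that indicate the connection itself failed.
const CONNECTION_CODES: string[] = ["ECONNRESET", "ECONNREFUSED", "ETIMEDOUT"];

type FailureKind = "tls-verification" | "connection" | "unknown";

// Hypothetical helper: classify a failed request by its error code or message.
function classifyRequestError(err: { code?: string; message?: string }): FailureKind {
  if (err.code && TLS_VERIFICATION_CODES.indexOf(err.code) !== -1) {
    return "tls-verification";
  }
  if (
    (err.code && CONNECTION_CODES.indexOf(err.code) !== -1) ||
    /socket hang up/i.test(err.message || "")
  ) {
    return "connection";
  }
  return "unknown";
}
```

If the failure does turn out to be in the certificate category, pointing the NODE_EXTRA_CA_CERTS environment variable at the proxy's CA certificate file is a safer option than disabling verification.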

pealtrufo commented 1 month ago

@bdehamer Thanks for your support with this issue.

I have tested your suggested workaround, but I am still experiencing the same issue:

[Screenshot omitted]

I am really confused about what could be causing this. When you said that perhaps the client used by the action didn't properly read proxy info from environment, I thought that would be it. It made sense. Is this not the case then?

PS: When I look at how other actions that retrieve and use the GH ID Token behave in the runner, I can see that aws-actions/configure-aws-credentials, for instance, which also runs under Node, writes this in the output: "Setting proxy from environment", and everything works as expected.

Kind regards

bdehamer commented 1 month ago

@pealtrufo I was able to verify that the code we're using to fetch the OIDC token signing key is definitely NOT respecting the proxy settings.

I'm going to work on a fix and will let you know when I have a new version for you to test.

bdehamer commented 1 month ago

Version 1.4.0 of the action contains a fix for the JWKS proxy issue which I think may explain (at least part of) the issue you're seeing.

pealtrufo commented 1 month ago

Hi @bdehamer. Issue has been resolved 🚀

It is now failing when getting the signing certificate from GH's Fulcio instance, this time due to the proxy not allowing the connection. I will get that sorted out and hopefully everything else will work as expected. Thanks!