1Password / load-secrets-action

Load secrets from 1Password into your GitHub Actions jobs
https://developer.1password.com
MIT License

Load Secrets Randomly Failing with dial TCP xxx.xxx.xxx.xxx:443 connect: connection refused #38

Open mattp0 opened 1 year ago

mattp0 commented 1 year ago

Hello! So far I have been loving the 1Password integration for GitHub Actions. However, I am running into an issue where the action returns a connection refused error, which is often resolved simply by re-running the workflow.

Relevant sections of the GitHub Actions workflow file:

name: Deploy

on:
  push:
    branches:
      - '*-PR-*'

  pull_request:
    branches:
    - master
    types: [opened, synchronize]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Configure 1Password Connect
        id: onepassword-connnection
        uses: 1password/load-secrets-action/configure@v1
        with:
          connect-host: ${{ secrets.OP_HOST }}
          connect-token: ${{ secrets.OP_CONNECT_TOKEN }}

      - name: Load secret
        id: load-secrets
        uses: 1password/load-secrets-action@v1
        env:
          THING: op://***********/THING/STUFF

When the load-secrets action runs after the configure action, I see seemingly random failures that are often resolved by re-running the workflow. My network team says they are not seeing any denials at the firewall. Here is the complete error:

/usr/bin/sh -c /home/runner/work/_actions/1password/load-secrets-action/v1/entrypoint.sh
Authenticated with CONNECT 
Archive:  op.zip
 extracting: /usr/local/bin/op.sig   
  inflating: /usr/local/bin/op       
Populating variable: THING
Error:  2023/04/10 17:33:38 could not read secret op://***********/THING/STUFF: Get "***/v1/vaults/***********": dial tcp 137.229.86.209:4: connect: connection refused
Error: The process '/usr/bin/sh' failed with exit code 1

I tried running the workflow with debug logging. Based on entrypoint.sh, the error seems to originate from the op read command, but I am not sure whether that source code is available. Could it be this? https://github.com/1Password/onepassword-operator. There I found a report of similar behavior, https://github.com/1Password/onepassword-operator/issues/132, which redirects to the Connect repo, https://github.com/1Password/connect/issues/45, but those errors do not look exactly the same.

I have also seen the following error, but much more rarely:

2023/04/08 00:27:27 could not read secret op://***********/THING/STUFF: Get "***/v1/vaults/***********": dial tcp 137.229.86.209:443: i/o timeout

Once again, these errors only occur once or twice before the workflow succeeds, and they seem to happen more frequently on cold starts. We are self-hosting a connector, and we can see that it does retrieve the secret correctly; the only error it logs is (E) 400: Invalid Item UUID.

Matching logs from inside the connector for these errors:

{"log_message":"(I) GET /v1/vaults/***********/items?filter=title+eq+%22THING%22","timestamp":"2023-04-07T21:21:10.855356368Z","level":3,"scope":{"request_id":"e091e450-941e-4623-9de8-2567e5bde0c9"}}

{"log_message":"(I) GET /v1/vaults/***********/items?filter=title+eq+%22THING%22completed (200: OK)","timestamp":"2023-04-07T21:21:10.859412096Z","level":3,"scope":{"request_id":"572bfc701c0b3a0ea99a32b7a9ec26d5","jti":"ksveqvt5kubby747u5b7dd6zki"}}

{"log_message":"(E) 400: Invalid Item UUID","timestamp":"2023-04-07T21:21:10.859493954Z","level":1,"scope":{"request_id":"d12eea19a35ab297960816825149e80c","jti":"ksveqvt5kubby747u5b7dd6zki"}}

This may be better filed against the connector, but given the lack of traction on the issue linked above, could a retry-on-failure option be added here? The problem seems to resolve after a retry.

If others find this and it is still an issue: as a workaround, I am using a hacky bash sleep of 1-2 minutes before trying to load the secrets again. It helps with retries and works sometimes!
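For anyone who wants something slightly more structured than a fixed sleep, here is a minimal sketch of a retry wrapper with a doubling delay. It is not a feature of the action itself. The `retry` function name, the attempt count, and the delays are all assumptions for illustration. The commented usage line assumes the op CLI is on the PATH and that the configure step has already exported the Connect host and token.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of load-secrets-action.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the delay (whole seconds) after each failed attempt.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    # Success: stop immediately and report success to the caller.
    if "$@"; then
      return 0
    fi
    # Out of attempts: report failure so the workflow step fails.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt ${attempt} failed, sleeping ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example (placeholder secret reference; adjust to your vault and item):
#   THING="$(retry 5 15 op read "op://<vault>/THING/STUFF")"
```

Progress messages go to stderr, so capturing stdout still yields only the secret value. This only papers over the intermittent connection refused and timeout errors; it does not address whatever is causing them on the connector side.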

dustin-ruetz commented 1 year ago

Hi @mattp0! 👋 Thank you very much for filing this issue and for including your configuration file and the relevant logs. We have seen some customers raising similar issues regarding the (E) 400: Invalid Item UUID Connect errors, but we are still in the process of investigating the root cause.

reweeden commented 10 months ago

@dustin-ruetz Any updates on this?