Open jduan-highnote opened 1 year ago
Apologies, I missed this issue being raised
Yes typically this would indicate that the run fails to start, do you find that the run is starting and this action is failing to get the ID?
Mitigate in which sense? Are you using this action to simply kick off another action and are not concerned for the ID? Or?
What I see often is that a run was successfully triggered (the "Distinct ID" matches the expected ID) but this action couldn't find it after many tries.
Hmmm I wonder if the rework i'm doing for v2 will help. In the interim, can you please post the full log with debug logging enabled?
You can enable this by adding a secret ACTIONS_RUNNER_DEBUG
with true
(info on this here https://docs.github.com/en/actions/monitoring-and-troubleshooting-workflows/enabling-debug-logging)
Good idea. I just added that secret. Will keep you posted. Thanks!
Happy to re-open this should there still be any issues, let me know :)
So this happened today.
It failed to find a workflow run with the matching dispatch id:
However, the workflow run was dispatched successfully (the id matches)
How long is it between the job starting and the action dispatching? do you use custom runners or gh runners?
The action was dispatched at Thu, 14 Sep 2023 04:44:27 GMT
and the job started running at Thu, 14 Sep 2023 04:44:35 GMT
so it took 8 seconds.
We use self-hosted runners for this job. We used to use self-hosted runners and had this problem as well.
Weird, does the output for return-dispatch to catch happen before anything else?
Sorry what you mean exactly?
Ah sorry, didn't explain myself very well, does this step happen before any other steps?
- name: echo distinct ID ${{ github.event.inputs.distinct_id }}
run: echo ${{ github.event.inputs.distinct_id }}
No, that job only has one step like this:
id:
runs-on: [infra, us-central1, build-small]
if: ${{ github.event.inputs.distinct_id != null }}
steps:
- name: Distinct ID ${{ github.event.inputs.distinct_id }}
run: echo ${{ github.event.inputs.distinct_id }}
@jduan-highnote Could it be because your targeted repository triggers a lot of workflows on regular basis? I see that the API call to get the workflow runs only fetches 5 when a branch name is specified as ref
, and only 10 otherwise:
If more than 5 workflows have been triggered at the same time, then your run might never be in the first page of results and therefore impossible to find. It would be great to increase the per_page
value or make it configurable.
@pmrotule that's great insight. It's certainly possible that the target workflow has more than 5 concurrent runs at some point. It'd be great to make this a configurable param! @Codex-
@jduan-highnote I've cut a new release with some enhancements courtesy of @pmrotule
Let me know how it goes (v1 tag has been updated so if you're using the v1 tag you shouldn't have to change anything)
@Codex- can you given a summary of the recent enhancements? We do use v1
across the board so we will pick up the enhancements automatically.
An octokit iterator is now being used to check for runs across multiple API response pages
Nice! I'll keep you posted.
An octokit iterator is now being used to check for runs across multiple API response pages
@Codex- The iterator is only used to get the workflow id within getWorkflowId
, but getWorkflowRunIds
is still only looking at the first 5 runs:
So I think that my assumption from last month is still valid:
If more than 5 workflows have been triggered at the same time, then your run might never be in the first page of results and therefore impossible to find.
A nice way to solve this would be to use an iterator within getWorkflowRunIds
as well, but with a limit since going through all workflow runs in the history of a repository could take a very long time. So something like this:
const workflowRunIterator = octokit.paginate.iterator(
octokit.rest.actions.listWorkflowRuns,
{
owner: config.owner,
repo: config.repo,
workflow_id: workflowId,
per_page: 20,
},
);
let runIds: number[] = [];
let page = 1;
const maxPage = 5;
for await (const response of workflowRunIterator) {
if (response.status !== 200) {
throw new Error(
`Failed to get Workflow runs, expected 200 but received ${response.status}`,
);
}
// wrong type definition
const runs: typeof response.data.workflow_runs = response.data;
runIds = runs.map((workflowRun) => workflowRun.id);
if (runIds.length || page === maxPage) {
break;
}
page++;
}
I am also running into this issue:
Successfully dispatched workflow: Repository: Branch: Workflow ID: ***.yml Distinct ID: 348a1190-04f9-4c90-b5e5-ef3608dd4265 Attempt to extract run ID from steps... Exhausted searching IDs in known runs, attempt 1... Exhausted searching IDs in known runs, attempt 2... Exhausted searching IDs in known runs, attempt 3... Exhausted searching IDs in known runs, attempt 4... Exhausted searching IDs in known runs, attempt 5... Exhausted searching IDs in known runs, attempt 6... Exhausted searching IDs in known runs, attempt 7... Exhausted searching IDs in known runs, attempt 8... Exhausted searching IDs in known runs, attempt 9... Exhausted searching IDs in known runs, attempt 10... Error: getWorkflowRunJobs: An unexpected error has occurred: Server Error Error: Failed to complete: Server Error Warning: Does the token have the correct permissions? Error: Server Error
The weirdest thing is that it will keep attempting to find the ID while the successfully triggered pipeline is running and then hits the Server Error or timeout error once the pipeline is done. If there's anything I can add to help with troubleshooting, please let me know. Thank you for working on this, this functionality should be included in Github Actions by default.
@ChrisPage-AT When the search is being exhausted, if you do an API request yourself, is the run present?
Thank you for working on this, this functionality should be included in Github Actions by default.
Totally agree, really hoping we see it one day
@Codex- Using the api calls listed in your readme, I can find the triggered pipeline with List Workflow Runs
and use the id to see the jobs with List Jobs for a workflow run
. However I am noticing that neither of those calls seem to list the distinct_id that I'm passing in. Are you doing a different call to obtain that?
I started seeing this error recently. You can see that the workflow was dispatched successfully and the ID matches.
Server error? Unusual, can you enable debug logging? https://docs.github.com/en/actions/monitoring-and-troubleshooting-workflows/enabling-debug-logging
Can hopefully get some more info there
Interesting. I wasn't aware of enabling debug logs for runners. Should I enable both runner & step debug logs?
I think step but I'd do both to get a full picture
I enabled debug log a few weeks ago and today the problem happened for the first time. Please see the attached logs.
Thanks for that @jduan-highnote , I'll have a dig when I get a chance. Really odd that the ID being returned is so short, almost looks truncated.
I've deleted your post so as to not reveal anything about your infra
Thank you for being so thoughtful!
Here's another occurrence if it helps:
Fri, 26 Apr 2024 19:25:45 GMT
Run codex-/return-dispatch@v1
Fri, 26 Apr 2024 19:25:45 GMT Fetching Workflow ID for prbox_update_routing.yaml...
Fri, 26 Apr 2024 19:25:45 GMT Fetched Workflow ID: 88382506
Fri, 26 Apr 2024 19:25:46 GMT Successfully dispatched workflow:
Fri, 26 Apr 2024 19:25:46 GMT Repository: bay1inc/devops
Fri, 26 Apr 2024 19:25:46 GMT Branch: main
Fri, 26 Apr 2024 19:25:46 GMT Workflow ID: prbox_update_routing.yaml
Fri, 26 Apr 2024 19:25:46 GMT Workflow Inputs: {"services":"non-pci","tenants":"test"}
Fri, 26 Apr 2024 19:25:46 GMT Distinct ID: f9c198f0-84b8-4943-a7cf-31be5bcec42b
Fri, 26 Apr 2024 19:25:46 GMT Attempt to extract run ID from steps...
Fri, 26 Apr 2024 19:25:46 GMT ##[debug]Attempting to fetch Run IDs for Workflow ID 88382506
Fri, 26 Apr 2024 19:25:46 GMT ##[debug]failed to get branch for ref: main, please raise an issue with this git ref.
Fri, 26 Apr 2024 19:25:46 GMT ##[debug]Fetched Workflow Runs:
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Repository: bay1inc/devops
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Branch: undefined
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Workflow ID: 88382506
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Runs Fetched: [8852916369,8852872434,8852351822,8852314544,8852302751,8852270907,8851776517,8851726747,8851694157,8851681964]
Fri, 26 Apr 2024 19:25:46 GMT ##[debug]Attempting to get step names for Run IDs: [8852916369,8852872434,8852351822,8852314544,8852302751,8852270907,8851776517,8851726747,8851694157,8851681964]
Fri, 26 Apr 2024 19:25:46 GMT ##[debug]Fetched Workflow Run Job Steps:
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Repository: bay1inc/devops
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Workflow Run ID: 8852916369
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Jobs Fetched: [24312583511,24312583707,24312583931,24312591506,24312619999,24312642954] Steps Fetched: [Set up job,Print Input Values,Clone Devops,Generate Matrix JSON,Post Clone Devops,Complete job,Distinct ID 61897684-539d-4539-b241-54ec6159841e,Set up runner,Setup K8s auth,Deploying Apps,Complete runner]
Fri, 26 Apr 2024 19:25:46 GMT ##[debug]Fetched Workflow Run Job Steps:
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Repository: bay1inc/devops
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Workflow Run ID: 8852872434
Fri, 26 Apr 2024 19:25:46 GMT ##[debug] Jobs Fetched: [24312437185,24312437385,24312437629,24312444058,24312464693,24312481469] Steps Fetched: [Set up job,Print Input Values,Clone Devops,Generate Matrix JSON,Post Clone Devops,Complete job,Distinct ID fcc9476c-2cae-43a1-8850-ae2df9728507,Set up runner,Setup K8s auth,Deploying Apps,Complete runner]
Fri, 26 Apr 2024 19:25:47 GMT ##[debug]Fetched Workflow Run Job Steps:
Fri, 26 Apr 2024 19:25:47 GMT ##[debug] Repository: bay1inc/devops
Fri, 26 Apr 2024 19:25:47 GMT ##[debug] Workflow Run ID: 8852351822
Fri, 26 Apr 2024 19:25:47 GMT ##[debug] Jobs Fetched: [24310882803,24310883124,24310883576,24310892691,24310930047,24310957296] Steps Fetched: [Set up job,Print Input Values,Clone Devops,Generate Matrix JSON,Post Clone Devops,Complete job,Distinct ID 0fec8571-ebc2-49f5-9364-747a1b6d3649,Set up runner,Setup K8s auth,Deploying Apps,Complete runner]
Fri, 26 Apr 2024 19:25:47 GMT ##[debug]Fetched Workflow Run Job Steps:
Fri, 26 Apr 2024 19:25:47 GMT ##[debug] Repository: bay1inc/devops
Fri, 26 Apr 2024 19:25:47 GMT ##[debug] Workflow Run ID: 8852314544
Fri, 26 Apr 2024 19:25:47 GMT ##[debug] Jobs Fetched: [24310757841,24310757992,24310758179,24310764698,24310791043,24310814079] Steps Fetched: [Set up job,Print Input Values,Clone Devops,Generate Matrix JSON,Post Clone Devops,Complete job,Distinct ID 6822826a-513e-4f60-8535-fe7573de521e,Set up runner,Setup K8s auth,Deploying Apps,Complete runner]
Fri, 26 Apr 2024 19:25:57 GMT Error: getWorkflowRunJobs: An unexpected error has occurred: Server Error
Fri, 26 Apr 2024 19:25:57 GMT ##[debug]HttpError: Server Error
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at file:///runner/_work/_actions/codex-/return-dispatch/v1/dist/index.mjs:19832:26
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at async getWorkflowRunJobSteps (file:///runner/_work/_actions/codex-/return-dispatch/v1/dist/index.mjs:23186:22)
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at async run (file:///runner/_work/_actions/codex-/return-dispatch/v1/dist/index.mjs:23269:25)
Fri, 26 Apr 2024 19:25:57 GMT Error: Failed to complete: Server Error
Fri, 26 Apr 2024 19:25:57 GMT Warning: Does the token have the correct permissions?
Fri, 26 Apr 2024 19:25:57 GMT ##[debug]HttpError: Server Error
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at file:///runner/_work/_actions/codex-/return-dispatch/v1/dist/index.mjs:19832:26
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at async getWorkflowRunJobSteps (file:///runner/_work/_actions/codex-/return-dispatch/v1/dist/index.mjs:23186:22)
Fri, 26 Apr 2024 19:25:57 GMT ##[debug] at async run (file:///runner/_work/_actions/codex-/return-dispatch/v1/dist/index.mjs:23269:25)
Fri, 26 Apr 2024 19:25:57 GMT Error: Server Error
Fri, 26 Apr 2024 19:25:57 GMT ##[debug]Node Action run completed with exit code 1
Fri, 26 Apr 2024 19:25:57 GMT ##[debug]Finishing: Trigger routing updates
@Codex- do you have enough logs for debugging? I'd like to turn off debug logs for our github actions.
@jduan-highnote Are you sure your PAT has the required permissions : https://github.com/Codex-/return-dispatch?tab=readme-ov-file#permissions-required (as it's stated in the logs)
Fri, 26 Apr 2024 19:25:57 GMT Warning: Does the token have the correct permissions?
Yes it does. I only see this failure < 1% of the time.
On Sat, May 4, 2024 at 8:50 AM Stephdotnet @.***> wrote:
@jduan-highnote https://github.com/jduan-highnote Are you sure your PAT has the required permissions : https://github.com/Codex-/return-dispatch?tab=readme-ov-file#permissions-required (as it's stated in the logs)
Fri, 26 Apr 2024 19:25:57 GMT Warning: Does the token have the correct permissions?
— Reply to this email directly, view it on GitHub https://github.com/Codex-/return-dispatch/issues/183#issuecomment-2094271028, or unsubscribe https://github.com/notifications/unsubscribe-auth/AV6IUATQIFEC2VIJRHDZAZTZAT7T7AVCNFSM6AAAAAA3HIFJAGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJUGI3TCMBSHA . You are receiving this because you were mentioned.Message ID: @.***>
@Codex- did you get a chance to look into this? I got another failure and I confirmed the dispatch actually worked (the ID matched)
I've been looking on and off but it seems as though there are just bad results from the API itself, looking to add some additional debug logging to see what the API is returning because something isn't right and it would either always work or never work, not just somehow end up with the truncated ID 🤔
An unexpected error has occurred: Server Error
When this error happens, can the code retries it a few times?
@Codex- did you get a chance to look more into this? Thanks!
I found another occurrence of this problem yesterday:
Attempt to extract run ID from steps...
Error: getWorkflowRunJobs: An unexpected error has occurred: Server Error
Error: Failed to complete: Server Error
Warning: Does the token have the correct permissions?
Error: Server Error
@Codex- did you get a chance to look more into this? Thanks!
I found another occurrence of this problem yesterday:
Attempt to extract run ID from steps... Error: getWorkflowRunJobs: An unexpected error has occurred: Server Error Error: Failed to complete: Server Error Warning: Does the token have the correct permissions? Error: Server Error
@jduan-highnote server errors have been more common recently with some github issues that have been happening off and on.
I've made some improvements to how I handle this, they have not yet been released but can be tested as the bundle has been generated and pushed for the changes:
uses: codex-/return-dispatch@feat/improve_retry_strategy
Yes, the number of incidents github.com has is crazy to say the least. How do I try out this branch in our github action workflows?
You should be able to update your workflow uses
field for that step to point to the branch: uses: codex-/return-dispatch@feat/improve_retry_strategy
Got it. BTW, we have metrics on how often this error happens. This is the past data. It doesn't happen often. I'll switch to @feat/improve_retry_strategy
but it may take a while before I can see how effective it is.
Server Error
s are now captured and retried for n
times so it'll be interested to see if that'll be enough to keep it going with more stability
@jduan-highnote how's it looking so far? :)
I merged the change to try this out on 8/19 and so far there's been no issues. That said, I think we need to wait a bit longer since the problem was happening about once a month before.
Hi,
From time to time, we would see this error "Exhausted searching IDs in known runs". See the logs below. I bet this is caused by github API not finding the relevant run. That said, is there anything we can do to mitigate this problem? Thanks!