actions / actions-runner-controller

Kubernetes controller for GitHub Actions self-hosted runners
Apache License 2.0
4.77k stars 1.13k forks source link

Can't display full logs in Github Actions UI after refreshing the page #3406

Open iliasnaamane2 opened 8 months ago

iliasnaamane2 commented 8 months ago

Checks

Controller Version

0.23.7

Deployment Method

Helm

Checks

To Reproduce

Update values.yml with the right PAT of Github.
Install actions-runner-controller using helm chart.
Deploy RunnerDeployment provided above.
Run an example of Github workflow.
The Github workflow is executed and logs disappear from UI after refreshing the page

Describe the bug

Logs disappear from Github Actions UI after refreshing the workflow page

Describe the expected behavior

Logs disappear from Github Actions UI after refreshing the workflow page

Additional Context

After refreshing the Github actions UI page, we cannot view full logs

Controller Logs

2024-03-05T02:41:26Z    INFO    Using default AutoscalingListener image pull policy     {"ImagePullPolicy": "IfNotPresent"}
2024-03-05T02:41:26Z    INFO    AutoscalingListener logging parameters changed  {"LogLevel": "debug", "LogFormat": "text"}
2024-03-05T02:41:27Z    INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
2024-03-05T02:41:27Z    INFO    Initializing actions-runner-controller  {"version": "v0.27.6", "default-scale-down-delay": "10m0s", "sync-period": "1m0s", "default-runner-image": "summerwind/actions-runner:latest", "default-docker-image": "docker:dind", "default-docker-gid": "1001", "common-runnner-labels": null, "leader-election-enabled": true, "leader-election-id": "actions-runner-controller", "watch-namespace": "runners"}
2024-03-05T02:41:27Z    INFO    controller-runtime.builder      Registering a mutating webhook  {"GVK": "actions.summerwind.dev/v1alpha1, Kind=Runner", "path": "/mutate-actions-summerwind-dev-v1alpha1-runner"}
2024-03-05T02:41:27Z    INFO    controller-runtime.webhook      Registering webhook     {"path": "/mutate-actions-summerwind-dev-v1alpha1-runner"}
2024-03-05T02:41:27Z    INFO    controller-runtime.builder      Registering a validating webhook        {"GVK": "actions.summerwind.dev/v1alpha1, Kind=Runner", "path": "/validate-actions-summerwind-dev-v1alpha1-runner"}
2024-03-05T02:41:27Z    INFO    controller-runtime.webhook      Registering webhook     {"path": "/validate-actions-summerwind-dev-v1alpha1-runner"}
2024-03-05T02:41:27Z    INFO    controller-runtime.builder      Registering a mutating webhook  {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerDeployment", "path": "/mutate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2024-03-05T02:41:27Z    INFO    controller-runtime.webhook      Registering webhook     {"path": "/mutate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2024-03-05T02:41:27Z    INFO    controller-runtime.builder      Registering a validating webhook        {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerDeployment", "path": "/validate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2024-03-05T02:41:27Z    INFO    controller-runtime.webhook      Registering webhook     {"path": "/validate-actions-summerwind-dev-v1alpha1-runnerdeployment"}
2024-03-05T02:41:27Z    INFO    controller-runtime.builder      Registering a mutating webhook  {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerReplicaSet", "path": "/mutate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2024-03-05T02:41:27Z    INFO    controller-runtime.webhook      Registering webhook     {"path": "/mutate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2024-03-05T02:41:27Z    INFO    controller-runtime.builder      Registering a validating webhook        {"GVK": "actions.summerwind.dev/v1alpha1, Kind=RunnerReplicaSet", "path": "/validate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2024-03-05T02:41:27Z    INFO    controller-runtime.webhook      Registering webhook     {"path": "/validate-actions-summerwind-dev-v1alpha1-runnerreplicaset"}
2024-03-05T02:41:27Z    INFO    controller-runtime.webhook      Registering webhook     {"path": "/mutate-runner-set-pod"}
2024-03-05T02:41:27Z    INFO    starting manager
2024-03-05T02:41:27Z    INFO    controller-runtime.webhook.webhooks     Starting webhook server
2024-03-05T02:41:27Z    INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
2024-03-05T02:41:27Z    INFO    controller-runtime.certwatcher  Updated current TLS certificate
2024-03-05T02:41:27Z    INFO    controller-runtime.webhook      Serving webhook server  {"host": "", "port": 9443}
2024-03-05T02:41:27Z    INFO    controller-runtime.certwatcher  Starting certificate watcher
I0305 02:41:27.158907       1 leaderelection.go:248] attempting to acquire leader lease actions-runner-controller/actions-runner-controller...
I0305 02:41:45.939318       1 leaderelection.go:258] successfully acquired lease actions-runner-controller/actions-runner-controller
2024-03-05T02:41:45Z    DEBUG   events  actions-runner-controller-7f5d54bcd-nthl8_9047ef3a-c087-4b25-b13b-964caf9acecc became leader    {"type": "Normal", "object": {"kind":"Lease","namespace":"actions-runner-controller","name":"actions-runner-controller","uid":"9739d777-102c-486c-8de1-926e7830f0ec","apiVersion":"coordination.k8s.io/v1","resourceVersion":"142448842"}, "reason": "LeaderElection"}
2024-03-05T02:41:45Z    INFO    Starting EventSource    {"controller": "runner-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "Runner", "source": "kind source: *v1alpha1.Runner"}
2024-03-05T02:41:45Z    INFO    Starting EventSource    {"controller": "runner-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "Runner", "source": "kind source: *v1.Pod"}
2024-03-05T02:41:45Z    INFO    Starting Controller     {"controller": "runner-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "Runner"}
2024-03-05T02:41:45Z    INFO    Starting EventSource    {"controller": "runnerdeployment-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerDeployment", "source": "kind source: *v1alpha1.RunnerDeployment"}
2024-03-05T02:41:45Z    INFO    Starting EventSource    {"controller": "runnerdeployment-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerDeployment", "source": "kind source: *v1alpha1.RunnerReplicaSet"}
2024-03-05T02:41:45Z    INFO    Starting Controller     {"controller": "runnerdeployment-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerDeployment"}
2024-03-05T02:41:45Z    INFO    Starting EventSource    {"controller": "runnerreplicaset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerReplicaSet", "source": "kind source: *v1alpha1.RunnerReplicaSet"}
2024-03-05T02:41:45Z    INFO    Starting EventSource    {"controller": "runnerreplicaset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerReplicaSet", "source": "kind source: *v1alpha1.Runner"}
2024-03-05T02:41:45Z    INFO    Starting Controller     {"controller": "runnerreplicaset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerReplicaSet"}
2024-03-05T02:41:45Z    INFO    Starting EventSource    {"controller": "runnerpod-controller", "controllerGroup": "", "controllerKind": "Pod", "source": "kind source: *v1.Pod"}
2024-03-05T02:41:45Z    INFO    Starting Controller     {"controller": "runnerpod-controller", "controllerGroup": "", "controllerKind": "Pod"}
2024-03-05T02:41:45Z    INFO    Starting EventSource    {"controller": "runnerset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerSet", "source": "kind source: *v1alpha1.RunnerSet"}
2024-03-05T02:41:45Z    INFO    Starting EventSource    {"controller": "runnerset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerSet", "source": "kind source: *v1.StatefulSet"}
2024-03-05T02:41:45Z    INFO    Starting Controller     {"controller": "runnerset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerSet"}
2024-03-05T02:41:45Z    INFO    Starting EventSource    {"controller": "runnerpersistentvolume-controller", "controllerGroup": "", "controllerKind": "PersistentVolume", "source": "kind source: *v1.PersistentVolume"}
2024-03-05T02:41:45Z    INFO    Starting Controller     {"controller": "runnerpersistentvolume-controller", "controllerGroup": "", "controllerKind": "PersistentVolume"}
2024-03-05T02:41:45Z    INFO    Starting EventSource    {"controller": "horizontalrunnerautoscaler-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "HorizontalRunnerAutoscaler", "source": "kind source: *v1alpha1.HorizontalRunnerAutoscaler"}
2024-03-05T02:41:45Z    INFO    Starting Controller     {"controller": "horizontalrunnerautoscaler-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "HorizontalRunnerAutoscaler"}
2024-03-05T02:41:45Z    INFO    Starting EventSource    {"controller": "runnerpersistentvolumeclaim-controller", "controllerGroup": "", "controllerKind": "PersistentVolumeClaim", "source": "kind source: *v1.PersistentVolumeClaim"}
2024-03-05T02:41:45Z    INFO    Starting Controller     {"controller": "runnerpersistentvolumeclaim-controller", "controllerGroup": "", "controllerKind": "PersistentVolumeClaim"}
2024-03-05T02:41:46Z    INFO    Starting workers        {"controller": "runnerpod-controller", "controllerGroup": "", "controllerKind": "Pod", "worker count": 1}
2024-03-05T02:41:46Z    INFO    Starting workers        {"controller": "runner-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "Runner", "worker count": 1}
2024-03-05T02:41:46Z    INFO    Starting workers        {"controller": "runnerreplicaset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerReplicaSet", "worker count": 1}
2024-03-05T02:41:46Z    INFO    Starting workers        {"controller": "runnerpersistentvolume-controller", "controllerGroup": "", "controllerKind": "PersistentVolume", "worker count": 1}
2024-03-05T02:41:46Z    INFO    Starting workers        {"controller": "runnerdeployment-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerDeployment", "worker count": 1}
2024-03-05T02:41:46Z    INFO    Starting workers        {"controller": "runnerset-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "RunnerSet", "worker count": 1}
2024-03-05T02:41:46Z    INFO    Starting workers        {"controller": "horizontalrunnerautoscaler-controller", "controllerGroup": "actions.summerwind.dev", "controllerKind": "HorizontalRunnerAutoscaler", "worker count": 1}
2024-03-05T02:41:46Z    INFO    Starting workers        {"controller": "runnerpersistentvolumeclaim-controller", "controllerGroup": "", "controllerKind": "PersistentVolumeClaim", "worker count": 1}

Runner Pod Logs

2024-03-02 21:22:27.108  NOTICE --- Runner init started with pid 7
2024-03-02 21:22:27.119  DEBUG --- Github endpoint URL https://github.com/
2024-03-02 21:22:34.356  DEBUG --- Passing --ephemeral to config.sh to enable the ephemeral runner.
2024-03-02 21:22:34.374  DEBUG --- Configuring the runner.

--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication

√ Connected to GitHub

# Runner Registration

√ Runner successfully added
√ Runner connection is good

# Runner settings

√ Settings Saved.

2024-03-02 21:22:42.609  DEBUG --- Runner successfully configured.
{
  "agentId": 31,
  "agentName": "runnerdeployment-ldbcx-jndd2",
  "poolId": 1,
  "poolName": "Default",
  "ephemeral": true,
  "serverUrl": "https://pipelinesghubeus3.actions.githubusercontent.com/f3MYkVnNMUNCrBLlsl4DIwcoEuy52XuXf2mOun0Zkmh38bPyMW/",
  "gitHubUrl": "https://github.com/oa-ri",
  "workFolder": "/runner/_work"
2024-03-02 21:22:42.628  DEBUG --- Docker enabled runner detected and Docker daemon wait is enabled
2024-03-02 21:22:42.632  DEBUG --- Waiting until Docker is available or the timeout of 120 seconds is reached
}CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
2024-03-02 21:22:43.98  NOTICE --- WARNING LATEST TAG HAS BEEN DEPRECATED. SEE GITHUB ISSUE FOR DETAILS:
2024-03-02 21:22:43.103  NOTICE --- https://github.com/actions/actions-runner-controller/issues/2056

√ Connected to GitHub

Current runner version: '2.314.1'
2024-03-02 21:22:45Z: Listening for Jobs
2024-03-05 06:36:14Z: Runner connect error: The HTTP request timed out after 00:01:00.. Retrying until reconnected.
2024-03-05 06:37:31Z: Runner reconnected.
2024-03-05 06:50:12Z: Runner connect error: The HTTP request timed out after 00:01:00.. Retrying until reconnected.
2024-03-05 06:51:32Z: Runner reconnected.
2024-03-05 14:13:22Z: Runner connect error: The HTTP request timed out after 00:01:00.. Retrying until reconnected.
2024-03-05 14:14:32Z: Runner reconnected.
github-actions[bot] commented 8 months ago

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

nikola-jokic commented 7 months ago

Hey @iliasnaamane2,

This is not an ARC issue, it is related to the service overall. I'd like to point out that this is also a known issue and part of the roadmap.

iliasnaamane2 commented 7 months ago

Hello @nikola-jokic , thanks for your answer. The issue doesn't occur with Github runners, it occur only with self-hosted-runners. Can you please recheck your answer?

nikola-jokic commented 7 months ago

I am aware of it :relaxed:. You can see we have few issues in the runner repository that are also closed, since they are related to the service. For example: https://github.com/actions/runner/issues/886

iliasnaamane2 commented 7 months ago

@nikola-jokic ,

Thanks for your answer again, I have been using actions-runner-controller for a while now and I remember that I was able to see logs after refreshing the page. This time it's strange I can no longer access the logs and it doesn't make sense. Are you sure this is normal behavior?

Please check screenshot attached. Capture d'écran 2024-04-05 114630

Thanks!

nikola-jokic commented 7 months ago

No problem! This shouldn't have happened, can you please try to run it now? This might have been a temporary error?

iliasnaamane2 commented 7 months ago

@nikola-jokic I am still facing the same issue. Can you please check on your end please? Capture d'écran 2024-04-05 121355 Thanks!

nikola-jokic commented 7 months ago

Can you please download the job log? image

iliasnaamane2 commented 7 months ago

Thanks for your answer. Indeed I can view raw logs attached, but I cannot view them using the dropdown icon as you have in your screenshot. As well after downloading log archive the zip file is empty.

Can you please let me know why I can't view the logs through the job UI?

Thanks in advance for your help!

iliasnaamane2 commented 7 months ago

Capture d'écran 2024-04-05 135301

nikola-jokic commented 7 months ago

Maybe try to clear cache or pause the ad blocker? Seems like the log is streamed, but it isn't displayed on the UI

iliasnaamane2 commented 7 months ago

Same result, actually the logs are streamed and I can see them in the UI when running the worflow but strangely disappeared once refreshing the UI page and the job is completed. As well, I can see logs forever when I am using Github runners instead of self-hosted runners. Below screenshot of logs using Github runners:

image

nikola-jokic commented 7 months ago

Can you please submit it here? It can help route the problem to the correct team :relaxed:

iliasnaamane2 commented 7 months ago

Thanks for your answer. I have created a discussion over here https://github.com/orgs/community/discussions/118795

iliasnaamane2 commented 7 months ago

@nikola-jokic FYI, when I download the log archive the folder is empty with self-hosted-runners and full of logs with all steps when using Github runners. Should I do something on my side to enable log archive with self-hosted runners?

nikola-jokic commented 7 months ago

Let me re-open this one. It seems like something is not streaming the log to the back-end, but I'm not sure why. This is also not gha-runner-scale-set, and I'm not as familiar with the older scaling mode as I am with the new one.

iliasnaamane2 commented 7 months ago

Hi @nikola-jokic , Thanks for your answer, but I think that logs are correctly streamed when the workflow is running but are not correctly saved so I can check them through the UI whenever I want. As well, the archive logs is empty

ghost commented 7 months ago

👀 We're also seeing this on self-hosted runners - it doesn't occur on every workflow, but always affects all of a workflow.

diroussel commented 7 months ago

I did some analysis on my github runners, which are hosted in an AWS VPC, with Network filewall. Looking in the logs, over the previous 12 hours, the following hosts were blocked:

productionresultssa10.blob.core.windows.net productionresultssa11.blob.core.windows.net productionresultssa12.blob.core.windows.net productionresultssa13.blob.core.windows.net productionresultssa14.blob.core.windows.net productionresultssa15.blob.core.windows.net productionresultssa16.blob.core.windows.net productionresultssa17.blob.core.windows.net productionresultssa18.blob.core.windows.net productionresultssa19.blob.core.windows.net

Previously we had allow-listed the following hosts, as described in the docs.

productionresultssa0.blob.core.windows.net productionresultssa1.blob.core.windows.net productionresultssa2.blob.core.windows.net productionresultssa3.blob.core.windows.net productionresultssa4.blob.core.windows.net productionresultssa5.blob.core.windows.net productionresultssa6.blob.core.windows.net productionresultssa7.blob.core.windows.net productionresultssa8.blob.core.windows.net productionresultssa9.blob.core.windows.net

Previously, the docs said to only allow these hosts, but now they say to allow *.blob.core.windows.net which is not great IMHO. Any-bucket controlled by anyone, for any operation in the whole of Azure, worldwide isn't exactly controlled. Surely it's enterprises paying for github enterprise that want selfhosted runners, and don't want to give fettered access to build agents that this exists for?

Anyhow, allowing those hosts, does so far, seem to fix the logs access issue in the web UI to github, and makes the jobs terminate faster.

This is the commit that changed the docs: https://github.com/github/docs/commit/fa66f2f1bce5a0823177be1cc85c7417192d06d3

I think it should probably be changed back to a finite list of specific hosts.

iliasnaamane2 commented 6 months ago

Hello, I can't move on with this issue, any news please? Thanks

marchaarsma commented 6 months ago

I am seeing the same thing happen on a self-hosted linux action-runnen. A similar task on a windows system is working fine. When looking at view-source i notice that the data-log-url="" of all check-step are empty. On the windows system they contain a valid URL .

iliasnaamane2 commented 5 months ago

The issue can be closed as it has been resolved by deploying the same helm chart in another cluster. Do you have an idea what's the root cause of this issue?

jpugh-miax commented 1 month ago

I am experiencing the same exact issue when using runner as a CLI from my locked-down enterprise company network.

No helm install. Just download and run.sh.

The config.sh --check command passes. I'm reaching out to my network people to see if I can get access to logs to see if there were blocked URLs.

Will come back with any findings.

jpugh-miax commented 1 month ago

In my job logs:


[2024-10-11 13:54:28Z INFO JobServerQueue] Catch exception during file upload to results, keep going since the process is best effort.
[2024-10-11 13:54:28Z ERR  JobServerQueue] System.AggregateException: Retry failed after 4 tries. Retry settings can be adjusted in ClientOptions.Retry or by configuring a custom retry policy in ClientOptions.RetryPolicy. (The SSL connection could not be established, see inner exception.) (The SSL connection could not be established, see inner exception.) (The SSL connection could not be established, see inner exception.) (The SSL connection could not be established, see inner exception.)
 ---> Azure.RequestFailedException: The SSL connection could not be established, see inner exception.
 ---> System.Net.Http.HttpRequestException: The SSL connection could not be established, see inner exception.
 ---> System.IO.IOException: Unable to read data from the transport connection: Connection reset by peer.
 ---> System.Net.Sockets.SocketException (104): Connection reset by peer
   --- End of inner exception stack trace ---