actions/setup-python

Set up your GitHub Actions workflow with a specific version of Python

Download of a version hangs and breaks after timeout #806

Closed: orim-orca closed this issue 3 months ago

orim-orca commented 5 months ago

Description: I have a task that downloads Python once an hour on a custom GitHub runner running on k8s. At least once a day the download hangs until the timeout (currently set to 6 hours by default) and the job is then automatically canceled.

Action version: v6

Platform:

Runner type:

Tools version: Python 3.11

Repro steps:
Have a running k8s-hosted (container) runner without Python 3.11 installed, then run a step asking setup-python to download Python 3.11.

Expected behavior: The download should succeed every time.

Actual behavior: The download hangs 3-5% of the time.
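
For reference, a minimal sketch of the kind of workflow described above. The schedule, job name, action version, and timeout value are illustrative assumptions, not taken from the original report; timeout-minutes is a standard GitHub Actions setting that caps the step so a hung download fails fast instead of running into the 6-hour default job timeout.

on:
  schedule:
    - cron: '0 * * * *'        # runs once an hour, as described in the report

jobs:
  setup:
    runs-on: self-hosted       # custom k8s-hosted runner
    steps:
      - uses: actions/setup-python@v5
        timeout-minutes: 10    # fail fast instead of hanging for the 6-hour default
        with:
          python-version: '3.11'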

aparnajyothi-y commented 5 months ago

Hello @orim-orca

Thank you for creating the issue. We will get back to you once we have some feedback on it.

priya-kinthali commented 4 months ago

Hello @orim-orca 👋,

Thank you for reporting the issue. Could you please provide more specifics, such as your workflow and any error messages that appear? You could also enable debug logging and share the logs from when the cancellation occurs. Without these details, it's challenging to determine the exact cause of the issue. Any additional details you could provide would be very helpful :)

Looking forward to your response. Thanks!

priya-kinthali commented 4 months ago

Hello @orim-orca 👋, just a gentle reminder regarding this issue. If you have any updates on the information requested, could you please let us know :)

Collin3 commented 4 months ago

I'm also seeing this on the ARC scale set runners hosted in k8s. Whether or not I have debug logging enabled, these are the only logs I'm able to see, and they don't seem particularly useful to me. Every time it hangs we see these logs and then our runner pod gets killed for reasons that are unclear (nowhere near memory limits or anything like that).

Run actions/setup-python@v5
Installed versions
  Version 3.8.17 was not found in the local cache
  Version 3.8.17 is available for downloading
  Download from "https://github.com/actions/python-versions/releases/download/3.8.17-5199874912/python-3.8.17-linux-22.04-x64.tar.gz"
  Extract downloaded archive
  /usr/bin/tar xz --warning=no-unknown-keyword --overwrite -C /home/runner/_work/_temp/067f94b3-f535-40af-93b9-baefc87d3942 -f /home/runner/_work/_temp/bec07b56-91cb-48be-8dc3-2776a2ac4952

Here's how I'm calling the action. This is one of the first steps run after checking out the repo, so the job runs for a few seconds, then we get into this step, we see the output pasted above, our runner pod gets killed, and the job is left hanging.

- uses: actions/setup-python@v5
  with:
    python-version: '3.8.17'
    cache: 'pip'
    cache-dependency-path: 'requirements*.txt'

It is failing roughly 50-60% of the time for us. We can cancel the workflow manually; otherwise it hangs until our self-imposed 15-minute timeout. When it succeeds, the install Python step finishes in less than 30 seconds and our entire workflow is done in ~1.5 minutes.

edit: I have also confirmed that I can reproduce the error with and without any caching inputs. So a basic setup-python step that just specifies a Python version (see the sketch below) can reproduce the hang on the ARC scale set k8s runners 🤔
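
For clarity, the no-cache variant mentioned in the edit above would be just this (a sketch derived from the snippet earlier in this comment, with the cache inputs dropped):

- uses: actions/setup-python@v5
  with:
    python-version: '3.8.17'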

priya-kinthali commented 4 months ago

Hello, the problem appears to be specific to Kubernetes runners, as it is not reproducible on other self-hosted runners, and it seems the issue is not related to the setup-python action. However, please find the possible causes below:


Collin3 commented 4 months ago

It seems to only be reproducible when the runner pod gets scheduled onto large k8s nodes with a lot of CPU. This leads me to believe that it might be a concurrency issue, where setup-python, and specifically the un-tarring step, thinks it has access to more CPU than it really does. Is there an environment variable of some sort I can set to pass a flag to that tar command so it only runs with X threads, where X is the number of CPU cores my pod has access to?

priya-kinthali commented 4 months ago

Hello, thank you for sharing your observations! As of now, the setup-python action does not support specifying the number of threads for the un-tarring step via an environment variable.

skoonin commented 3 months ago

While I don't have much to add, I am seeing this issue as well on self-hosted ARC scale set runners. For now I am going to revert to pre-installing Python on the runners.
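
For anyone taking the same route, here is a minimal sketch of pre-seeding the runner tool cache at image-build time so setup-python resolves the version locally instead of downloading. The version, the placeholder release tag, and the reliance on the archive's bundled setup.sh are assumptions based on how actions/python-versions releases appear to be laid out; verify against the actual release you need.

# Run at runner-image build time (e.g. from a Dockerfile RUN step).
set -euo pipefail

TOOL_CACHE="${RUNNER_TOOL_CACHE:-/opt/hostedtoolcache}"
VERSION="3.11.9"                      # illustrative version
TAG="${VERSION}-0000000000"           # placeholder build id; pick a real tag
                                      # from the python-versions releases page
ARCHIVE="python-${VERSION}-linux-22.04-x64.tar.gz"

mkdir -p "$TOOL_CACHE" /tmp/python-seed
cd /tmp/python-seed
curl -fsSL -o "$ARCHIVE" \
  "https://github.com/actions/python-versions/releases/download/${TAG}/${ARCHIVE}"
tar -xzf "$ARCHIVE"
# The archive bundles a setup.sh, the same script setup-python runs after
# its own download, which installs the files into the runner tool cache.
RUNNER_TOOL_CACHE="$TOOL_CACHE" bash ./setup.sh
cd / && rm -rf /tmp/python-seed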

priya-kinthali commented 3 months ago

Hello 👋, after investigating, it appears the issue is not directly related to the setup-python action but rather specific to certain runner environments, so we'll close this issue for now.
Please feel free to reach out to us or create a new issue if you encounter any other problems. We appreciate your understanding!

skoonin commented 3 months ago

Even though this is closed, I forgot to add that I also use the setup-node action, and that has never had any issues.

Collin3 commented 3 months ago

Yeah, I would definitely take the stance that the action could be improved to work better for self-hosted runners. For example, I can install other dependencies, even Python, via the asdf action, since it provides an ASDF_CONCURRENCY variable that lets you, as a client, specify the concurrency used to install the dependencies. It would be really nice if there were a similar configuration option here for when the default setup doesn't work well.
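
A hedged sketch of that asdf-based alternative: the action path and version reflect asdf-vm/actions as I understand it, and the concurrency value is illustrative; verify both before use. The install action reads whatever versions are pinned in the repo's .tool-versions file.

- uses: asdf-vm/actions/install@v3
  env:
    ASDF_CONCURRENCY: '4'    # cap install/build concurrency to the pod's CPU quota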