actions / setup-python

Set up your GitHub Actions workflow with a specific version of Python
MIT License

Intermittent failures during Post Setup Python step for MacOS #857

Open andrewkho opened 1 month ago

andrewkho commented 1 month ago

I'm new to GitHub Actions and I'm having trouble understanding this failure. Apologies if this isn't the right way to flag the issue.

Description: Post Setup Python fails intermittently with macos-latest. On successful runs, cleanup and shutdown are much slower than on Windows and Linux.

Action version: Tested with Actions v3/v4 and setup-python v4/v5

Platform:

Runner type:

Tools version: 3.8, 3.9, 3.10

Repro steps:

The original workflow yaml is here: https://github.com/pytorch/data/blob/main/.github/workflows/stateful_dataloader_ci.yml

In this failed run I tried updating actions from v3 -> v4 and setup-python from v4 -> v5, and it still exhibits the behaviour:

Example of failed run: https://github.com/pytorch/data/actions/runs/8903946672/job/24452473208?pr=1249

Failed retry with debug logs: https://github.com/pytorch/data/actions/runs/8903946672/job/24475084388

##[debug]Evaluating condition for step: 'Post Setup Python 3.9'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Post Setup Python 3.9
##[debug]Loading inputs
##[debug]Evaluating: matrix.python-version
##[debug]Evaluating Index:
##[debug]..Evaluating matrix:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'python-version'
##[debug]=> 3.9
##[debug]Result: 3.9
##[debug]Evaluating: (((github.server_url == 'https://github.com') && github.token) || '')
##[debug]Evaluating Or:
##[debug]..Evaluating And:
##[debug]....Evaluating Equal:
##[debug]......Evaluating Index:
##[debug]........Evaluating github:
##[debug]........=> Object
##[debug]........Evaluating String:
##[debug]........=> 'server_url'
##[debug]......=> 'https://github.com/'
##[debug]......Evaluating String:
##[debug]......=> 'https://github.com'
##[debug]....=> true
##[debug]....Evaluating Index:
##[debug]......Evaluating github:
##[debug]......=> Object
##[debug]......Evaluating String:
##[debug]......=> 'token'
##[debug]....=> '***'
##[debug]..=> '***'
##[debug]=> '***'
##[debug]Expanded: ((('https://github.com/' == 'https://github.com') && '***') || '')
##[debug]Result: '***'
##[debug]Loading env
Post job cleanup.
##[debug]Re-evaluate condition on job cancellation for step: 'Post Setup Python 3.9'.

Expected behavior: Expect Post Setup-Python to finish quickly and succeed.

Actual behavior: Post Setup-Python hangs and marks the run as failed.
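
The debug log above ends with a "Re-evaluate condition on job cancellation" line rather than an error. That suggests the job was cancelled while the post step was still running (for example by a timeout, a manual cancel, or another matrix job failing), although the log alone does not say which. A minimal sketch of how to pull that out of a saved log, assuming the log is available as plain text; `summarize_debug_log` is a hypothetical helper written for illustration and is not part of setup-python or the runner:

```python
import re

# Patterns mirror the runner debug lines quoted in this issue.
STARTING = re.compile(r"^##\[debug\]Starting: (?P<step>.+)$")
CANCELLED = re.compile(
    r"^##\[debug\]Re-evaluate condition on job cancellation for step: '(?P<step>.+)'\.$"
)


def summarize_debug_log(text):
    """Return the last step that started and the steps re-evaluated on cancellation.

    Hypothetical helper: it only recognises the two line shapes above.
    """
    last_started = None
    cancelled = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = STARTING.match(line)
        if match:
            last_started = match.group("step")
            continue
        match = CANCELLED.match(line)
        if match:
            cancelled.append(match.group("step"))
    return {"last_started": last_started, "cancelled": cancelled}


if __name__ == "__main__":
    sample = "\n".join(
        [
            "##[debug]Starting: Post Setup Python 3.9",
            "##[debug]Loading env",
            "Post job cleanup.",
            "##[debug]Re-evaluate condition on job cancellation for step: 'Post Setup Python 3.9'.",
        ]
    )
    print(summarize_debug_log(sample))
```

If the last started step also appears in the cancelled list, the step most likely never finished on its own, which matches the hang described here.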

HarithaVattikuti commented 1 month ago

Hello @andrewkho, thank you for creating this issue. We will investigate it and get back to you as soon as we have some feedback.

aparnajyothi-y commented 1 month ago

Hello @andrewkho, we have investigated the issue and are not able to reproduce it with actions/setup-python@v3, v4, or v5. Please find the screenshots for reference. In the run provided in this issue, we noticed that the post checkout job isn't terminating as expected. This might be due to an external service not responding, causing the job to hang. The workflow provided interacts with a few external services:

1. PyTorch Channels: The step "Get PyTorch Channel" determines the URL for either the test or nightly PyTorch builds hosted on "https://download.pytorch.org/". This URL is later used in the "Install dependencies" step to install PyTorch.
2. GitHub: The step "Check out source repository" uses the actions/checkout@v4 GitHub Action to fetch the source code of the repository.
3. PyPI (Python Package Index): Several steps in the workflow install Python packages using pip, which fetches packages from PyPI.

Any of these could potentially cause a hang if the service is down or there is an issue with the package or tool being fetched.

[Four screenshots attached]

Please let us know if you need any further clarification.
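
One way to check whether any of the external services listed above is slow or unreachable from the runner is to probe each host with a hard timeout. The following is a rough sketch only: the `probe` helper, the 10-second timeout, and the exact URLs for GitHub and PyPI are assumptions chosen for illustration, and a passing probe does not rule out a hang elsewhere.

```python
import time
import urllib.error
import urllib.request

# Hosts named in the comment above; the GitHub and PyPI URLs are assumptions.
ENDPOINTS = {
    "PyTorch downloads": "https://download.pytorch.org/",
    "GitHub": "https://github.com/",
    "PyPI": "https://pypi.org/simple/",
}


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Send one request with a hard timeout; return (ok, seconds, detail)."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            ok, detail = True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, so it is reachable even if the status is an error.
        ok, detail = True, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, connection failures, and socket timeouts.
        ok, detail = False, f"{type(exc).__name__}: {exc}"
    return ok, time.monotonic() - start, detail


def probe_all(endpoints, timeout=10.0):
    """Probe every endpoint and return {name: (ok, seconds, detail)}."""
    return {name: probe(url, timeout=timeout) for name, url in endpoints.items()}


# Example use inside a workflow step (performs real network requests):
#     for name, (ok, seconds, detail) in probe_all(ENDPOINTS).items():
#         print(f"{name}: {'ok' if ok else 'FAILED'} in {seconds:.2f}s ({detail})")
```

The `opener` parameter exists only so the function can be exercised without network access; in normal use the default `urllib.request.urlopen` applies.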

andrewkho commented 1 month ago

Hi @aparnajyothi-y, thanks for trying to repro. I think the issue is that, as far as I can tell, there is no clear error message or way to debug this. For example, I have no idea what the container is doing, or whether the failure is due to a timeout (and if so, how long that timeout is) or an OOM. It's really difficult to debug without anything to go on.

aparnajyothi-y commented 1 month ago

Hello @andrewkho, to help investigate the error message, could you please enable debug logs and run the workflow? You can follow the steps in this document to do so. Once done, kindly update the link to the repository with the debug logs included. This will assist in further inspection of the setup-python issue mentioned above, as we're currently unable to replicate the error.

aparnajyothi-y commented 1 week ago

Hello @andrewkho, could you share the link to the workflow run with the debug logs included? This will assist in further inspection of the setup-python issue mentioned above, as we're currently unable to replicate the error.