astral-sh / uv

An extremely fast Python package and project manager, written in Rust.
https://docs.astral.sh/uv
Apache License 2.0
19.95k stars 591 forks source link

Bytecode timed out (60s) #6105

Open nijel opened 4 weeks ago

nijel commented 4 weeks ago

I'm getting "Bytecode timed out (60s)" when running in CI on QEMU emulated arm64. It is expected that this setup is way slower than usual, but I couldn't find a way to change this timeout, as it seems hard-coded:

https://github.com/astral-sh/uv/blob/7551097a170e02093997b1cdaff1dd86fc30c27a/crates/uv-installer/src/compile.rs#L22

Having this configurable, it could be overridden in environments where byte code compilation might take long (it can be a combination of many modules being installed and a slow system).

I'm currently using uv==0.2.36.

The actual log:

#9 405.5 <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
#9 405.5 <jemalloc>: (This is the expected behaviour if you are running under QEMU)
#9 406.8 Resolved 2 packages in 442ms
#9 466.2 Prepared 2 packages in 59.41s
#9 466.2 Installed 2 packages in 5ms
#9 526.4 error: Failed to bytecode-compile Python file in: app/venv/lib/python3.12/site-packages
#9 526.4   Caused by: Bytecode timed out (60s)

Full CI log is here: https://github.com/WeblateOrg/docker/actions/runs/10402861578/job/28808187107

charliermarsh commented 4 weeks ago

Interesting... I think that's a timeout we set per file, so it's intended to catch cases in which Python hangs but doesn't give us any indicator. Do you think it's plausible that a file is taking > 60 seconds under QEMU? We can of course make it configurable.

nijel commented 4 weeks ago

Looking again at the log, it happens when installing cffi only, so there shouldn't be that much files to compile. It there way to log verbosely what is going on there (besides --verbose which I've tried)?

charliermarsh commented 4 weeks ago

I'll have to ask @konstin when they're back from vacation.

zanieb commented 4 weeks ago

There's also RUST_LOG=trace, but I'm not sure you'll get much more helpful information.

nijel commented 4 weeks ago

Hmm, so far the issue didn't happen with verbose, will keep trying. On the other side, sometimes the job is terminated after uv doing something for six hours, most likely during byte compiling as well. Unfortunately it doesn't seem reproducible, it happens sometimes, making it harder to debug.

nijel commented 4 weeks ago

Here is the end of verbose log, which timed out after 6 hours:

#9 423.7 DEBUG Finished building: pycparser==2.22
#9 462.3 DEBUG Finished building: cffi==1.17.0
#9 462.3 Prepared 2 packages in 58.82s
#9 462.3 Installed 2 packages in 6ms
#9 462.3 DEBUG Starting 4 bytecode compilation workers
#9 463.7 DEBUG Bytecode compilation worker exiting: Ok(())
#9 463.8 DEBUG Bytecode compilation worker exiting: Ok(())

No additional output for the rest of nearly 6 hours.

The command line executed here was:

uv pip install \
    --no-cache-dir \
    --compile-bytecode \
    --no-binary :all: \
    cffi==1.17.0

Executed inside docker build on Qemu emulated arm64.

Full CI log here: https://github.com/WeblateOrg/docker/actions/runs/10411635489/job/28835914859

konstin commented 3 weeks ago

I tried but couldn't reproduce this locally, it looks like a non-deterministic failure, and unfortunately i don't have any good idea where we could be happening or how QEMU plays into this.

nijel commented 1 week ago

QEMU just makes everything slow. It might be a race condition somewhere. The behavior ends up to be random - sometimes it just works, sometimes it ends up with Bytecode timed out (60s) and sometimes it hangs until GitHub kills it after few hours. I have enabled debug, but it really doesn't bring any useful info here:

 1360.2 Prepared 199 packages in 14m 37s
1361.3 Installed 204 packages in 1.13s
1361.3 DEBUG Starting 4 bytecode compilation workers
1433.0 DEBUG Bytecode compilation worker exiting: Ok(())
1433.0 DEBUG Bytecode compilation worker exiting: Ok(())
1433.0 DEBUG Bytecode compilation worker exiting: Ok(())
1433.0 DEBUG Released lock at `/app/venv/.lock`
Failed to bytecode-compile Python file in: /app/venv/lib/python3.12/site-packages
1434.5   Caused by: Bytecode timed out (60s)

I've also tried adding RUST_LOG=trace but it doesn't seem to add anything useful to the logs. Can the debug logs be more detailed in the byte compilation so that it is easier to debug where actually the problem lies?

I think the problematic part is starting up the processes. GitHub workers are definitely a shared CPU cores, and slowing this with QEMU can make it easy that the Python takes long to start. So the code quite likely ends up here in some cases:

https://github.com/astral-sh/uv/blob/ccdf2d793bbc2401c891b799772f615a28607e79/crates/uv-installer/src/compile.rs#L308-L310

My rust knowledge is zero, so I don't really understand how this situation is handled in the rest of the code.

Anyway, I've written https://github.com/astral-sh/uv/pull/6958 to separate timeout exceptions so that it is clear whether the issue is in Python startup or in byte-compiling.