google-research / big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Apache License 2.0

Error with putting arrays on CPU in cloud TPUs #101

Open philippe-eecs opened 5 months ago

philippe-eecs commented 5 months ago

Hi, I've been setting up big_vision on a v4-32 TPU pod, and I run into this error whenever I call u.put_cpu:

jaxlib.xla_extension.XlaRuntimeError: INVALID_ARGUMENT: Cannot copy array to non-addressable device TFRT_CPU_0

I'm guessing the CPUs on the TPU pod aren't configured properly? Is there a way around this or a way to fix this issue?

I'm totally new to TPUs, so let me know if you need more information.

akolesnikoff commented 4 months ago

Sorry for the late reply.

How old is the big_vision repo you were using? We had a bug that could explain what you see, but it was fixed around 5 months ago in this commit: https://github.com/google-research/big_vision/commit/7ace659452dee4b68547575352c022a2eef587a5#diff-bbdd9ea1455413f6aebc74dfed68d82d42c74acc63e722e09fa9015c908b9150.

If you are using code from after that commit, I recommend putting together a minimal reproduction and perhaps opening an issue in the main JAX GitHub repository.
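
As a starting point for such a reproduction, here is a minimal sketch, not taken from big_vision. It assumes one plausible cause: on a multi-host pod, `jax.devices("cpu")` may list CPU devices belonging to other hosts, and copying to one of those fails as "non-addressable", whereas `jax.local_devices(backend="cpu")` only lists devices owned by the current process. Whether this matches what the linked commit changed is not confirmed here. The helper names `local_cpu_device` and `put_on_local_cpu` are hypothetical.

```python
import jax
import numpy as np


def local_cpu_device():
    """Hypothetical helper: first CPU device owned by *this* process."""
    # jax.local_devices only returns devices addressable from the current host.
    return jax.local_devices(backend="cpu")[0]


def put_on_local_cpu(tree):
    """Hypothetical helper: copy every leaf of a pytree to the local CPU."""
    cpu = local_cpu_device()
    return jax.tree_util.tree_map(lambda x: jax.device_put(x, cpu), tree)


def describe_cpu_devices():
    """Collect (id, process_index) pairs to compare global and local CPU views."""
    return {
        "process_index": jax.process_index(),
        "global_cpu": [(d.id, d.process_index) for d in jax.devices("cpu")],
        "local_cpu": [(d.id, d.process_index)
                      for d in jax.local_devices(backend="cpu")],
    }


if __name__ == "__main__":
    # Run this on every host of the pod and compare the printed output.
    print(describe_cpu_devices())
    params = {"w": np.ones((2, 3), np.float32)}
    print(jax.tree_util.tree_map(lambda x: x.shape, put_on_local_cpu(params)))
```

If the global list contains entries whose `process_index` differs from the current process, picking a device from that list on the wrong host would be consistent with the error above. If the script runs cleanly on every host, the cause is likely elsewhere, and the printed output is still useful to attach to a JAX issue.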