apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0
14.44k stars 3.52k forks

[CI][Packaging][Release] Jobs that run on ARM self-hosted runners are flaky and failing with communication lost #44418

Open raulcd opened 4 days ago

raulcd commented 4 days ago

Describe the bug, including details regarding any error messages, version, and platform.

The k8s self-hosted runner setup has been somewhat flaky lately. See for example:

The error:

The self-hosted runner: k8s-runners-linux-arm-8g6tn-gpmc7 lost communication with the server.

I am seeing this happen on the release maintenance branch too.

Component(s)

Continuous Integration, Packaging, Release

raulcd commented 4 days ago

cc @assignUser

assignUser commented 4 days ago

Will investigate

assignUser commented 4 days ago

This type of error usually happens when the runner pod gets OOM-killed or CPU-killed. Did we increase the feature set that's built, or change something similar that might increase memory or CPU use?

kou commented 3 days ago

https://github.com/apache/arrow/pull/44348 may be related. It enables the Azure file system.

kou commented 3 days ago

Can we increase assigned resources for the runner?

assignUser commented 2 days ago

Ah yeah that could do it, I'll see what I can do.