apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Java][CI] Building C++ libraries on Ubuntu aarch64 job takes 3+ hours to complete in java-jars #43434

Open danepitkin opened 2 months ago

danepitkin commented 2 months ago

Describe the enhancement requested

Can we speed this up? Other OS/architectures take 10-30min.

See java-jars job in crossbow: https://github.com/ursacomputing/crossbow/actions/runs/9817940199/job/27109855273

Component(s)

Continuous Integration

assignUser commented 2 months ago

It's the vcpkg builds. If you look at the timestamps, Arrow itself only takes 30 minutes, but LLVM takes over an hour https://github.com/ursacomputing/crossbow/actions/runs/9817940199/job/27109855273#step:8:5926 and Abseil takes half an hour, etc.

I'd recommend adding vcpkg binary caching through the NuGet backend, like I did for other jobs: https://github.com/apache/arrow/pull/13507

Docker complicates things a bit here. The easiest way is probably to mount the vcpkg directory so you don't have to pass things into Docker, etc.?
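
To make the mounting idea concrete, here is a minimal sketch that assembles a `docker run` command with a host directory mounted as the vcpkg binary cache. It is an illustration under stated assumptions, not the configuration used by the java-jars job: it swaps in vcpkg's file-based cache (via the `VCPKG_DEFAULT_BINARY_CACHE` environment variable) instead of the NuGet backend, and the helper `docker_run_with_vcpkg_cache`, the image name, and both paths are hypothetical.

```python
import shlex


def docker_run_with_vcpkg_cache(image, host_cache, container_cache="/vcpkg-cache"):
    """Build a `docker run` argument list that shares a vcpkg binary cache.

    Assumptions (not taken from the Arrow CI setup): the image name and the
    cache paths are placeholders, and the file-based cache is used instead of
    the NuGet backend.
    """
    return [
        "docker", "run", "--rm",
        # Mount the host cache directory into the container.
        "-v", f"{host_cache}:{container_cache}",
        # Point vcpkg's default file-based binary cache at the mounted directory.
        "-e", f"VCPKG_DEFAULT_BINARY_CACHE={container_cache}",
        image,
    ]


# Hypothetical image name and host path, for illustration only.
cmd = docker_run_with_vcpkg_cache("example/java-jni-build", "/tmp/vcpkg-cache")
print(shlex.join(cmd))
```

With this layout the cache lives on the host, so it would survive even when the Docker image layer containing the vcpkg build is not reused.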

danepitkin commented 2 months ago

Thank you! I'll take a look at this.

pitrou commented 1 month ago

You're building in release mode, and also building the C++ tests, which probably explains part of the build duration.

Also, in the JNI build you've linked to, the Arrow C++ build takes 25 minutes (from 12:05 to 12:30).
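
The 25-minute figure is just the difference between the two log timestamps. A minimal sketch for checking step durations this way, assuming same-day `HH:MM` clock times copied by hand from the job log (`step_minutes` is a hypothetical helper, not part of Arrow or crossbow):

```python
from datetime import datetime


def step_minutes(start: str, end: str, fmt: str = "%H:%M") -> float:
    """Return the elapsed minutes between two same-day clock times."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


# Arrow C++ build in the linked JNI job: 12:05 to 12:30.
print(step_minutes("12:05", "12:30"))  # 25.0
```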

danepitkin commented 2 weeks ago

I noticed the java-jars job is back down to ~30min. I wonder if someone fixed this already? See example here: https://github.com/apache/arrow/pull/44023

pitrou commented 2 weeks ago

@danepitkin It's just that in some cases the vcpkg step is successfully cached in the Docker image, and in some cases it's not.

There is an issue open to improve this: https://github.com/apache/arrow/issues/43951