elastic / ml-cpp

Machine learning C++ code

[ML] Upgrade to PyTorch 2.3.1 #2688

Closed edsavage closed 1 month ago

edsavage commented 1 month ago

Update Docker images and dependency files with PyTorch 2.3.1.

Testing on Linux x86_64 gives promising indications that this version of PyTorch may resolve some memory allocation issues related to the pytorch_inference process.

Opening this PR to better test it across the range of platforms and architectures that we support.

edsavage commented 1 month ago

Compilation errors for the macOS Intel cross-compilation build are caused by a compiler version that is now too old:

In file included from /buildkite/builds/bk-agent-prod-k8s-1720411844430980726/elastic/ml-cpp-pr-builds/bin/pytorch_inference/CResultWriter.cc:12:
In file included from /buildkite/builds/bk-agent-prod-k8s-1720411844430980726/elastic/ml-cpp-pr-builds/bin/pytorch_inference/CResultWriter.h:21:
In file included from /usr/local/sysroot-x86_64-apple-macosx10.14/usr/local/include/pytorch/torch/csrc/api/include/torch/types.h:3:
In file included from /usr/local/sysroot-x86_64-apple-macosx10.14/usr/local/include/pytorch/ATen/ATen.h:7:
In file included from /usr/local/sysroot-x86_64-apple-macosx10.14/usr/local/include/pytorch/ATen/Context.h:21:
In file included from /usr/local/sysroot-x86_64-apple-macosx10.14/usr/local/include/pytorch/c10/util/CallOnce.h:8:
/usr/local/sysroot-x86_64-apple-macosx10.14/usr/local/include/pytorch/c10/util/C++17.h:18:2: error: "You're trying to build PyTorch with a too old version of Clang. We need Clang 9 or later."

For the macOS aarch64 build, there is a problem with tar and some of the flags in the dependency archive.
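
For context, the Clang error in the log above is the kind of message a compile-time version guard produces. The following is a minimal sketch of such a guard, not the actual contents of PyTorch's c10/util/C++17.h. The helper names clangTooOld and compilerDescription are hypothetical, and skipping Apple Clang (which numbers its versions differently from upstream Clang) is an assumption made for this illustration only.

```cpp
#include <string>

// Hypothetical helper for illustration; not part of PyTorch or ml-cpp.
// Returns true when a Clang major version is below the required minimum.
constexpr bool clangTooOld(int major, int minimumMajor = 9) {
    return major < minimumMajor;
}

// Hypothetical helper: describes the compiler building this translation
// unit, using only the predefined macros of Clang and GCC.
inline std::string compilerDescription() {
#if defined(__clang__)
    return "clang " + std::to_string(__clang_major__);
#elif defined(__GNUC__)
    return "gcc " + std::to_string(__GNUC__);
#else
    return "unknown";
#endif
}

// The guard itself: reject an old upstream Clang at compile time.
// Assumption: Apple Clang is skipped because its version numbers do not
// map directly onto upstream Clang releases.
#if defined(__clang__) && !defined(__apple_build_version__)
static_assert(!clangTooOld(__clang_major__),
              "Clang 9 or later is required");
#endif
```

Printing compilerDescription() from a small test program built with the cross-compilation toolchain is one quick way to confirm which compiler version the build is actually using before upgrading it.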