elixir-nx / xla

Pre-compiled XLA extension
Apache License 2.0

Build XLA extension as tarball #1

Closed seanmor5 closed 3 years ago

seanmor5 commented 3 years ago

Builds a tarball with XLA dependencies, including all required headers and a single shared object that can be linked into other executables to use TensorFlow XLA without Bazel, Python, NumPy, or the rest of the TensorFlow build machinery.

As is this works, but I had to do some troubleshooting manually that should be automated:

  1. xla_data.pb.h cannot find port_def.inc. port_def.inc is apparently a file from Google protobuf. I fixed this by finding the location of my TensorFlow installation and then adding -I$(TF_INCLUDE_DIR). We need to find a way to ensure it's packaged into the tarball without depending on a global TF install.

  2. grpcpp/grpcpp.h not found in pjrt/distributed/channel.h. grpcpp is included in an include dir which somehow gets mapped inside the top-level include dir; this can be fixed by moving grpcpp from include/include/grpcpp to include/grpcpp. We can automate this with some Bazel magic, I think.

  3. llvm/Target/TargetMachine.h not found in xla/service/cpu. An llvm folder exists in the top-level include, but XLA is actually looking for the contents of llvm/include/llvm, so moving llvm/include/llvm to the top-level include directory fixes this. Again, this is a problem with how Bazel maps some of these paths/includes.

  4. xla/client/lib/lu_decomposition.h not found. For some reason this header gets lost? I guess we can copy it in somehow - that's what I did.

  5. The current rules package A LOT of transitive headers and dependencies, and honestly I'm not sure which ones are necessary. I manually deleted about 80% of the folders generated inside include and everything worked just fine, but I'm not sure that's going to be the case in every circumstance.
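For reference, the manual fixes for items 1–4 can be sketched as a small shell script. All paths here are illustrative stand-ins: the script simulates the broken tarball layout with empty files/dirs rather than operating on a real TF build, and the tf_install directory is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the manual fixes above, under an assumed layout.
set -euo pipefail

# Simulate the problematic layout (for demonstration only).
mkdir -p xla_extension/include/include/grpcpp
mkdir -p xla_extension/include/llvm/include/llvm
mkdir -p tf_install/include/google/protobuf
touch tf_install/include/google/protobuf/port_def.inc

# Fix 1: copy port_def.inc out of the TF install into the tarball,
# instead of relying on -I$(TF_INCLUDE_DIR) at build time.
mkdir -p xla_extension/include/google/protobuf
cp tf_install/include/google/protobuf/port_def.inc \
   xla_extension/include/google/protobuf/

# Fix 2: grpcpp headers land one level too deep; hoist them up.
mv xla_extension/include/include/grpcpp xla_extension/include/grpcpp
rmdir xla_extension/include/include

# Fix 3: XLA expects llvm headers directly under the top-level include,
# so replace include/llvm with the nested llvm/include/llvm contents.
mv xla_extension/include/llvm/include/llvm xla_extension/llvm_tmp
rm -rf xla_extension/include/llvm
mv xla_extension/llvm_tmp xla_extension/include/llvm

# Fix 4: lu_decomposition.h gets dropped; copy it back in by hand
# (touch stands in for copying it from wherever the build left it).
mkdir -p xla_extension/include/xla/client/lib
touch xla_extension/include/xla/client/lib/lu_decomposition.h
```

The long-term fix is doing these remaps inside the Bazel rules that build the tarball, so the unpacked archive needs no post-processing.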

josevalim commented 3 years ago

Awesome job @seanmor5!

A couple notes:

  1. We should probably document those steps somewhere

  2. About 5, should we explicitly list which headers we want to keep?

seanmor5 commented 3 years ago

@josevalim Steps 1-4 can be resolved by remapping paths into the top-level include generated in the tarball, I am doing it now and it should be fixed soon.

For 5, the problem is I don't know what headers are actually required - and some of them may end up being target and platform dependent.

josevalim commented 3 years ago

@seanmor5 I see. How many MBs of headers are we talking about here?

seanmor5 commented 3 years ago

Compressed, the total size of the tarball is 90 MB. Uncompressed, the include dir is 178 MB and the .so is 286 MB.

seanmor5 commented 3 years ago

I've fixed all the above issues and was able to pass all of the test/exla/defn_expr_test.exs tests. I had to remove references to AOT compilation, as that is not included in this setup and will not work with this approach. There is also a test that crashes on a shape check, which I am looking into. Both of those issues are directly related to EXLA, though, and not necessarily tied to the extension.

jonatanklosko commented 3 years ago

Amazing! The build succeeded for me locally, and also in a CI setup on both Ubuntu and macOS. Worth noting that the CI finished notably faster than building EXLA (example).

FTR, I initially ran into errors regarding the Python version locally and resolved them with export PYTHON_BIN_PATH=/usr/bin/python3.9.

jonatanklosko commented 3 years ago

Merging this as a consequence of #2; we can also revisit the TensorFlow versioning later.