elixir-nx / nx

Multi-dimensional arrays (tensors) and numerical definitions for Elixir

Mix cannot continue when building EXLA #845

Closed jnnks closed 2 years ago

jnnks commented 2 years ago

The EXLA build fails with:

Unchecked dependencies for environment prod:
* xla (Hex package)
  could not find an app file at "_build/prod/lib/xla/ebin/xla.app". This may happen if the dependency was not yet compiled or the dependency indeed has no app file (then you can pass app: false as option)
** (Mix) Can't continue due to errors on dependencies
Full Log ``` $ XLA_BUILD=true MIX_ENV=prod mix compile ==> xla Compiling 2 files (.ex) Generated xla app rm -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \ ln -s "/workspaces/exla_compile_test/deps/xla/extension" /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \ cd /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e && \ bazel build --define "framework_shared_object=false" -c opt //tensorflow/compiler/xla/extension:xla_extension && \ mkdir -p /root/.cache/xla/0.3.0/cache/build/ && \ cp -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz Extracting Bazel installation... Starting local Bazel server and connecting to it... INFO: Options provided by the client: Inherited 'common' options: --isatty=0 --terminal_columns=80 INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: Inherited 'common' options: --experimental_repo_remote_exec INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: 'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library 
--deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/fallback,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils INFO: Found applicable config definition build:short_logs in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --output_filter=DONT_MATCH_ANYTHING INFO: Found applicable config definition build:v2 in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1 INFO: Found applicable config definition build:linux in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include 
--define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes INFO: Found applicable config definition build:dynamic_kernels in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS Loading: Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/tensorflow/runtime/archive/c3e082762b7664bbc7ffd2c39e86464928e27c0c.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (1 packages loaded, 0 targets configured) DEBUG: Rule 'io_bazel_rules_docker' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1596824487 -0400" DEBUG: Repository io_bazel_rules_docker instantiated at: /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/WORKSPACE:23:14: in /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/workspace0.bzl:108:34: in workspace /root/.cache/bazel/_bazel_root/2be90fa55f2d4383134ffe4aafd91de4/external/bazel_toolchains/repositories/repositories.bzl:35:23: in repositories Repository rule git_repository 
defined at: /root/.cache/bazel/_bazel_root/2be90fa55f2d4383134ffe4aafd91de4/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (16 packages loaded, 14 targets configured) Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (187 packages loaded, 16013 targets configured) INFO: Analyzed target //tensorflow/compiler/xla/extension:xla_extension (188 packages loaded, 16972 targets configured). INFO: Found 1 target... [0 / 10] [Prepa] BazelWorkspaceStatusAction stable-status.txt ... (4 actions, 0 running) [128 / 234] Compiling src/google/protobuf/compiler/objectivec/objectivec_field.cc; 1s local ... (16 actions, 15 running) [327 / 518] Compiling llvm/lib/TableGen/Record.cpp; 2s local ... (16 actions, 15 running) [574 / 1,340] Compiling llvm/lib/Support/CommandLine.cpp; 1s local ... (16 actions, 15 running) [1,151 / 1,481] Compiling llvm/utils/TableGen/InstrInfoEmitter.cpp; 3s local ... (16 actions, 15 running) [1,911 / 7,107] Compiling mlir/lib/IR/Dominance.cpp; 6s local ... (16 actions, 15 running) [2,247 / 7,107] Compiling llvm/lib/CodeGen/MachineSink.cpp; 7s local ... (16 actions, 15 running) [2,473 / 7,107] Compiling llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp; 4s local ... (16 actions, 15 running) [2,646 / 7,107] Compiling llvm/lib/Transforms/Coroutines/CoroFrame.cpp; 8s local ... (16 actions, 15 running) [2,787 / 7,107] Compiling llvm/lib/Target/X86/X86ISelLowering.cpp; 6s local ... (16 actions, 15 running) [2,931 / 7,107] Compiling llvm/lib/Target/X86/X86ISelLowering.cpp; 45s local ... (16 actions, 15 running) [3,090 / 7,107] Compiling mlir/lib/Dialect/Linalg/IR/LinalgDialect.cpp; 17s local ... (16 actions, 15 running) [3,224 / 7,107] Compiling src/cpu/x64/jit_uni_dw_convolution.cpp; 8s local ... (16 actions, 15 running) [3,404 / 7,107] Compiling src/cpu/rnn/ref_rnn.cpp; 36s local ... 
(16 actions running) [3,808 / 7,107] Compiling llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp; 14s local ... (16 actions running) [4,132 / 7,107] Compiling tensorflow/compiler/mlir/hlo/lib/Dialect/mhlo/transforms/legalize_to_linalg.cc; 10s local ... (16 actions running) [4,580 / 7,107] Compiling src/cpu/cpu_convolution_list.cpp; 7s local ... (16 actions, 15 running) [5,414 / 7,107] Compiling tensorflow/compiler/xla/service/cpu/runtime_single_threaded_matmul.cc; 24s local ... (16 actions, 15 running) [6,024 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/ir/tf_ops_a_m.cc; 65s local ... (16 actions running) [6,298 / 7,107] Compiling tensorflow/core/util/batch_util.cc; 62s local ... (16 actions, 15 running) [6,603 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/ir/tf_ops.cc; 208s local ... (16 actions, 15 running) [6,821 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/transforms/mark_ops_for_outside_compilation.cc; 84s local ... (16 actions, 15 running) [6,963 / 7,107] Compiling tensorflow/core/kernels/resource_variable_ops.cc; 133s local ... (16 actions, 15 running) Target //tensorflow/compiler/xla/extension:xla_extension up-to-date: bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz INFO: Elapsed time: 1779.200s, Critical Path: 274.62s INFO: 7107 processes: 574 internal, 6533 local. 
INFO: Build completed successfully, 7107 total actions INFO: Build completed successfully, 7107 total actions ==> complex Compiling 2 files (.ex) Generated complex app ==> nx Compiling 24 files (.ex) Generated nx app ==> exla Unpacking /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz into /workspaces/exla_compile_test/deps/exla/cache g++ -fPIC -I/usr/local/lib/erlang/erts-12.3.1/include -Icache/xla_extension/include -O3 -Wall -Wno-sign-compare -Wno-unused-parameter -Wno-missing-field-initializers -Wno-comment -shared -std=c++14 c_src/exla/exla.cc c_src/exla/exla_nif_util.cc c_src/exla/exla_client.cc -o cache/libexla.so -Lcache/xla_extension/lib -lxla_extension -Wl,-rpath,'$ORIGIN/lib' Compiling 21 files (.ex) Generated exla app ==> exla_compile_test Unchecked dependencies for environment prod: * xla (Hex package) could not find an app file at "_build/prod/lib/xla/ebin/xla.app". This may happen if the dependency was not yet compiled or the dependency indeed has no app file (then you can pass app: false as option) ** (Mix) Can't continue due to errors on dependencies ```

This happens with {:exla, "~> 0.2"} on a new project. The compilation itself seems to work fine, though: the XLA service is initialized and StreamExecutor can find a device.

No error is raised for subsequent compiles.
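Since only the first compile fails and later ones pass, a build script could tolerate the failure by retrying once. The sketch below is only a stopgap under that assumption, not a fix suggested in this thread; `retry_once` is a hypothetical helper name.

```shell
# Hypothetical helper: run a command, and run it one more time if it fails.
# The exit status is that of the last attempt.
retry_once() {
  "$@" && return 0
  echo "first attempt failed, retrying once: $*" >&2
  "$@"
}

# Example invocation (not executed here):
#   retry_once env XLA_BUILD=true MIX_ENV=prod mix compile
```

Passing the variables through `env` keeps them scoped to the command rather than to the shell function.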

josevalim commented 2 years ago

This is weird because the log even says at the beginning that the xla app was generated. What happens if you do mix deps.compile xla? What is in "_build/prod/lib/xla"?

jnnks commented 2 years ago

What happens if you do mix deps.compile xla?

nothing, no output


What is in "_build/prod/lib/xla"?

see below

jnnks commented 2 years ago

What is in "_build/prod/lib/xla"?

Nothing after the first compilation. Only after the second time, contents appear, including _build/prod/lib/xla/ebin/xla.app:

iex -S mix

```
$ XLA_BUILD=true MIX_ENV=prod iex -S mix
Erlang/OTP 24 [erts-12.3.1] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit]

==> xla
Compiling 2 files (.ex)
Generated xla app
make: '/root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz' is up to date.
==> exla_compile_test
Compiling 1 file (.ex)
Generated exla_compile_test app
Interactive Elixir (1.13.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> ExlaCompileTest.hello
13:11:07.188 [info] XLA service 0x7f6a4c0394e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
13:11:07.201 [info] StreamExecutor device (0): Host, Default Version
#Nx.Tensor<
  s64
  EXLA.Backend
  3
>
```

Similar situation with mix deps.compile xla: no error, and _build/prod/lib/xla/ebin/xla.app exists afterwards.

josevalim commented 2 years ago

@jnnks can you please try this:

rm -rf _build
rm -rf deps
mix deps.get
XLA_BUILD=true  MIX_ENV=prod mix deps.compile xla
tree _build/prod/lib/xla
XLA_BUILD=true  MIX_ENV=prod mix deps.compile exla
tree _build/prod/lib/xla

I suspect the exla compilation is what erases it somehow.
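To make the before/after comparison explicit, the check between steps can be reduced to a single test for the app file. This is a minimal sketch; `app_file_status` is a hypothetical helper, and the default build root `_build/prod` is taken from the paths in the error message.

```shell
# Hypothetical helper: report whether the xla app file exists under a build root.
# Prints "present: <path>" and returns 0, or "missing: <path>" and returns 1.
app_file_status() {
  build_root="${1:-_build/prod}"
  app_file="$build_root/lib/xla/ebin/xla.app"
  if [ -f "$app_file" ]; then
    echo "present: $app_file"
  else
    echo "missing: $app_file"
    return 1
  fi
}

# Example use between the compile steps (not executed here):
#   XLA_BUILD=true MIX_ENV=prod mix deps.compile xla;  app_file_status
#   XLA_BUILD=true MIX_ENV=prod mix deps.compile exla; app_file_status
```

If the hypothesis holds, the first call would print "present" and the second "missing".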

jnnks commented 2 years ago

for some reason the first mix deps.compile xla does not complete, but the second does. (Mix 1.13.4)

logs ``` $ rm -rf _build $ rm -rf deps $ mix deps.get Resolving Hex dependencies... Dependency resolution completed: Unchanged: complex 0.4.1 elixir_make 0.6.3 exla 0.2.3 nx 0.2.1 xla 0.3.0 * Getting exla (Hex package) * Getting elixir_make (Hex package) * Getting nx (Hex package) * Getting xla (Hex package) * Getting complex (Hex package) $ XLA_BUILD=true MIX_ENV=prod mix deps.compile xla ==> xla Compiling 2 files (.ex) Generated xla app ==> elixir_make Compiling 1 file (.ex) Generated elixir_make app ==> xla Unchecked dependencies for environment prod: * elixir_make (Hex package) the dependency build is outdated, please run "MIX_ENV=prod mix deps.compile" could not compile dependency :xla, "mix compile" failed. Errors may have been logged above. You can recompile this dependency with "mix deps.compile xla", update it with "mix deps.update xla" or clean it with "mix deps.clean xla" ==> exla_compile_test ** (Mix) Can't continue due to errors on dependencies $ tree _build/prod/lib/xla _build/prod/lib/xla └── ebin ├── Elixir.Mix.Tasks.Xla.Info.beam ├── Elixir.XLA.beam └── xla.app 1 directory, 3 files $ XLA_BUILD=true MIX_ENV=prod mix deps.compile xla ==> xla Compiling 2 files (.ex) Generated xla app rm -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \ ln -s "/workspaces/exla_compile_test/deps/xla/extension" /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \ cd /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e && \ bazel build --define "framework_shared_object=false" -c opt //tensorflow/compiler/xla/extension:xla_extension && \ mkdir -p /root/.cache/xla/0.3.0/cache/build/ && \ cp -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz INFO: Options provided by the client: 
Inherited 'common' options: --isatty=0 --terminal_columns=80 INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: Inherited 'common' options: --experimental_repo_remote_exec INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: 'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/cor
e_runtime,tensorflow/core/tfrt/fallback,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils INFO: Found applicable config definition build:short_logs in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --output_filter=DONT_MATCH_ANYTHING INFO: Found applicable config definition build:v2 in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1 INFO: Found applicable config definition build:linux in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes INFO: Found applicable config definition build:dynamic_kernels in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS Loading: Loading: 0 packages loaded Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (1 packages loaded, 0 targets configured) INFO: Analyzed target //tensorflow/compiler/xla/extension:xla_extension (1 packages loaded, 6 targets configured). INFO: Found 1 target... [0 / 3] [Prepa] BazelWorkspaceStatusAction stable-status.txt Target //tensorflow/compiler/xla/extension:xla_extension up-to-date: bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz INFO: Elapsed time: 0.333s, Critical Path: 0.02s INFO: 1 process: 1 internal. 
INFO: Build completed successfully, 1 total action INFO: Build completed successfully, 1 total action $ tree _build/prod/lib/xla _build/prod/lib/xla └── ebin ├── Elixir.Mix.Tasks.Xla.Info.beam ├── Elixir.XLA.beam └── xla.app 1 directory, 3 files $ XLA_BUILD=true MIX_ENV=prod mix deps.compile exla ==> xla make: '/root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz' is up to date. ==> exla Unpacking /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz into /workspaces/exla_compile_test/deps/exla/cache g++ -fPIC -I/usr/local/lib/erlang/erts-12.3.1/include -Icache/xla_extension/include -O3 -Wall -Wno-sign-compare -Wno-unused-parameter -Wno-missing-field-initializers -Wno-comment -shared -std=c++14 c_src/exla/exla.cc c_src/exla/exla_nif_util.cc c_src/exla/exla_client.cc -o cache/libexla.so -Lcache/xla_extension/lib -lxla_extension -Wl,-rpath,'$ORIGIN/lib' Compiling 21 files (.ex) warning: @behaviour Nx.Defn.Compiler does not exist (in module EXLA) lib/exla.ex:1: EXLA (module) warning: got "@impl true" for function __jit__/5 but no behaviour specifies such callback. There are no known callbacks, please specify the proper @behaviour and make sure it defines callbacks lib/exla.ex:369: EXLA (module) warning: got "@impl true" for function __stream__/7 but no behaviour specifies such callback. There are no known callbacks, please specify the proper @behaviour and make sure it defines callbacks lib/exla.ex:372: EXLA (module) == Compilation error in file lib/exla/defn/stream.ex == ** (ArgumentError) could not load module Nx.Stream due to reason :unavailable (elixir 1.13.4) lib/protocol.ex:315: Protocol.assert_protocol!/2 lib/exla/defn/stream.ex:58: (module) could not compile dependency :exla, "mix compile" failed. Errors may have been logged above. 
You can recompile this dependency with "mix deps.compile exla", update it with "mix deps.update exla" or clean it with "mix deps.clean exla" $ tree _build/prod/lib/xla _build/prod/lib/xla └── ebin ├── Elixir.Mix.Tasks.Xla.Info.beam ├── Elixir.XLA.beam └── xla.app 1 directory, 3 files ```
josevalim commented 2 years ago

Ok, I missed some deps, sorry! It should have been this instead:

rm -rf _build
rm -rf deps
mix deps.get
XLA_BUILD=true  MIX_ENV=prod mix deps.compile elixir_make xla
tree _build/prod/lib/xla
XLA_BUILD=true  MIX_ENV=prod mix deps.compile complex nx exla
tree _build/prod/lib/xla

maybe complex is not required… but I think XLA will be there on both runs.

jnnks commented 2 years ago

_build/prod/lib/xla/ebin/xla.app is present both times

more logs ``` $ rm -rf _build $ rm -rf deps $ mix deps.get Resolving Hex dependencies... Dependency resolution completed: Unchanged: complex 0.4.1 elixir_make 0.6.3 exla 0.2.3 nx 0.2.1 xla 0.3.0 * Getting exla (Hex package) * Getting elixir_make (Hex package) * Getting nx (Hex package) * Getting xla (Hex package) * Getting complex (Hex package) $ XLA_BUILD=true MIX_ENV=prod mix deps.compile elixir_make xla ==> elixir_make Compiling 1 file (.ex) Generated elixir_make app ==> xla Compiling 2 files (.ex) Generated xla app rm -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \ ln -s "/workspaces/exla_compile_test/deps/xla/extension" /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \ cd /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e && \ bazel build --define "framework_shared_object=false" -c opt //tensorflow/compiler/xla/extension:xla_extension && \ mkdir -p /root/.cache/xla/0.3.0/cache/build/ && \ cp -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz INFO: Options provided by the client: Inherited 'common' options: --isatty=0 --terminal_columns=80 INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: Inherited 'common' options: --experimental_repo_remote_exec INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: 'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc 
--define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/fallback,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils INFO: Found applicable config definition build:short_logs in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --output_filter=DONT_MATCH_ANYTHING INFO: Found applicable config definition build:v2 in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1 INFO: Found applicable config definition 
build:linux in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes INFO: Found applicable config definition build:dynamic_kernels in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS Loading: Loading: 0 packages loaded Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (1 packages loaded, 0 targets configured) INFO: Analyzed target //tensorflow/compiler/xla/extension:xla_extension (1 packages loaded, 6 targets configured). INFO: Found 1 target... [0 / 3] [Prepa] BazelWorkspaceStatusAction stable-status.txt Target //tensorflow/compiler/xla/extension:xla_extension up-to-date: bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz INFO: Elapsed time: 0.300s, Critical Path: 0.02s INFO: 1 process: 1 internal. INFO: Build completed successfully, 1 total action INFO: Build completed successfully, 1 total action $ tree _build/prod/lib/xla _build/prod/lib/xla └── ebin ├── Elixir.Mix.Tasks.Xla.Info.beam ├── Elixir.XLA.beam └── xla.app 1 directory, 3 files $ XLA_BUILD=true MIX_ENV=prod mix deps.compile complex nx exla ==> complex Compiling 2 files (.ex) Generated complex app ==> nx Compiling 24 files (.ex) Generated nx app ==> xla make: '/root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz' is up to date. 
==> exla Unpacking /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz into /workspaces/exla_compile_test/deps/exla/cache g++ -fPIC -I/usr/local/lib/erlang/erts-12.3.1/include -Icache/xla_extension/include -O3 -Wall -Wno-sign-compare -Wno-unused-parameter -Wno-missing-field-initializers -Wno-comment -shared -std=c++14 c_src/exla/exla.cc c_src/exla/exla_nif_util.cc c_src/exla/exla_client.cc -o cache/libexla.so -Lcache/xla_extension/lib -lxla_extension -Wl,-rpath,'$ORIGIN/lib' Compiling 21 files (.ex) Generated exla app $ tree _build/prod/lib/xla _build/prod/lib/xla └── ebin ├── Elixir.Mix.Tasks.Xla.Info.beam ├── Elixir.XLA.beam └── xla.app 1 directory, 3 files ```
josevalim commented 2 years ago

So when does it disappear?!?! Only on “mix compile”?

jnnks commented 2 years ago

Seems like the problem only appears when building XLA from scratch. All the other times a cached archive has been used. Could that play a role?

josevalim commented 2 years ago

Sounds like it, but I was hoping the instructions above could reproduce it. If you finally do a mix compile at the end of the last instructions, is that when xla.app finally disappears?

jnnks commented 2 years ago

Nope, still there :) I'll let the full build run later with a directory watcher to see if the file ever existed
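One way such a watcher could look, as a rough sketch: a polling loop that prints a timestamped line whenever the file appears or disappears. `watch_path` and its arguments are hypothetical, and this is not necessarily the tool used in this thread. Polling can miss a file that exists for less than one interval; an event-based tool such as inotifywait from inotify-tools would be more precise on Linux.

```shell
# Hypothetical watcher: poll a path and log each change between present/absent.
# Args: path, poll interval in seconds (default 1), max polls (default 0 = forever).
watch_path() {
  path="$1"
  interval="${2:-1}"
  max_polls="${3:-0}"
  state="unknown"
  polls=0
  while [ "$max_polls" -eq 0 ] || [ "$polls" -lt "$max_polls" ]; do
    if [ -e "$path" ]; then new="present"; else new="absent"; fi
    # Only print when the state differs from the previous poll.
    if [ "$new" != "$state" ]; then
      echo "$(date +%H:%M:%S) $path $new"
      state="$new"
    fi
    polls=$((polls + 1))
    sleep "$interval"
  done
}

# Example (not executed here): start the watcher in the background, then build.
#   watch_path _build/prod/lib/xla/ebin/xla.app 1 > xla_app_watch.log &
#   XLA_BUILD=true MIX_ENV=prod mix compile
```

A present-then-absent pair in the log would show that the file was generated and later deleted during the build.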

josevalim commented 2 years ago

Schrödinger's xla.app. 😄

Thank you for digging deeper!

jnnks commented 2 years ago

Looks like it was in fact deleted during the build process.

First Run (fails)

XLA_BUILD=true MIX_ENV=prod mix compile ``` $ XLA_BUILD=true MIX_ENV=prod mix compile ==> elixir_make Compiling 1 file (.ex) Generated elixir_make app ==> xla Compiling 2 files (.ex) Generated xla app mkdir -p /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e && \ cd /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e && \ git init && \ git remote add origin https://github.com/tensorflow/tensorflow.git && \ git fetch --depth 1 origin 3f878cff5b698b82eea85db2b60d65a2e320850e && \ git checkout FETCH_HEAD && \ rm /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelversion hint: Using 'master' as the name for the initial branch. This default branch name hint: is subject to change. To configure the initial branch name to use in all hint: of your new repositories, which will suppress this warning, call: hint: hint: git config --global init.defaultBranch hint: hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and hint: 'development'. The just-created branch can be renamed via this command: hint: hint: git branch -m Initialized empty Git repository in /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.git/ From https://github.com/tensorflow/tensorflow * branch 3f878cff5b698b82eea85db2b60d65a2e320850e -> FETCH_HEAD Note: switching to 'FETCH_HEAD'. You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by switching back to a branch. If you want to create a new branch to retain commits you create, you may do so (now or later) by using -c with the switch command. 
Example: git switch -c Or undo this operation with: git switch - Turn off this advice by setting config variable advice.detachedHead to false HEAD is now at 3f878cff Merge pull request #54226 from tensorflow-jenkins/version-numbers-2.8.0-22199 rm -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \ ln -s "/workspaces/exla_compile_test/deps/xla/extension" /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \ cd /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e && \ bazel build --define "framework_shared_object=false" -c opt //tensorflow/compiler/xla/extension:xla_extension && \ mkdir -p /root/.cache/xla/0.3.0/cache/build/ && \ cp -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz Extracting Bazel installation... Starting local Bazel server and connecting to it... 
INFO: Options provided by the client: Inherited 'common' options: --isatty=0 --terminal_columns=80 INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: Inherited 'common' options: --experimental_repo_remote_exec INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: 'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,te
nsorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/fallback,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils INFO: Found applicable config definition build:short_logs in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --output_filter=DONT_MATCH_ANYTHING INFO: Found applicable config definition build:v2 in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1 INFO: Found applicable config definition build:linux in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes INFO: Found applicable config definition build:dynamic_kernels in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS Loading: Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/tensorflow/runtime/archive/c3e082762b7664bbc7ffd2c39e86464928e27c0c.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages 
loaded Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (1 packages loaded, 0 targets configured) DEBUG: Rule 'io_bazel_rules_docker' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1596824487 -0400" DEBUG: Repository io_bazel_rules_docker instantiated at: /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/WORKSPACE:23:14: in /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/workspace0.bzl:108:34: in workspace /root/.cache/bazel/_bazel_root/2be90fa55f2d4383134ffe4aafd91de4/external/bazel_toolchains/repositories/repositories.bzl:35:23: in repositories Repository rule git_repository defined at: /root/.cache/bazel/_bazel_root/2be90fa55f2d4383134ffe4aafd91de4/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (146 packages loaded, 4023 targets configured) INFO: Analyzed target //tensorflow/compiler/xla/extension:xla_extension (188 packages loaded, 16972 targets configured). INFO: Found 1 target... [0 / 11] [Prepa] Writing file tensorflow/compiler/xla/extension/xla_extension.args [74 / 219] Compiling src/google/protobuf/compiler/java/java_message.cc; 4s local ... (8 actions running) [134 / 219] Compiling src/google/protobuf/descriptor.cc; 13s local ... (8 actions running) [268 / 670] Compiling mlir/tools/mlir-tblgen/AttrOrTypeDefGen.cpp; 5s local ... (8 actions running) [383 / 670] Compiling llvm/lib/Support/ItaniumManglingCanonicalizer.cpp; 7s local ... (8 actions running) [532 / 999] Compiling llvm/lib/Support/SourceMgr.cpp; 2s local ... (8 actions running) [749 / 999] Compiling mlir/lib/IR/MLIRContext.cpp; 7s local ... (8 actions running) [1,084 / 1,366] Compiling llvm/lib/Support/Signals.cpp; 1s local ... (8 actions running) [1,240 / 1,488] Compiling llvm/utils/TableGen/GlobalISelEmitter.cpp; 28s local ... 
(8 actions running) [2,217 / 7,107] Compiling tensorflow/core/util/test_log.pb.cc; 7s local ... (8 actions running) [2,365 / 7,107] Compiling tensorflow/core/framework/variant_op_registry.cc; 8s local ... (8 actions running) [2,438 / 7,107] Compiling tensorflow/core/util/batch_util.cc; 55s local ... (8 actions running) [2,578 / 7,107] Compiling tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc; 22s local ... (8 actions, 7 running) [2,906 / 7,107] Compiling tensorflow/compiler/mlir/xla/transforms/xla_legalize_tf.cc; 37s local ... (7 actions running) [3,137 / 7,107] Compiling tensorflow/compiler/mlir/xla/transforms/legalize_tf.cc; 81s local ... (8 actions running) [3,258 / 7,107] Compiling tensorflow/compiler/mlir/xla/transforms/legalize_tf.cc; 218s local ... (8 actions running) [3,414 / 7,107] Compiling tensorflow/compiler/tf2xla/kernels/categorical_op.cc; 20s local ... (8 actions running) [3,519 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/transforms/tpu_dynamic_layout_pass.cc; 152s local ... (8 actions running) [3,632 / 7,107] Compiling mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp; 113s local ... (8 actions running) [3,786 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/transforms/gpu_fusion.cc; 22s local ... (8 actions running) [3,903 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/transforms/convert_launch_func_to_tf_call.cc; 36s local ... (8 actions running) [4,103 / 7,107] Compiling tensorflow/core/kernels/transpose_functor_cpu.cc; 31s local ... (8 actions running) [4,306 / 7,107] Compiling tensorflow/compiler/xla/service/cpu/runtime_matmul.cc; 42s local ... (8 actions running) [4,469 / 7,107] Compiling tensorflow/compiler/xla/service/cpu/runtime_matmul.cc; 462s local ... (8 actions running) [4,685 / 7,107] Compiling tensorflow/core/kernels/resource_variable_ops.cc; 157s local ... (8 actions, 7 running) [5,185 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/ir/tf_ops_n_z.cc; 213s local ... 
(8 actions running) [5,845 / 7,107] Compiling src/cpu/rnn/ref_rnn.cpp; 79s local ... (8 actions running) [6,502 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/ir/tf_ops.cc; 486s local ... (8 actions running) Target //tensorflow/compiler/xla/extension:xla_extension up-to-date: bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz INFO: Elapsed time: 6320.181s, Critical Path: 784.24s INFO: 7107 processes: 574 internal, 6533 local. INFO: Build completed successfully, 7107 total actions INFO: Build completed successfully, 7107 total actions ==> complex Compiling 2 files (.ex) Generated complex app ==> nx Compiling 24 files (.ex) Compiling lib/nx/binary_backend.ex (it's taking more than 10s) Generated nx app ==> exla Unpacking /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz into /workspaces/exla_compile_test/deps/exla/cache g++ -fPIC -I/usr/local/lib/erlang/erts-12.3.2.2/include -Icache/xla_extension/include -O3 -Wall -Wno-sign-compare -Wno-unused-parameter -Wno-missing-field-initializers -Wno-comment -shared -std=c++14 c_src/exla/exla.cc c_src/exla/exla_nif_util.cc c_src/exla/exla_client.cc -o cache/libexla.so -Lcache/xla_extension/lib -lxla_extension -Wl,-rpath,'$ORIGIN/lib' Compiling 21 files (.ex) Generated exla app ==> exla_compile_test Unchecked dependencies for environment prod: * xla (Hex package) could not find an app file at "_build/prod/lib/xla/ebin/xla.app". This may happen if the dependency was not yet compiled or the dependency indeed has no app file (then you can pass app: false as option) ** (Mix) Can't continue due to errors on dependencies ```
inotifywait -m -r .
```
Setting up watches. Beware: since -r was given, this may take a while!
Watches established.
...
./_build/prod/lib/xla/ ACCESS,ISDIR ebin
./_build/prod/lib/xla/ebin/ ACCESS,ISDIR
./_build/prod/lib/xla/ CLOSE_NOWRITE,CLOSE,ISDIR ebin
./_build/prod/lib/xla/ebin/ CLOSE_NOWRITE,CLOSE,ISDIR
./_build/prod/lib/xla/ebin/ DELETE Elixir.Mix.Tasks.Xla.Info.beam
./_build/prod/lib/xla/ebin/ DELETE xla.app <---- HERE
./_build/prod/lib/xla/ebin/ DELETE Elixir.XLA.beam
./_build/prod/lib/xla/ebin/ DELETE_SELF
./_build/prod/lib/xla/ DELETE,ISDIR ebin
./_build/prod/lib/xla/ OPEN,ISDIR .mix
./_build/prod/lib/xla/.mix/ OPEN,ISDIR
...
```

Second Run (success)

XLA_BUILD=true MIX_ENV=prod mix compile
```
$ XLA_BUILD=true MIX_ENV=prod mix compile
==> xla
Compiling 2 files (.ex)
Generated xla app
make: '/root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz' is up to date.
==> exla_compile_test
Compiling 1 file (.ex)
Generated exla_compile_test app
```

~The inotify logs are very long, so I am not posting it in here, but can attach it somewhere if necessary.~ See below

josevalim commented 2 years ago

Awesome @jnnks! Can you please post the 100 entries before and after the DELETE?

jnnks commented 2 years ago

Here are the entire logs :D

1st Run: https://gist.github.com/jnnks/88f2cda21064d0bb109a42ec4b701cb2 DELETE is at line 797

2nd Run: https://gist.github.com/jnnks/ad8a25419b3d84a6cef83b9892a926e3

josevalim commented 2 years ago

@jonatanklosko so this is caused by the explicit deps.compile xla alias inside EXLA. Do you remember why it is needed?

jonatanklosko commented 2 years ago

https://github.com/elixir-nx/nx/blob/2769f4a91ca9737b2d2ecbafb94671ad08ba1499/exla/mix.exs#L26-L29

Without that, xla is compiled once and changing XLA_TARGET has no effect, because the Makefile doesn't run again.

josevalim commented 2 years ago

I think we will have to remove the xla_build? check and tell users that setting XLA_BUILD to true requires an explicit call to mix deps.compile xla. Another option is to move to config :xla, :force_build, true | false, because then we can at least read it via compile_env, which can warn/raise if you change it and don't recompile. But for now I would go with docs only. WDYT?
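For readers unfamiliar with the compile_env idea, here is a minimal sketch of what the config-based option could look like. It assumes the `:force_build` key named above, which is only a proposal in this thread and not an option xla is confirmed to support; the module name `XLA.BuildConfigSketch` is hypothetical.

```elixir
# config/config.exs of the project that depends on :xla
import Config

# Proposed key from the comment above (an assumption, not a confirmed xla option).
config :xla, :force_build, true
```

```elixir
# Hypothetical module inside the xla package, for illustration only.
defmodule XLA.BuildConfigSketch do
  # Application.compile_env/3 reads the value while the module is compiled
  # and records that this module depends on it, so Elixir can notice when
  # the configured value later differs from the one that was compiled in.
  @force_build Application.compile_env(:xla, :force_build, false)

  # Returns the value that was baked in at compile time.
  def force_build?, do: @force_build
end
```

As pointed out in the reply below, this would only cover the build flag; a change to XLA_TARGET would still go unnoticed unless it were tracked the same way.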

jonatanklosko commented 2 years ago

The config would only handle XLA_BUILD changing, but what if XLA_TARGET changes?

Updating the docs sounds good, though this change may cause some confusion for people relying on XLA_BUILD already.

josevalim commented 2 years ago

The issue is only with mix deps.compile xla, and we only call it when XLA_BUILD is set. I will send a PR to make sure we are on the same page. :)