CarloWood opened 1 year ago
All 632 unique undefined symbols start with `_mlir_ciface_`.
All 649 error lines containing 'undefined reference to' are of the following form:
```
^gpu_op_[a-z0-9_]*\.cc:(\.text\._Z[^+]*+0x[0-9a-f]*): undefined reference to `_mlir_ciface_[A-Za-z0-9_]*.$
```
This shows that all undefined references come from files whose names match `gpu_op_[a-z0-9_]*\.cc`.
All of these files live exclusively in `build/tensorflow/tensorflow/core/kernels/mlir_generated/`.
196 of the errors come from `gpu_op_cast.cc` (the runner-up is `gpu_op_relu.cc` with 17 errors).
The only files with just a single error are `gpu_op_logical_and.cc`, `gpu_op_logical_not.cc` and `gpu_op_logical_or.cc`.
Each of these three files uses `GENERATE_BINARY_GPU_KERNEL` (`GENERATE_UNARY_GPU_KERNEL` in the case of `gpu_op_logical_not.cc`) and `REGISTER_GPU_KERNEL_NO_TYPE_CONSTRAINT` exactly once.
From this it seems that either `GENERATE_BINARY_GPU_KERNEL`/`GENERATE_UNARY_GPU_KERNEL` or `REGISTER_GPU_KERNEL_NO_TYPE_CONSTRAINT` produces an error.
The files that produce two errors are `gpu_op_angle.cc`, `gpu_op_complex_abs.cc`, `gpu_op_complex.cc`, `gpu_op_conj.cc`, `gpu_op_imag.cc`, `gpu_op_polygamma.cc`, `gpu_op_real.cc` and `gpu_op_zeta.cc`.
From this it seems that an error is produced by `REGISTER_COMPLEX_GPU_KERNEL`, `GENERATE_AND_REGISTER_UNARY_GPU_KERNEL` and `GENERATE_AND_REGISTER_BINARY_GPU_KERNEL`.
To make a long story short, the problem seems to come from the macros that use the macro `MLIR_FUNCTION`, defined in `tensorflow/tensorflow/core/kernels/mlir_generated/base_op.h`:
```cpp
#define MLIR_FUNCTION(tf_op, platform, input_type, output_type) \
  _mlir_ciface_##tf_op##_##platform##_##input_type##_##output_type
```
More specifically, the macros in question are `GENERATE_UNARY_KERNEL3`, `GENERATE_BINARY_KERNEL3` and `GENERATE_TERNARY_KERNEL3`. They are more or less similar, so let's just look at one:
```cpp
#define GENERATE_UNARY_KERNEL3(tf_op, platform, input_type, output_type, casted_input_type, casted_output_type)
```
This expands to code like the following (formatting mine):
```cpp
extern "C" void MLIR_FUNCTION(tf_op, platform, input_type, output_type)  // <-- Undefined reference.
    (UnrankedMemRef* result, OpKernelContext* ctx, UnrankedMemRef* arg);

namespace {
class MLIR_OP(tf_op, platform, casted_input_type, casted_output_type)
    : public MLIROpKernel<output_type, typename EnumToDataType<output_type>::Type,
                          casted_output_type> {
 public:
  using MLIROpKernel::MLIROpKernel;

  UnrankedMemRef Invoke(OpKernelContext* ctx,
                        llvm::SmallVectorImpl<UnrankedMemRef>& args) override {
    UnrankedMemRef result;
    MLIR_FUNCTION(tf_op, platform, input_type, output_type)(&result, ctx, &args[0]);  // <-- Undefined reference.
    return result;
  }
};
}  // namespace
```
I found out it is an upstream problem: as of 2.14, the build doesn't link against the 634 generated `bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/lib*_kernel_generator.pic.a` archives.
If you use bazel 6.1.0 it works. Then something else breaks, but this is a monologue anyway. Goodbye.
After two hours of compiling, the linking step fails! :(
and then
I just ran it again, so everything was already compiled and we go straight to linking again:
Can you please give me a hint, or ask me to test something?
Note that I made the following change:
This is the only thing I changed.