Indeed, tensorflow==2.16.1 works for me too. I read the source of tensorflow==2.17.0 and 2.16.1 to try to find out what the issue might be. Here is what I found:
The code that registers the cuFFT plugin is in the file tensorflow-2.16.1/third_party/xla/xla/stream_executor/cuda/cuda_fft.c (adjust the TensorFlow version in the path as needed) and is reproduced below:

    void initialize_cufft() {
      absl::Status status =
          PluginRegistry::Instance()->RegisterFactory<PluginRegistry::FftFactory>(
              cuda::kCudaPlatformId, "cuFFT",
              [](internal::StreamExecutorInterface *parent) -> fft::FftSupport * {
                gpu::GpuExecutor *cuda_executor =
                    dynamic_cast<gpu::GpuExecutor *>(parent);
                if (cuda_executor == nullptr) {
                  LOG(ERROR) << "Attempting to initialize an instance of the cuFFT "
                             << "support library with a non-CUDA StreamExecutor";
                  return nullptr;
                }
                return new gpu::CUDAFft(cuda_executor);
              });
      if (!status.ok()) {
        LOG(ERROR) << "Unable to register cuFFT factory: " << status.message();
      }
    }

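This function should be responsible for creating the PluginRegistry object (through the call to PluginRegistry::Instance()), which is defined in the file tensorflow-2.16.1/third_party/xla/xla/stream_executor/plugin_registry.h. That header has two very important comments, one on the class and one on the registration method, reproduced below: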

    // The PluginRegistry is a singleton that maintains the set of registered
    // "support library" plugins. Currently, there are four kinds of plugins:
    // BLAS, DNN, and FFT. Each interface is defined in the corresponding
    // gpu_{kind}.h header.

    // Registers the specified factory with the specified platform.
    // Returns a non-successful status if the factory has already been registered
    // with that platform (but execution should be otherwise unaffected).

So the class is meant to be a singleton. If a factory has already been registered, a second attempt to register it fails, but TensorFlow should otherwise work as expected.
And below is the function responsible for the registration, from the file: tensorflow-2.16.1/third_party/xla/xla/stream_executor/plugin_registry.cc:

    template <typename FACTORY_TYPE>
    absl::Status PluginRegistry::RegisterFactoryInternal(
        const std::string& plugin_name, FACTORY_TYPE factory,
        std::optional<FACTORY_TYPE>* factories) {
      absl::MutexLock lock{&GetPluginRegistryMutex()};
      if (factories->has_value()) {
        return absl::AlreadyExistsError(
            absl::StrFormat("Attempting to register factory for plugin %s when "
                            "one has already been registered",
                            plugin_name));
      }
      (*factories) = factory;
      return absl::OkStatus();
    }

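
To make the "register once" behaviour easier to see in isolation, here is a minimal standalone sketch of the same pattern. It is not the XLA code: MiniRegistry, FakeFftSupport and InitializeFakeCufft are hypothetical names made up for illustration, a plain std::string stands in for absl::Status, and the mutex is left out for brevity. It only assumes what the snippets above show: a singleton reached through Instance(), a std::optional slot for the factory, and an error returned when the slot is already filled.

```cpp
#include <functional>
#include <iostream>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for fft::FftSupport; not the real XLA type.
struct FakeFftSupport {};

// Hypothetical, simplified registry that mirrors only the
// "a factory can be registered once" rule.
class MiniRegistry {
 public:
  using FftFactory = std::function<FakeFftSupport*()>;

  // The single instance is created lazily on the first call.
  static MiniRegistry& Instance() {
    static MiniRegistry registry;
    return registry;
  }

  // Returns an empty string on success, or an error message if a factory
  // is already stored. The stored factory is left untouched on failure.
  std::string RegisterFftFactory(const std::string& plugin_name,
                                 FftFactory factory) {
    if (fft_factory_.has_value()) {
      return "Attempting to register factory for plugin " + plugin_name +
             " when one has already been registered";
    }
    fft_factory_ = std::move(factory);
    return "";
  }

  bool HasFftFactory() const { return fft_factory_.has_value(); }

 private:
  MiniRegistry() = default;
  std::optional<FftFactory> fft_factory_;
};

// Mimics the shape of initialize_cufft(): try to register, log on failure,
// and let the program carry on either way.
bool InitializeFakeCufft() {
  std::string error = MiniRegistry::Instance().RegisterFftFactory(
      "cuFFT", []() -> FakeFftSupport* {
        static FakeFftSupport support;
        return &support;
      });
  if (!error.empty()) {
    std::cerr << "Unable to register cuFFT factory: " << error << "\n";
    return false;
  }
  return true;
}
```

In this sketch the first call to InitializeFakeCufft() succeeds and the second one only logs the "already been registered" message, while the factory stored by the first call stays in place. If the real registry behaves the same way, the error would mean the registration path ran more than once in the same process, not that the FFT support is missing.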
I am not entirely sure when and where the very first cuFFT entry in the PluginRegistry is created, which is what makes TensorFlow display this error. I believe that at some point between running import tensorflow and calling initialize_cufft above, the PluginRegistry object is created and, since it must be a singleton, the later registration fails, hence the error. I hope someone can elaborate further on this or provide more clarity.