Humbulani1234 / django-default

Django Probability of Default

Tensorflow errors #46

Open Humbulani1234 opened 1 month ago

Humbulani1234 commented 1 month ago

Indeed, tensorflow==2.16.1 works for me as well. I read through the source of tensorflow==2.17.0 and 2.16.1 to try to find out what the issue might be. This is what I found:

The code that registers the cuFFT plugin is in the file tensorflow-2.16.1/third_party/xla/xla/stream_executor/cuda/cuda_fft.cc (change the TensorFlow version number as appropriate) and is reproduced below:


void initialize_cufft() {
  absl::Status status =
      PluginRegistry::Instance()->RegisterFactory<PluginRegistry::FftFactory>(
          cuda::kCudaPlatformId, "cuFFT",
          [](internal::StreamExecutorInterface *parent) -> fft::FftSupport * {
            gpu::GpuExecutor *cuda_executor =
                dynamic_cast<gpu::GpuExecutor *>(parent);
            if (cuda_executor == nullptr) {
              LOG(ERROR) << "Attempting to initialize an instance of the cuFFT "
                         << "support library with a non-CUDA StreamExecutor";
              return nullptr;
            }

            return new gpu::CUDAFft(cuda_executor);
          });
  if (!status.ok()) {
    LOG(ERROR) << "Unable to register cuFFT factory: " << status.message();
  }
}

This function appears to be responsible for registering the cuFFT factory with the PluginRegistry object, which is declared in the file tensorflow-2.16.1/third_party/xla/xla/stream_executor/plugin_registry.h. That header carries two important comments, reproduced below:

// The PluginRegistry is a singleton that maintains the set of registered
// "support library" plugins. Currently, there are four kinds of plugins:
// BLAS, DNN, and FFT. Each interface is defined in the corresponding
// gpu_{kind}.h header.

// Registers the specified factory with the specified platform.
// Returns a non-successful status if the factory has already been registered
// with that platform (but execution should be otherwise unaffected).

So the class is meant to be a singleton. If a factory has already been registered for a platform, a second attempt to register it returns a non-successful status, but TensorFlow should otherwise work as expected.

Below is the function that performs the registration, from the file tensorflow-2.16.1/third_party/xla/xla/stream_executor/plugin_registry.cc:


template <typename FACTORY_TYPE>
absl::Status PluginRegistry::RegisterFactoryInternal(
    const std::string& plugin_name, FACTORY_TYPE factory,
    std::optional<FACTORY_TYPE>* factories) {
  absl::MutexLock lock{&GetPluginRegistryMutex()};

  if (factories->has_value()) {
    return absl::AlreadyExistsError(
        absl::StrFormat("Attempting to register factory for plugin %s when "
                        "one has already been registered",
                        plugin_name));
  }

  (*factories) = factory;
  return absl::OkStatus();
}
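
To make the register-once behaviour easier to follow, here is a minimal sketch of the same logic using only the standard library. It is not the XLA code: Status, RegisterOnce and FftFactory are hypothetical stand-ins for absl::Status, RegisterFactoryInternal and PluginRegistry::FftFactory, and the mutex handling is simplified.

```cpp
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for absl::Status: an empty message means OK.
struct Status {
  std::string message;
  bool ok() const { return message.empty(); }
};

// Simplified register-once logic: the first caller fills the slot, and every
// later caller gets an error while the slot is left untouched.
template <typename Factory>
Status RegisterOnce(const std::string& plugin_name, Factory factory,
                    std::optional<Factory>* slot) {
  static std::mutex mu;  // one mutex per Factory type in this sketch
  std::lock_guard<std::mutex> lock(mu);

  if (slot->has_value()) {
    return Status{"Attempting to register factory for plugin " + plugin_name +
                  " when one has already been registered"};
  }
  *slot = std::move(factory);
  return Status{};
}

// Hypothetical factory type; the real one returns an fft::FftSupport*.
using FftFactory = std::function<int()>;
```

Calling RegisterOnce twice on the same slot with the name "cuFFT" gives an OK status the first time and the "already been registered" message the second time, and the factory stored by the first call stays in place. That matches the header comment: the second registration fails, but whatever was registered first is still usable.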

I am not entirely sure when and where the cuFFT factory is first registered with the PluginRegistry, so that TensorFlow ends up displaying this error. I believe that somewhere between running import tensorflow and the call to initialize_cufft above, the registration has already happened once, and since the PluginRegistry is a singleton, the second attempt produces the error. I hope someone can elaborate further on this or provide more clarity.
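
One reading of RegisterFactoryInternal is that the message does not come from creating the singleton itself, but from the factory slot already holding a value, that is, from the registration running more than once in the same process. The sketch below illustrates only that reading. Registry and InitializeFft are hypothetical names, the singleton is assumed to be created lazily on the first Instance() call, and locking is left out for brevity. Whether this is what actually happens in tensorflow==2.17.0 is not established here.

```cpp
#include <iostream>
#include <optional>
#include <string>

// Hypothetical registry, not the XLA class: a single slot, created lazily.
class Registry {
 public:
  // The one instance is constructed the first time Instance() is called
  // (function-local static), not when the library is loaded.
  static Registry& Instance() {
    static Registry instance;
    return instance;
  }

  // Returns false if a factory name is already stored in the slot.
  // Not thread-safe in this sketch; the real code takes a mutex.
  bool Register(const std::string& name) {
    if (fft_factory_.has_value()) return false;
    fft_factory_ = name;
    return true;
  }

 private:
  Registry() = default;
  std::optional<std::string> fft_factory_;
};

// Stand-in for a module initializer such as initialize_cufft().
// Returns true when the registration succeeded.
bool InitializeFft() {
  bool ok = Registry::Instance().Register("cuFFT");
  if (!ok) std::cerr << "Unable to register cuFFT factory\n";
  return ok;
}
```

In this sketch, calling Registry::Instance() any number of times always returns the same object and never logs anything. Only a second call to InitializeFft() triggers the error line, which suggests the thing to look for is a second execution of the initializer rather than a second registry object.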

Y055513 commented 1 month ago

AttributeError: module 'tensorflow.python.data.ops.from_tensor_slices_op' has no attribute '_TensorSliceDataset'. How can I solve this?