bytedeco / javacpp-presets

The missing Java distribution of native C++ libraries

Building tensorflow lite GPU #1529

Open barrypitman opened 3 weeks ago

barrypitman commented 3 weeks ago

Hi,

I'm trying to build tensorflow lite for linux-x86_64 with the -gpu extension. I figured that the easiest way would be to use GitHub actions and just modify some of the workflow files, which I've done here - https://github.com/bytedeco/javacpp-presets/compare/master...barrypitman:javacpp-presets:v1.5.10-GPU

I was initially able to build the tensorflow-lite-2.15.0-1.5.10-linux-x86_64-gpu.jar file by passing ext=-gpu. The resulting libjnitensorflowlite.so file is larger than the default one without GPU support (seems like a good thing).
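For reference, the local equivalent of that CI build would be something like the following (a sketch; the `javacpp.platform` and `javacpp.platform.extension` properties follow the javacpp-presets build instructions):

```shell
# Build only the tensorflow-lite preset with the -gpu extension
# (run from the javacpp-presets checkout root):
mvn install --projects .,tensorflow-lite \
    -Djavacpp.platform=linux-x86_64 \
    -Djavacpp.platform.extension=-gpu
```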

However, when I try to include that tensorflow-lite-2.15.0-1.5.10-linux-x86_64-gpu.jar file in my project as a dependency, I can't create the GPU delegate, e.g. by calling "TfLiteGpuDelegateV2Create" as described here - https://www.tensorflow.org/lite/android/delegates/gpu_native#enable_gpu_acceleration. The binding doesn't exist.

Then I tried to link the relevant header files to generate the necessary java classes, i.e.

@Platform(
                value = {"android", "linux"},
                extension = "-gpu",
                define = "UNIQUE_PTR_NAMESPACE std",
                include = {
                        "tensorflow/lite/delegates/gpu/delegate.h",
                        "tensorflow/lite/delegates/gpu/delegate_options.h",
                }
)

But that caused the build to fail with a lot of compilation errors - https://github.com/barrypitman/javacpp-presets/actions/runs/10420729781/job/28861326724

Any tips or pointers for how to build the linux-x86_64 version of tensorflow lite with GPU support would be appreciated!

Thanks

barrypitman commented 3 weeks ago

OK, after reading the docs I see that the list of header files in @Platform for the -gpu extension is not appended to the existing list; I need to re-declare all of them.

Having done that, it seems to be much closer to compiling.
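Concretely, the -gpu entry then ends up looking something like this (a sketch; the base include list is abbreviated here and would have to be copied verbatim from the existing @Platform entry in tensorflowlite.java):

```java
@Platform(
        value = {"android", "linux"},
        extension = "-gpu",
        define = "UNIQUE_PTR_NAMESPACE std",
        include = {
                // ...the complete base include list from the non-GPU
                // @Platform entry, repeated verbatim (abbreviated here)...
                // followed by the GPU delegate headers:
                "tensorflow/lite/delegates/gpu/delegate.h",
                "tensorflow/lite/delegates/gpu/delegate_options.h",
        }
)
```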

There are some errors like this:

/home/runner/work/javacpp-presets/javacpp-presets/tensorflow-lite/target/native/org/bytedeco/tensorflowlite/linux-x86_64-gpu/jnitensorflowlite.cpp:35257:21: error: ‘struct TfLiteGpuDelegateOptionsV2’ has no member named ‘first_delegate_node_index’
35257 |     int rval = ptr->first_delegate_node_index;

and

/home/runner/work/javacpp-presets/javacpp-presets/tensorflow-lite/target/native/org/bytedeco/tensorflowlite/linux-x86_64-gpu/jnitensorflowlite.cpp:2385:681: error: ‘struct TfLiteGpuDelegateOptionsV2’ has no member named ‘last_delegate_node_index’
 2385 |         { sizeof(TfLiteGpuDelegateOptionsV2), offsetof(TfLiteGpuDelegateOptionsV2, model_token), offsetof(TfLiteGpuDelegateOptionsV2, is_precision_loss_allowed), offsetof(TfLiteGpuDelegateOptionsV2, inference_preference), offsetof(TfLiteGpuDelegateOptionsV2, inference_priority1), offsetof(TfLiteGpuDelegateOptionsV2, inference_priority2), offsetof(TfLiteGpuDelegateOptionsV2, inference_priority3), offsetof(TfLiteGpuDelegateOptionsV2, experimental_flags), offsetof(TfLiteGpuDelegateOptionsV2, max_delegated_partitions), offsetof(TfLiteGpuDelegateOptionsV2, serialization_dir), offsetof(TfLiteGpuDelegateOptionsV2, first_delegate_node_index), offsetof(TfLiteGpuDelegateOptionsV2, last_delegate_node_index) },

I see that the above two members are defined inside this block in tensorflow/lite/delegates/gpu/delegate_options.h:

#ifdef TFLITE_DEBUG_DELEGATE
  // This sets the index of the first node that could be delegated.
  int first_delegate_node_index;
  // This sets the index of the last node that could be delegated.
  int last_delegate_node_index;
#endif

But I'm not sure how to work around this issue, any pointers would be appreciated.
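One way to deal with members guarded by an #ifdef like that is JavaCPP's macro-control mechanism: declaring the macro as undefined makes the Parser skip the whole block, so no accessors for those fields are generated. A sketch (the exact placement inside the presets' InfoMapper in tensorflowlite.java is assumed):

```java
import org.bytedeco.javacpp.tools.Info;
import org.bytedeco.javacpp.tools.InfoMap;

// In the presets' InfoMapper.map() method: treat TFLITE_DEBUG_DELEGATE as
// undefined so the Parser skips the #ifdef block and never emits accessors
// for first_delegate_node_index / last_delegate_node_index.
public void map(InfoMap infoMap) {
    infoMap.put(new Info("TFLITE_DEBUG_DELEGATE").define(false));
}
```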

and the other error is this:

/home/runner/work/javacpp-presets/javacpp-presets/tensorflow-lite/target/native/org/bytedeco/tensorflowlite/linux-x86_64-gpu/jnitensorflowlite.cpp: In function ‘_jobject* Java_org_bytedeco_tensorflowlite_global_tensorflowlite_TfLiteGpuDelegateV2CreateAsync(JNIEnv*, jclass, jobject)’:
/home/runner/work/javacpp-presets/javacpp-presets/tensorflow-lite/target/native/org/bytedeco/tensorflowlite/linux-x86_64-gpu/jnitensorflowlite.cpp:48586:16: error: ‘TfLiteGpuDelegateV2CreateAsync’ was not declared in this scope; did you mean ‘TfLiteGpuDelegateV2Create’?
48586 |         rptr = TfLiteGpuDelegateV2CreateAsync((const TfLiteGpuDelegateOptionsV2*)ptr0);

Which seems similar... TfLiteGpuDelegateV2CreateAsync is defined in tensorflow/lite/delegates/gpu/delegate.h like this:

#if defined(__ANDROID__)
TFL_CAPI_EXPORT TfLiteDelegate* TfLiteGpuDelegateV2CreateAsync(
    const TfLiteGpuDelegateOptionsV2* options);
#endif

Any tips for fixing those?

Thanks in advance

saudet commented 3 weeks ago

OK, after reading the docs I see that the list of header files in @Platform for the -gpu extension is not appended to the existing list; I need to re-declare all of them.

You mean the include list? We can manipulate it later on with LoadEnabled.init() like this: https://github.com/bytedeco/javacpp-presets/blob/master/pytorch/src/main/java/org/bytedeco/pytorch/presets/torch.java#L143

Any tips for fixing those?

We can skip anything problematic like that rather easily: https://github.com/bytedeco/javacpp/wiki/Mapping-Recipes#defining-macros-and-controlling-their-blocks https://github.com/bytedeco/javacpp/wiki/Mapping-Recipes#skipping-lines-from-header-files
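A sketch of what that could look like for the Android-only function (placement inside the presets' map() method is assumed):

```java
// Skip the function entirely so no JNI stub referencing it is compiled:
infoMap.put(new Info("TfLiteGpuDelegateV2CreateAsync").skip());

// ...or disable the #if defined(__ANDROID__) block during parsing instead.
// Note that parsing happens once for all platforms, so this would drop the
// function from the Android bindings as well:
infoMap.put(new Info("__ANDROID__").define(false));
```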

saudet commented 3 weeks ago

OK, after reading the docs I see that the list of header files in @Platform for the -gpu extension is not appended to the existing list; I need to re-declare all of them.

You mean the include list? We can manipulate it later on with LoadEnabled.init() like this: https://github.com/bytedeco/javacpp-presets/blob/master/pytorch/src/main/java/org/bytedeco/pytorch/presets/torch.java#L143

Actually, no, we want the interface to be the same for all platforms, so just add the header file to the include list for all platforms.

saudet commented 3 weeks ago

But for functions that are not actually there to link with, we can annotate them with something like @Platform(extension="-gpu") like this: https://github.com/bytedeco/javacpp-presets/blob/master/onnxruntime/src/main/java/org/bytedeco/onnxruntime/presets/onnxruntime.java#L229
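Following that onnxruntime pattern, the annotation can be attached through an Info entry, so the generated methods only get JNI code compiled into the -gpu artifacts. A sketch (the function names are taken from delegate.h; their presence in the preset is assumed):

```java
// Generated methods receive @Platform(extension = "-gpu") so their JNI
// stubs are only compiled for the -gpu extension:
infoMap.put(new Info("TfLiteGpuDelegateV2Default",
                     "TfLiteGpuDelegateV2Create",
                     "TfLiteGpuDelegateV2Delete")
        .annotations("@Platform(extension = \"-gpu\")"));
```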

barrypitman commented 3 weeks ago

I have got it compiling and generating the tensorflow-lite-linux-x86_64-gpu.jar file, and I'm able to create the GPU delegate using ModifyGraphWithDelegate + TfLiteGpuDelegateV2Create. I have run into some issues actually using my NVIDIA GPU from within my Windows WSL2 installation though. In particular, it seems like tflite is using OpenCL to interact with the GPU, and getting OpenCL + NVIDIA GPUs working together under WSL2 is a problem - https://github.com/microsoft/WSL/issues/6951.
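The working ModifyGraphWithDelegate + TfLiteGpuDelegateV2Create combination can be sketched as follows (the Interpreter/InterpreterBuilder usage follows the tensorflow-lite preset's samples; the TfLiteGpuDelegateV2* bindings and the model filename are assumed from the custom -gpu build):

```java
import org.bytedeco.javacpp.Pointer;
import org.bytedeco.tensorflowlite.*;
import static org.bytedeco.tensorflowlite.global.tensorflowlite.*;

public class GpuDelegateDemo {
    public static void main(String[] args) {
        // Load the model and build an interpreter, as in the preset samples:
        FlatBufferModel model = FlatBufferModel.BuildFromFile("model.tflite");
        BuiltinOpResolver resolver = new BuiltinOpResolver();
        InterpreterBuilder builder = new InterpreterBuilder(model, resolver);
        Interpreter interpreter = new Interpreter((Pointer) null);
        builder.apply(interpreter);

        // Create the GPU delegate with default options and apply it;
        // unsupported ops fall back to the CPU:
        TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateV2Default();
        TfLiteDelegate delegate = TfLiteGpuDelegateV2Create(options);
        if (interpreter.ModifyGraphWithDelegate(delegate) != kTfLiteOk) {
            System.err.println("GPU delegate not applied, running on CPU");
        }
        interpreter.AllocateTensors();
        // ... fill inputs, Invoke(), read outputs ...
        TfLiteGpuDelegateV2Delete(delegate);
    }
}
```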

I was able to get tflite to use my integrated Intel GPU via OpenCL, so I don't think that it's an issue with javacpp/tensorflowlite, but rather the execution environment.

I also followed the guide here - https://medium.com/@tackboon97_98523/how-to-install-opencl-on-wsl-ubuntu-to-detect-a-cuda-gpu-device-30f334a415ec to get the NVIDIA GPU working with OpenCL via POCL, but when running my application I end up with messages like this:

INFO: Created TensorFlow Lite delegate for GPU.
INFO: Loaded OpenCL library with dlopen.
ERROR: Following operations are not supported by GPU delegate:
CUSTOM TFLite_Detection_PostProcess: TFLite_Detection_PostProcess
DEQUANTIZE:
98 operations will run on the GPU, and the remaining 146 operations will run on the CPU.
INFO: Loaded OpenCL library with dlopen.
ERROR: Failed to create 2D texture (clCreateImage): Invalid operation
ERROR: Falling back to OpenGL
ERROR: TfLiteGpuDelegate Init: OpenGL-based API disabled
INFO: Created 0 GPU delegate kernels.
ERROR: TfLiteGpuDelegate Prepare: delegate is not initialized
ERROR: Node number 244 (TfLiteGpuDelegateV2) failed to prepare.
ERROR: Restored original execution plan after delegate application failure.

Anyway, I think that I might park this for now, and investigate using ONNX Runtime instead.

Would you be interested in building a GPU-enabled version of TFLite by default as part of your normal build? In that case I will try to clean up what I've done and submit a PR.

saudet commented 3 weeks ago

Sure, please open a PR with what you got! Thanks