Update to 0.11.0 causing exceptions

henryruhs commented 1 year ago

Our low-level implementation raises an error once we update to the 0.11.0 release.

Code

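import numpy
import opennsfw2
from PIL import Image

# Frame, get_predictor and MAX_PROBABILITY are defined elsewhere in facefusion (see Source below).
def predict_frame(target_frame : Frame) -> bool:
    image = Image.fromarray(target_frame)
    image = opennsfw2.preprocess_image(image, opennsfw2.Preprocessing.YAHOO)
    views = numpy.expand_dims(image, axis = 0)
    _, probability = get_predictor().predict(views)[0]
    return probability > MAX_PROBABILITY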

Source: https://github.com/facefusion/facefusion/blob/master/facefusion/predictor.py

Error

Traceback (most recent call last):
  File "/home/henry/PycharmProjects/facefusion/venv/lib/python3.10/site-packages/gradio/routes.py", line 523, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/henry/PycharmProjects/facefusion/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1437, in process_api
    result = await self.call_function(
  File "/home/henry/PycharmProjects/facefusion/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1109, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/henry/PycharmProjects/facefusion/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/henry/PycharmProjects/facefusion/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/henry/PycharmProjects/facefusion/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/henry/PycharmProjects/facefusion/venv/lib/python3.10/site-packages/gradio/utils.py", line 865, in wrapper
    response = f(*args, **kwargs)
  File "/home/henry/PycharmProjects/facefusion/facefusion/uis/components/preview.py", line 95, in update_preview_image
    preview_frame = process_preview_frame(target_frame)
  File "/home/henry/PycharmProjects/facefusion/facefusion/uis/components/preview.py", line 118, in process_preview_frame
    if predict_frame(temp_frame):
  File "/home/henry/PycharmProjects/facefusion/facefusion/predictor.py", line 33, in predict_frame
    _, probability = get_predictor().predict(views)[0]
  File "/home/henry/PycharmProjects/facefusion/venv/lib/python3.10/site-packages/keras_core/src/utils/traceback_utils.py", line 123, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/henry/PycharmProjects/facefusion/venv/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

Detected at node 'StatefulPartitionedCall' defined at (most recent call last):
    File "/usr/lib/python3.10/threading.py", line 973, in _bootstrap
      self._bootstrap_inner()
    File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
      self.run()
    File "/home/henry/PycharmProjects/facefusion/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
      result = context.run(func, *args)
    File "/home/henry/PycharmProjects/facefusion/venv/lib/python3.10/site-packages/gradio/utils.py", line 865, in wrapper
      response = f(*args, **kwargs)
    File "/home/henry/PycharmProjects/facefusion/facefusion/uis/components/preview.py", line 95, in update_preview_image
      preview_frame = process_preview_frame(target_frame)
    File "/home/henry/PycharmProjects/facefusion/facefusion/uis/components/preview.py", line 118, in process_preview_frame
      if predict_frame(temp_frame):
    File "/home/henry/PycharmProjects/facefusion/facefusion/predictor.py", line 33, in predict_frame
      _, probability = get_predictor().predict(views)[0]
    File "/home/henry/PycharmProjects/facefusion/venv/lib/python3.10/site-packages/keras_core/src/utils/traceback_utils.py", line 118, in error_handler
      return fn(*args, **kwargs)
    File "/home/henry/PycharmProjects/facefusion/venv/lib/python3.10/site-packages/keras_core/src/backend/tensorflow/trainer.py", line 504, in predict
      batch_outputs = self.predict_function(data)
    File "/home/henry/PycharmProjects/facefusion/venv/lib/python3.10/site-packages/keras_core/src/backend/tensorflow/trainer.py", line 210, in one_step_on_data_distributed
      outputs = self.distribute_strategy.run(
Node: 'StatefulPartitionedCall'
libdevice not found at ./libdevice.10.bc
         [[{{node StatefulPartitionedCall}}]] [Op:__inference_one_step_on_data_distributed_4826]

bhky commented 1 year ago

Thanks for the message!

From a quick glance at your predictor.py module, my guess is that you are using keras rather than keras_core. Maybe you could give keras_core a try as well.

Other than that, I cannot reproduce the error.

If the above does not help, could you provide a minimal example that can reproduce it?

EDIT: Ah, no, your keras import is only for the Model typing. Then it should not be related. In that case, please provide an example. Thanks a lot!

henryruhs commented 1 year ago

I can just copy / paste your example to reproduce the issue: https://github.com/bhky/opennsfw2#lower-level-with-keras-core

henryruhs commented 1 year ago

Just a side note: move the install bash scripting to .github/workflow ... I got pretty confused finding this in tests. Generally speaking, if you need a wrapper to run your test suite, something is off. pytest with pytest-env should do it in any case...

bhky commented 1 year ago

Ummm, there is no issue when I run it on either CPU or GPU.

Judging from your error messages, it looks related to this, i.e., could it be something in your CUDA setup?
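
One way to check the CUDA setup is to look for the file named in the error above ("libdevice not found at ./libdevice.10.bc"). The sketch below assumes the usual CUDA toolkit layout, where libdevice sits under nvvm/libdevice inside the CUDA directory. The helper name find_cuda_dirs_with_libdevice and the candidate paths are hypothetical and will differ per system.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: a standard CUDA toolkit layout, <cuda_dir>/nvvm/libdevice/libdevice.10.bc.
LIBDEVICE_RELATIVE_PATH = Path('nvvm') / 'libdevice' / 'libdevice.10.bc'


def find_cuda_dirs_with_libdevice(candidates: Iterable[str]) -> List[str]:
    """Return the candidate CUDA directories that actually contain libdevice.10.bc."""
    return [
        directory
        for directory in candidates
        if (Path(directory) / LIBDEVICE_RELATIVE_PATH).is_file()
    ]


if __name__ == '__main__':
    # Hypothetical install locations; adjust them to your machine.
    print(find_cuda_dirs_with_libdevice(['/usr/local/cuda', '/usr/local/cuda-11', '/usr/lib/cuda']))
```

If the list comes back empty, the toolkit is probably installed somewhere else, or without the nvvm component.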

henryruhs commented 1 year ago

That does actually fix it: os.environ['XLA_FLAGS'] = '--xla_gpu_cuda_data_dir=/usr/local/cuda-11'
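
For anyone applying the same workaround, here is a minimal sketch that sets the flag without overwriting other XLA flags that may already be in the environment. The helper name with_cuda_data_dir is hypothetical, and /usr/local/cuda-11 is only the path from this thread, so substitute your own CUDA directory. It is probably safest to run this before tensorflow is imported.

```python
import os

CUDA_DATA_DIR_PREFIX = '--xla_gpu_cuda_data_dir='


def with_cuda_data_dir(cuda_dir: str, existing_flags: str = '') -> str:
    """Return an XLA_FLAGS value that points --xla_gpu_cuda_data_dir at cuda_dir.

    Other flags in existing_flags are kept; an earlier cuda data dir flag is replaced.
    """
    kept = [flag for flag in existing_flags.split() if not flag.startswith(CUDA_DATA_DIR_PREFIX)]
    return ' '.join(kept + [CUDA_DATA_DIR_PREFIX + cuda_dir])


# The path is the one reported in this thread; it is an assumption for any other machine.
os.environ['XLA_FLAGS'] = with_cuda_data_dir('/usr/local/cuda-11', os.environ.get('XLA_FLAGS', ''))
```

Merging into the existing value instead of assigning it outright avoids silently dropping flags set elsewhere, for example in a shell profile.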

bhky commented 1 year ago

Good to hear! So it's the CUDA setup of the local environment.

And regarding your issue with the test setup: you can in fact just use unittest to run my test file; it is quite standalone.

bhky / opennsfw2

Update to 0.11.0 causing exceptions #17
