bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and much more!
https://bentoml.com
Apache License 2.0

Can't use KerasModelArtifact with store_as_json_and_weights=True due to TypeError #1560

Closed: pietermarsman closed this issue 3 years ago

pietermarsman commented 3 years ago

Describe the bug

I want to use KerasModelArtifact so that the model's name is saved and restored. I also want to use store_as_json_and_weights=True so that the loss function and optimizer state are not stored (see #1559). However, when I set store_as_json_and_weights to True, the model does not load properly and cannot be used for inference because of this error:

TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: conv2d_2/kernel:0

I suspect it is related to the TensorFlow session and graph that are created when the model is loaded from JSON and weights.

To Reproduce

import logging
from tempfile import TemporaryDirectory

import bentoml
import numpy as np
from bentoml import BentoService
from bentoml.adapters import ImageInput, JsonOutput
from bentoml.frameworks.keras import KerasModelArtifact
from tensorflow.python.keras import Input
from tensorflow.python.keras.layers import Conv2D, Dense, Flatten
from tensorflow.python.keras.models import Sequential

bentoml.config().set('core', 'debug', 'true')
bentoml.configure_logging(logging.DEBUG)

class CustomKerasModelArtifact(KerasModelArtifact):
    """CustomKerasModel that uses store_as_json_and_weights=True"""

    def __init__(self, name):
        super().__init__(name)
        self._store_as_json_and_weights = True

@bentoml.artifacts([CustomKerasModelArtifact('model')])
class ClassificationInference(BentoService):
    @bentoml.api(input=ImageInput(), output=JsonOutput())
    def predict(self, image_array: np.ndarray):
        """Predict class for image"""
        scores = self.artifacts.model(image_array).numpy()
        return scores

def test_bento_saving_and_serving():
    """Test if saving and loading with a custom object works"""
    model = Sequential([
        Input((299, 299, 3)),
        Conv2D(32, (3, 3), activation='relu'),
        Flatten(),
        Dense(2)
    ])
    model.compile()
    print(model.predict(np.random.rand(1, 299, 299, 3)))

    inference = ClassificationInference()
    inference.pack('model', model)

    with TemporaryDirectory() as tmp_dir:
        inference.save_to_dir(tmp_dir)
        bento: ClassificationInference = bentoml.load_from_dir(tmp_dir)  # noqa

        prediction = bento.predict(np.random.rand(299, 299, 3))
        print(prediction)

test_bento_saving_and_serving()

Expected behavior

Loading the saved bundle and calling predict should not raise an error.

Screenshots/Logs

/home/pieter/projects/orbisk/classification-model-training/.venv/bin/python /home/pieter/.config/JetBrains/PyCharm2020.3/scratches/scratch_14.py
2021-04-01 00:01:59.480105: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-01 00:01:59.480124: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-04-01 00:02:01.900813: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-01 00:02:01.900983: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-04-01 00:02:01.900995: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-04-01 00:02:01.901013: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (satoru): /proc/driver/nvidia/version does not exist
2021-04-01 00:02:01.901200: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-01 00:02:01.901405: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-01 00:02:02.014027: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-04-01 00:02:02.033899: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2496000000 Hz
[[-0.07865936 -0.10319656]]
WARNING:tensorflow:From /home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/bentoml/frameworks/keras.py:122: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.

2021-04-01 00:02:02.250574: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
[2021-04-01 00:02:03,205] INFO - BentoService bundle 'ClassificationInference:20210401000202_2315FF' created at: /tmp/tmp5vrthdsf
[[ 0.08861104 -0.11280265]]
[[-0.17932548 -0.6075488 ]]
[2021-04-01 00:02:06,170] INFO - BentoService bundle 'ClassificationInference:20210401000205_AEECFD' created at: /tmp/tmp3avrz_rk
2021-04-01 00:02:07.060813: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
WARNING:tensorflow:From /home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/bentoml/frameworks/keras.py:136: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

[2021-04-01 00:02:07,059] WARNING - Module `scratch_14` already loaded, using existing imported module.
2021-04-01 00:02:07.118528: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
Traceback (most recent call last):
  File "/home/pieter/.config/JetBrains/PyCharm2020.3/scratches/scratch_14.py", line 51, in <module>
    test_bento_saving_and_serving()
  File "/home/pieter/.config/JetBrains/PyCharm2020.3/scratches/scratch_14.py", line 45, in test_bento_saving_and_serving
    bento: ClassificationInference = bentoml.load_from_dir(tmp_dir)  # noqa
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/bentoml/saved_bundle/loader.py", line 107, in wrapper
    return func(bundle_path, *args)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/bentoml/saved_bundle/loader.py", line 258, in load_from_dir
    svc_cls = load_bento_service_class(bundle_path)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/bentoml/saved_bundle/loader.py", line 107, in wrapper
    return func(bundle_path, *args)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/bentoml/saved_bundle/loader.py", line 206, in load_bento_service_class
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/tmp/tmp5vrthdsf/ClassificationInference/scratch_14.py", line 51, in <module>
    test_bento_saving_and_serving()
  File "/tmp/tmp5vrthdsf/ClassificationInference/scratch_14.py", line 44, in test_bento_saving_and_serving
    inference.save_to_dir(tmp_dir)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/bentoml/service/__init__.py", line 710, in save_to_dir
    return save_to_dir(self, path, version)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/bentoml/saved_bundle/bundler.py", line 228, in save_to_dir
    _write_bento_content_to_dir(bento_service, path)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/bentoml/saved_bundle/bundler.py", line 99, in _write_bento_content_to_dir
    module_name, module_file = copy_local_py_modules(
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/bentoml/saved_bundle/local_py_modules.py", line 85, in copy_local_py_modules
    target_module = _get_module(target_module)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/bentoml/saved_bundle/local_py_modules.py", line 73, in _get_module
    target_module = importlib.import_module(target_module)
  File "/home/pieter/.pyenv/versions/3.8.6/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/tmp/tmp5vrthdsf/ClassificationInference/scratch_14.py", line 51, in <module>
    test_bento_saving_and_serving()
  File "/tmp/tmp5vrthdsf/ClassificationInference/scratch_14.py", line 47, in test_bento_saving_and_serving
    prediction = bento.predict(np.random.rand(299, 299, 3))
  File "/tmp/tmp5vrthdsf/ClassificationInference/scratch_14.py", line 25, in predict
    scores = self.artifacts.model(image_array).numpy()
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer_v1.py", line 831, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/sequential.py", line 375, in call
    return super(Sequential, self).call(inputs, training=training, mask=mask)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py", line 424, in call
    return self._run_internal_graph(
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py", line 560, in _run_internal_graph
    outputs = node.layer(*args, **kwargs)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer_v1.py", line 831, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/keras/layers/convolutional.py", line 248, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 1013, in convolution_v2
    return convolution_internal(
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 1143, in convolution_internal
    return op(
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 2597, in _conv2d_expanded_batch
    return gen_nn_ops.conv2d(
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 936, in conv2d
    return conv2d_eager_fallback(
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1019, in conv2d_eager_fallback
    _attr_T, _inputs_T = _execute.args_to_matching_eager([input, filter], ctx, [_dtypes.half, _dtypes.bfloat16, _dtypes.float32, _dtypes.float64, _dtypes.int32, ])
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 280, in args_to_matching_eager
    ret = [ops.convert_to_tensor(t, dtype, ctx=ctx) for t in l]
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 280, in <listcomp>
    ret = [ops.convert_to_tensor(t, dtype, ctx=ctx) for t in l]
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py", line 163, in wrapped
    return func(*args, **kwargs)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1540, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1992, in _dense_var_to_tensor
    return var._dense_var_to_tensor(dtype=dtype, name=name, as_ref=as_ref)  # pylint: disable=protected-access
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1393, in _dense_var_to_tensor
    return self.value()
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 565, in value
    return self._read_variable_op()
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 672, in _read_variable_op
    result = read_and_set_handle()
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 662, in read_and_set_handle
    result = gen_resource_variable_ops.read_variable_op(
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/ops/gen_resource_variable_ops.py", line 478, in read_variable_op
    return read_variable_op_eager_fallback(
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/ops/gen_resource_variable_ops.py", line 503, in read_variable_op_eager_fallback
    _result = _execute.execute(b"ReadVariableOp", 1, inputs=_inputs_flat,
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 75, in quick_execute
    raise e
  File "/home/pieter/projects/orbisk/classification-model-training/.venv/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: conv2d_2/kernel:0

Process finished with exit code 1
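The traceback also shows that load_from_dir executes the bundled copy of the script as a module (the /tmp/tmp5vrthdsf/ClassificationInference/scratch_14.py frames), so the module-level call to test_bento_saving_and_serving() runs again while the bundle is loading; the failing bento.predict frame belongs to that bundled copy, not to the original scratch file. The stdlib-only sketch below illustrates that mechanism with a hypothetical throwaway module rather than BentoML itself: executing a file as a module runs its top-level code unless the call sits behind an `if __name__ == "__main__":` guard.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical module source: a test function plus a top-level call,
# like the last line of the repro script. `calls` records each run.
MODULE_TEMPLATE = """
calls = []

def test_bento_saving_and_serving():
    calls.append(__name__)

{call}
"""


def exec_as_module(path: Path, name: str):
    """Execute a file as a fresh module, similar to how a bundle loader imports it."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def runs_during_import(call: str) -> int:
    """Return how many times the test function ran just by loading the module."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "scratch_module.py"
        path.write_text(MODULE_TEMPLATE.format(call=call))
        return len(exec_as_module(path, "scratch_module").calls)


unguarded = runs_during_import("test_bento_saving_and_serving()")
guarded = runs_during_import(
    'if __name__ == "__main__":\n    test_bento_saving_and_serving()'
)
print(unguarded, guarded)  # 1 0
```

Because the loaded module's `__name__` is its module name rather than `"__main__"`, the guarded variant does not call the test function at import time; in the repro, such a guard would presumably keep the test from re-running when the bundle is loaded.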

Environment:

Additional context

I fixed it by using:


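import tensorflow as tf
from bentoml.frameworks.keras import KerasModelArtifact


class CustomKerasModelArtifact(KerasModelArtifact):
    """KerasModelArtifact that stores the model as JSON architecture plus weights."""

    def __init__(self, name):
        super().__init__(name)
        self._store_as_json_and_weights = True

    def save(self, dst):
        # Write the architecture as JSON and the weights to a separate file.
        with open(self._model_json_path(dst), "w") as json_file:
            json_file.write(self._model.to_json())
        self._model.save_weights(self._model_weights_path(dst))

    def load(self, path):
        # Rebuild the model from JSON and restore its weights in the current
        # TensorFlow context, with no custom session/graph handling.
        with open(self._model_json_path(path), "r") as json_file:
            model_json = json_file.read()
        model = tf.keras.models.model_from_json(model_json)
        model.load_weights(self._model_weights_path(path))
        return self.pack(model)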
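The override's save/load pair writes the architecture (to_json()) and the weights (save_weights()) to two separate files and rebuilds the model from them on load, so the restored model comes back uncompiled: no loss function is configured. A framework-free sketch of that two-file round trip follows; ToyModel, the file names, and the JSON weight format are hypothetical stand-ins, not Keras or BentoML APIs.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class ToyModel:
    """Hypothetical stand-in for a Keras model; not a real Keras class."""

    architecture: Dict[str, int]
    weights: List[float]
    compile_config: Optional[Dict[str, str]] = None  # e.g. loss/optimizer names

    def to_json(self) -> str:
        # Architecture only, in the spirit of model.to_json().
        return json.dumps(self.architecture)


def save(model: ToyModel, dst: Path) -> None:
    # Two files: architecture JSON and weights. compile_config is not written.
    (dst / "model.json").write_text(model.to_json())
    (dst / "weights.json").write_text(json.dumps(model.weights))


def load(path: Path) -> ToyModel:
    # Rebuild from the architecture, then restore the weights; the result is "uncompiled".
    architecture = json.loads((path / "model.json").read_text())
    weights = json.loads((path / "weights.json").read_text())
    return ToyModel(architecture=architecture, weights=weights)


with tempfile.TemporaryDirectory() as tmp:
    original = ToyModel(
        architecture={"units": 2},
        weights=[0.5, -1.25],
        compile_config={"loss": "mse", "optimizer": "adam"},
    )
    save(original, Path(tmp))
    restored = load(Path(tmp))

print(restored.weights == original.weights, restored.compile_config)  # True None
```

This is the trade-off the issue is after: compile state is dropped, which is fine for inference but means the model would have to be compiled again before any further training.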
parano commented 3 years ago

The root cause of this issue has been fixed in #1696. However, there is another issue related to the store_as_json_and_weights option that is still open and tracked here: https://github.com/bentoml/BentoML/issues/1698