faustomorales / keras-ocr

A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.
https://keras-ocr.readthedocs.io/
MIT License

TensorFlow warnings about unnecessary retracing #63

Open Dobiasd opened 4 years ago

Dobiasd commented 4 years ago

The following minimal example (main.py)

import keras_ocr

pipeline = keras_ocr.pipeline.Pipeline()

image_urls = [
    "https://i.imgur.com/euIw5Dt.png",
    "https://i.imgur.com/fAT6keX.png",
    "https://i.imgur.com/RlxBrvX.png",
    "https://i.imgur.com/pWBX9z5.png",
    "https://i.imgur.com/tzfitxz.png",
    "https://i.imgur.com/VPPpRJg.png"
]

for image_url in image_urls:
    image = keras_ocr.tools.read(image_url)
    predictions = pipeline.recognize([image])

produces these TensorFlow warnings:

WARNING:tensorflow:5 out of the last 5 calls to <function _make_execution_function.<locals>.distributed_function at 0x7f0284309cb0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:6 out of the last 6 calls to <function _make_execution_function.<locals>.distributed_function at 0x7f0284309cb0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.

It seems this can negatively impact performance.

The situation can be reproduced by running

docker build -t deleteme .

with the following Dockerfile:

FROM python:3.7

ENV CUDA_VISIBLE_DEVICES="-1"
RUN pip install tensorflow==2.1.0 keras-ocr==0.8.3

# Disable the Docker cache from this stage on, see https://stackoverflow.com/a/58801213/1866775
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache

ADD ./main.py /
RUN python /main.py
faustomorales commented 4 years ago

Thanks so much for reporting this issue!

You are getting this warning because pipeline.recognize uses a two-step process where images are first passed through the detector and then the cropped word boxes are passed through the recognizer. In your code, you are passing in images one at a time, which means that inference takes place as follows:
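(Roughly, in illustrative pseudocode; these names are not the pipeline's actual internals:)

#   detect(image_1)    -> recognize(crops_1)
#   detect(image_2)    -> recognize(crops_2)
#   detect(image_3)    -> recognize(crops_3)
#   ...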

TensorFlow is having to flip back and forth between the detector and recognizer using a batch size of 1, which means that every call ends up requiring retracing (the details of this are hazy to me, to be honest, but that's what I've surmised from dealing with this warning in the past).

To be more efficient, pipeline.recognize will optimize inference by performing the first step (detection) for all the images and then performing the second step (recognition) for all the word boxes. You can take advantage of this by batching according to the pattern suggested in the README, which loads a batch of images first and then passes that batch to pipeline.recognize.

This means that, instead of the above, inference can take place as follows:
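(Again in illustrative pseudocode:)

#   detect([image_1, ..., image_n])    -> one detector pass over all images
#   recognize([crop_1, ..., crop_m])   -> one recognizer pass over all word boxes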

I believe this altered version of your code will eliminate the warnings (and also get you much better performance).

import keras_ocr

pipeline = keras_ocr.pipeline.Pipeline()

image_urls = [
    "https://i.imgur.com/euIw5Dt.png",
    "https://i.imgur.com/fAT6keX.png",
    "https://i.imgur.com/RlxBrvX.png",
    "https://i.imgur.com/pWBX9z5.png",
    "https://i.imgur.com/tzfitxz.png",
    "https://i.imgur.com/VPPpRJg.png"
]

images = [keras_ocr.tools.read(image_url) for image_url in image_urls]
predictions = pipeline.recognize(images)

This resolves the issue on my machine but please do let me know if the warning persists on your end. Again, thanks for reporting the issue and for giving keras-ocr a try!

Dobiasd commented 4 years ago

Thanks a lot for the quick and well-written response. It matches the overall high quality of your project, which is very refreshing compared to many other repositories in the ML zoo. They often are quite cumbersome to understand/use/integrate. keras-ocr, on the other hand, is very convenient to use and well documented. :+1:

Yes, the code using batching does eliminate the warnings. The thing is, my use-case does not depend on maximum throughput, but on minimum latency, and the images come in 1-by-1, so I have to process each one immediately.

To measure how much more runtime per image I have to accept for this, I just wrote the following mini benchmark:

import time

import keras_ocr

pipeline = keras_ocr.pipeline.Pipeline()

images = [
    keras_ocr.tools.read(url) for url in [
        'https://i.imgur.com/euIw5Dt.png',
        'https://i.imgur.com/fAT6keX.png',
        'https://i.imgur.com/RlxBrvX.png',
        'https://i.imgur.com/pWBX9z5.png',
        'https://i.imgur.com/tzfitxz.png',
        'https://i.imgur.com/VPPpRJg.png'
    ]
]

def measure(f):
    start = time.time()
    f()
    duration = time.time() - start
    print(f'{f.__name__.ljust(10)}: {duration:.2f} s', flush=True)

def one_by_one():
    return [pipeline.recognize([image]) for image in images]

def batch():
    return pipeline.recognize(images)

for _ in range(3):
    measure(one_by_one)
    measure(batch)

Output (on my machine, whether run inside Docker or outside of it):

one_by_one: 24.96 s
batch     : 53.37 s
one_by_one: 22.06 s
batch     : 52.83 s
one_by_one: 22.30 s
batch     : 52.68 s

So it seems the one_by_one version is actually faster than the batch version. This, of course, does not match what we would expect. Do you get similar results (without GPU, i.e., with CUDA_VISIBLE_DEVICES="-1")?

Dobiasd commented 4 years ago

OK, same as with issue 65, the warnings are not a problem caused by keras-ocr, but one coming from TensorFlow. They introduced this problem between versions 2.0.1 and 2.1.0 (see tensorflow/issues/38598). When I use TF 2.0.1, everything is fine, i.e., there are no warnings. :+1:

The one_by_one version is still faster than the batch version in the experiment (with TF 2.0.1). It's still surprising, but not an actual problem, so I'll close this issue here. Thanks again. :slightly_smiling_face:

faustomorales commented 4 years ago

First, thanks for all the kind words above! And thank you for your patience -- I usually don't have the cycles to provide thoughtful answers to questions until the weekend, so I'm glad you've been able to get most of what you need by other means.

I think the counterintuitive results are caused by the way we do batching (see below). https://github.com/faustomorales/keras-ocr/blob/9dd79f345a453adeb196fba1e21d6d4bd1aac442/keras_ocr/pipeline.py#L43-L50

We first upscale all the images according to the pipeline.scale parameter. Then, because batching requires all images to be the same size, we pad all the images to the size of the largest image. So if you have 5 small images and 1 large one, you end up paying a premium for the smaller images in that batch. On a GPU, the balance comes out in favor of batching. On a CPU, though, it appears to go the other way depending on the relative sizes of the images in the batch.
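(For illustration, the padding step behaves roughly like this sketch; this is not the library's actual code:)

import numpy as np

def pad_to_batch(images):
    # Illustrative sketch only: zero-pad every image to the largest
    # height/width in the batch so they can be stacked into one array.
    max_height = max(image.shape[0] for image in images)
    max_width = max(image.shape[1] for image in images)
    return np.stack([
        np.pad(image, ((0, max_height - image.shape[0]),
                       (0, max_width - image.shape[1]),
                       (0, 0)))
        for image in images
    ])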

I've found it difficult to balance all the edge cases and provide the right facility for users to choose the best option for them. If, in your use of the library, you find new / better ways to handle the trade-offs, please don't hesitate to propose them or file a PR. For example, perhaps we should batch images of similar size together somehow? But then that would complicate things somewhat and cause even more surprising behavior. I definitely don't know the right answer. :/

Again, thanks for the feedback and questions!

Dobiasd commented 4 years ago

Ah, thanks for the explanation. The padding thing totally makes sense.

I would not try to provide some super-smart batching logic in the library. If the user knows about what you just explained, they can decide on their own which images to batch or not. The fact that the way to get the best performance depends on the user's system is another indicator that this should be a user-land decision.

Dobiasd commented 4 years ago

"Then, because batching requires all images to be the same size, we pad all the images to the size of the largest image."

Wouldn't it then make sense to actually run just one predict call on the detection model?

Currently, it seems to be done one-by-one, basically like this:

for image in images:
    [...]
    self.model.predict(image[np.newaxis], [...]),

Instead, I expected something like this:

self.model.predict(images)
faustomorales commented 4 years ago

You are exactly right! I've just pushed 95f6209 to fix this.

jasdeep06 commented 4 years ago

@faustomorales Training in batches also seems to give this retracing warning. IMHO: currently, the images are being resized (padded) to the maximum height and width of that particular batch. When the images differ in size across batches (as is the case in my application), this maximum height/width is different for different batches. As you can see here (https://www.tensorflow.org/guide/function):

"On the other hand, TensorFlow graphs require static dtypes and shape dimensions. tf.function bridges this gap by retracing the function when necessary to generate the correct graphs. Most of the subtlety of tf.function usage stems from this retracing behavior."

As the shape of the resized images changes with every batch, TensorFlow has to retrace the graph, taking about 10x the time to run model.predict() in the detection module.
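(A standalone sketch of this retracing behavior, independent of keras-ocr:)

import tensorflow as tf

@tf.function
def forward(x):
    # Stand-in for a real model's forward pass.
    return tf.reduce_sum(x)

forward(tf.zeros((1, 32, 32, 3)))  # first call: the function is traced
forward(tf.zeros((1, 32, 32, 3)))  # same shape: reuses the existing graph
forward(tf.zeros((1, 64, 48, 3)))  # new shape: triggers a retrace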

faustomorales commented 4 years ago

This is really interesting and a great point. Thank you for sharing the documentation reference. Do you have thoughts on how to resolve this? Perhaps we should provide an option for a fixed resize?


jasdeep06 commented 4 years ago

@faustomorales If you dig further into the documentation of tf.function here and look at the "Input signatures" section, it states the following:

"An "input signature" can be optionally provided to tf.function to control the graphs traced. The input signature specifies the shape and type of each Tensor argument to the function using a tf.TensorSpec object. More general shapes can be used. This is useful to avoid creating multiple graphs when Tensors have dynamic shapes."

Thus, specifying an input signature with shape [batch_size, None, None, 3] should solve the retracing problem. The tricky part here is that, behind the scenes, on specifying a tf.function signature, TensorFlow converts the underlying function to a graph. This graph works best when everything is a TensorFlow op (preprocessing + predict), which is not true for this repository. (Even if not every op is a TensorFlow op, IMHO the signature would still solve the retracing problem, although it would be comparatively slower.)

Moreover, the tf.function signature needs to sit on the call method of our model, which is only available if the model is written using subclassing. In this repository, the functional API is used, which does not expose the call method. I also tried to add a tf.function signature to the predict function, but it was not supported, as mentioned here.

IMHO, rewriting the model using subclassing, adding tf.function along with an input signature on the call method, and calling prediction as model(...) instead of model.predict(...) would enable inference on different-sized batches without retracing. Otherwise, an option for a fixed resize can always be provided, taking away the option of multi-sized inference.
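(A minimal sketch of that subclassing approach; the model below is a stand-in for illustration, not the actual CRAFT detector:)

import tensorflow as tf

class TinyDetector(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.conv = tf.keras.layers.Conv2D(32, 3, padding="same")

    @tf.function(input_signature=[tf.TensorSpec([None, None, None, 3], tf.float32)])
    def call(self, images):
        # One graph handles any batch size and any spatial dimensions,
        # so differently sized batches do not trigger retracing.
        return self.conv(images)

model = TinyDetector()
model(tf.zeros((2, 32, 32, 3)))  # traced once
model(tf.zeros((4, 64, 48, 3)))  # same graph, no retracing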

faustomorales commented 4 years ago

Re-opening because we probably ought to fix this so we can get improved inference speed. Not sure when I'll be able to get to it but PRs are welcome!

zaheerbeg21 commented 3 years ago

How can we use our own images with this model? I mean images from the local machine or Google Drive. Kindly suggest something.
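(One possible approach, assuming keras_ocr.tools.read accepts a local file path the same way it accepts a URL; the paths below are placeholders:)

import keras_ocr

pipeline = keras_ocr.pipeline.Pipeline()

# Assumption: tools.read handles local paths; for Google Drive images,
# download them to the local machine first and point at the files.
images = [
    keras_ocr.tools.read(path) for path in [
        '/path/to/first_image.png',
        '/path/to/second_image.png',
    ]
]
predictions = pipeline.recognize(images)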