google-research / tapas

End-to-end neural table-text understanding models.
Apache License 2.0

Slow performance #8

Closed guptam closed 4 years ago

guptam commented 4 years ago

Hi,

Thanks for releasing this in open source. Wonderful concept, and very different from other seq2seq or ln2sql-like approaches.

I am facing performance issues when trying the SQA prediction notebook (using SQA Large). It takes more than 60 seconds on a dual-GPU machine to evaluate the model and return a response to a query. Is this normal? How can we improve the prediction time?

Thanks, Manish

eisenjulian commented 4 years ago

Hello @guptam, thanks for the interest and the question. To better understand your problem, can you confirm whether the GPUs are being used? Is that time consistent with what you get on Google Colab? You can change the verbosity flag and look at the error log file to see if anything is going wrong with accelerator usage.

While the notebook is a good example for seeing some predictions, there's a lot of overhead, since every time you run the prediction cell a new Python runtime is spun up and the full model has to be loaded from the checkpoint. That's probably taking up most of those 60 seconds, so if you want to predict on multiple examples you probably want to do it at once for all of them, which is already supported by the evaluation script.
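For example, all of the queries can be serialized into a single TFRecord file up front, so one run of the script pays the model-loading cost only once. A minimal sketch with plain TensorFlow APIs, assuming the tf.train.Example protos are produced elsewhere (e.g. by the converters in tf_example_utils.py):

```python
import tensorflow.compat.v1 as tf

def write_examples_to_single_file(examples, path="all_queries.tfrecord"):
    """Write every tf.train.Example into ONE TFRecord file, so the
    evaluation script loads the checkpoint once, not once per query.

    `examples` is assumed to be a list of tf.train.Example protos, e.g.
    built by the TAPAS converters from your tables and queries.
    """
    with tf.io.TFRecordWriter(path) as writer:
        for example in examples:
            writer.write(example.SerializeToString())
```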

guptam commented 4 years ago

Hi @eisenjulian. Thanks for the quick response.

I am not using Colab, and TF is using both GPUs. I will try separating the model loading from the evaluation. I was just trying to do a quick test and might have missed this part. I will create a new predict function and use a pre-loaded model.

Regards, Manish

eldhosemjoy commented 4 years ago

Hi @guptam, I believe the main overhead for prediction comes from loading the checkpoints, as well as from writing the input as a tf example file and then writing the predictions to another file. The current code in the Google Colab actually runs a predict task that generates two files, one of which is empty. I guess moving the file creation and model loading out of the prediction path would help you get better performance. Are you looking for a script that does just prediction?

guptam commented 4 years ago

Hi @eldhosemjoy. Yes, I am only looking for a script to predict using an already loaded model.

eldhosemjoy commented 4 years ago

Hi @guptam, here is a suggestion for getting the prediction script implemented.

You will need to remove the dataset file creation, keeping only the proto creation that builds the feature collection. Then:

1. Convert those proto features using the already existing code in tapas/experiments/prediction_utils.py, removing the tf session from lines 57 to 68.
2. Generate the features by converting them to int32 (also change this in the proto creation file, i.e. tf_example_utils.py).
3. Take the first example and send it to the separated estimator model function (def model_fn) from tapas/models/tapas_classifier_model.py on line 919, which returns the predictions.
4. Use a session and graph to run the prediction, and adapt def write_predictions from tapas/experiments/prediction_utils.py to return a JSON.

I hope this helps you build a prediction script; it should definitely improve the performance.

Later on, you could use tf.placeholders for your features and invoke the model_fn using a session and graph.
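A rough sketch of that placeholder-based variant, assuming model_fn has the standard Estimator signature; the feature names, shapes, and checkpoint path below are illustrative only and must be replaced with the real TAPAS feature spec:

```python
import tensorflow.compat.v1 as tf

# model_fn is assumed to be the function from
# tapas/models/tapas_classifier_model.py mentioned above.
graph = tf.Graph()
with graph.as_default():
    # Illustrative features only; substitute the real TAPAS names/shapes.
    features = {
        "input_ids": tf.placeholder(tf.int32, [1, 512]),
        "input_mask": tf.placeholder(tf.int32, [1, 512]),
    }
    spec = model_fn(features, labels=None,
                    mode=tf.estimator.ModeKeys.PREDICT, params={})
    saver = tf.train.Saver()

session = tf.Session(graph=graph)
saver.restore(session, "/path/to/model.ckpt")  # restore the weights once

def predict(feature_values):
    # Each request now costs a single session.run(), with no reload.
    feed = {features[name]: value for name, value in feature_values.items()}
    return session.run(spec.predictions, feed_dict=feed)
```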

I believe there are even better ways and would love to know.

Thanks

eisenjulian commented 4 years ago

To add a clarification: the current script, as used in the notebook, should work as-is to predict on a large number of examples, with only a minimal change to dump all of your examples into a single tf_example file. Only when running it multiple times with just one example at a time is there a lot of overhead, both because the model is loaded again every time run_task_main.py is run, and because size-1 batches waste GPU/TPU parallelism.

On the other hand, if you want to load the model in the notebook, the easiest way would be to copy the contents of the run_task_main.py file into the notebook as a starting point. The estimator object defined there contains the model in memory and can be used to train and/or predict.
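One way to make such an in-memory estimator fast for repeated single queries is to feed estimator.predict() from a queue-backed generator, so the checkpoint is restored only once. A sketch, assuming `estimator` is built as in run_task_main.py; the feature types and shapes are illustrative:

```python
import queue
import tensorflow.compat.v1 as tf

requests = queue.Queue()

def serving_input_fn():
    def generator():
        while True:
            yield requests.get()  # block until the next query arrives
    # Substitute the full TAPAS feature spec for these illustrative entries.
    return tf.data.Dataset.from_generator(
        generator,
        output_types={"input_ids": tf.int32},
        output_shapes={"input_ids": [512]},
    ).batch(1)

# predict() returns a generator; the checkpoint is restored lazily on the
# first next() and then stays in memory for all subsequent queries.
results = estimator.predict(input_fn=serving_input_fn)

def predict_one(features):
    requests.put(features)
    return next(results)
```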

I hope this helps; otherwise, please give us more info to help us understand your use case.

eldhosemjoy commented 4 years ago

@eisenjulian Absolutely. I had suggested the above option for single-example prediction in a real-time, chat-based UI. To be more specific, I guess @guptam was looking at leveraging the repo and the model for single predictions, more towards a service/API-level adaptation of run_task_main.py, if I have caught it right.

eisenjulian commented 4 years ago

Thanks for the clarification. If what you are looking for is to have a service to do predictions in real time, there are a few alternatives:
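One such alternative is to export the estimator once as a SavedModel and keep it loaded behind the service, e.g. with tf.contrib.predictor. A sketch, assuming the model has already been exported with a suitable serving_input_receiver_fn; the path and feature names are hypothetical:

```python
# TF 1.x only: tf.contrib is not available in TF 2.x.
from tensorflow.contrib import predictor

# Load the exported SavedModel once at service start-up; the graph and
# weights then stay in memory for the lifetime of the process.
predict_fn = predictor.from_saved_model("/path/to/exported_model")

def handle_request(features):
    # The keys of `features` must match the serving_input_receiver_fn
    # used at export time (names are assumptions here).
    return predict_fn(features)
```

The same SavedModel export also works with TensorFlow Serving, if you prefer a standalone model server.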

monuminu commented 4 years ago

@eldhosemjoy

Can you please share your approach, as I am also trying to do the same? It would save a lot of time.

eldhosemjoy commented 4 years ago

@monuminu, @guptam - I have tried implementing this prediction flow as a service. You can have a look at it here: TAPAS Service Adaptation.

Akshaysharma29 commented 4 years ago

Hi @eldhosemjoy, thanks for sharing your work, but while trying to use your code on Colab I am facing the issue below:

```
path of model: /content/temp/tapas/model/tapas_sqa_base.pb
---------------------------------------------------------------------------
DecodeError                               Traceback (most recent call last)
<ipython-input-14-c82d684a8d52> in <module>()
----> 1 tapaspredictor  = TapasPredictor()

2 frames
<ipython-input-13-5e395447c7d6> in load_frozen_graph(self, frozen_graph_filename)
     80     with tf.gfile.GFile(frozen_graph_filename, "rb") as f:
     81         graph_def = tf.GraphDef()
---> 82         graph_def.ParseFromString(f.read())
     83 
     84     # Then, we import the graph_def into a new Graph and returns it

DecodeError: Error parsing message
```

eldhosemjoy commented 4 years ago

@Akshaysharma29 You will need to do a git lfs pull or clone; the model is an LFS object.
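A quick way to confirm that this is the problem: a Git LFS pointer is a small text file, so parsing it as a GraphDef fails with exactly this DecodeError. A sketch, using the path from the traceback above:

```python
# A real frozen graph is binary protobuf; a Git LFS pointer instead starts
# with the text "version https://git-lfs.github.com/spec/v1".
with open("/content/temp/tapas/model/tapas_sqa_base.pb", "rb") as f:
    print(f.read(64))
```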

Akshaysharma29 commented 4 years ago

@eldhosemjoy Thanks for the quick response. OK, I will try it.

eldhosemjoy commented 4 years ago

git lfs clone https://github.com/eldhosemjoy/tapas.git will pull the model into the repository. @Akshaysharma29 You could take a look at this and run it directly from the directory: https://github.com/eldhosemjoy/tapas/blob/master/test/class_test.py

Akshaysharma29 commented 4 years ago

Thanks, @eldhosemjoy, it's working. Which SQA model version did you convert?

eldhosemjoy commented 4 years ago

@Akshaysharma29 The SQA Base - https://storage.googleapis.com/tapas_models/2020_04_21/tapas_sqa_base.zip

rahulyadav02 commented 3 years ago

Hi @eldhosemjoy, I'm also facing the slow-performance issue. In your repo TAPAS Service Adaptation, you created a new class for prediction, where you load the saved model from the config.json file. Can you help me with where exactly in the repo you are saving the model?

TheurgicDuke771 commented 3 years ago

> git lfs clone https://github.com/eldhosemjoy/tapas.git will pull the model into the repository. @Akshaysharma29 You could take a look at this and run it directly from the directory: https://github.com/eldhosemjoy/tapas/blob/master/test/class_test.py

Hi @eldhosemjoy, while running class_test.py I am getting this error:

```
question_id = example["question_id"][0, 0].decode("utf-8")
TypeError: 'Example' object is not subscriptable
```