Closed: guptam closed this issue 4 years ago
Hello @guptam Thanks for the interest and the question. To better understand your problem, can you confirm whether the GPUs are being used? Is that time consistent with what you get on Google Colab? You can change the verbosity flag and look at the error log file to see if anything is going wrong with accelerator usage.
While the notebook is a good example to see some predictions, there's a lot of overhead since every time you run the prediction cell a new Python runtime is spun up and the full model has to be loaded from the checkpoint. That's probably taking up most of those 60 seconds, so if you want to predict for multiple examples you probably want to do it at once for all of them, which is already supported by the evaluation script.
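For illustration, here is a minimal sketch of that batching idea. The helper name, the file path, and the list `serialized_examples` are hypothetical, and the conversion from a table/question pair to a serialized tf.train.Example (done by the repo's tf_example_utils.py) is not shown here.

```python
# A minimal sketch of batching, assuming `serialized_examples` is a list of
# serialized tf.train.Example protos produced by the repo's conversion code
# (tapas/utils/tf_example_utils.py); that conversion step is not shown here.
# Writing every example into one TFRecord file lets the evaluation script
# score them all in a single run instead of reloading the model per query.
import tensorflow.compat.v1 as tf

def write_examples_to_one_file(serialized_examples, output_path):
  """Dumps all serialized examples into a single tf_example file."""
  with tf.io.TFRecordWriter(output_path) as writer:
    for serialized in serialized_examples:
      writer.write(serialized)

# write_examples_to_one_file(serialized_examples, "test.tfrecord")
```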
Hi @eisenjulian. Thanks for the quick response.
I am not using Colab, and TF is using both GPUs. I will try separating the model loading from the evaluation. I was just doing a quick test and might have missed that part. I will create a new predict function and use a pre-loaded model.
Regards, Manish
Hi, @guptam I believe the main overhead for prediction comes from loading the checkpoints, as well as from writing the input as a tf example file and then writing the predictions to another file. The current code in the Google Colab actually runs a prediction task that generates two files, one of which is empty. I guess moving away from the file creation and repeated model loading would help you get improved performance. Are you looking for a script that does just prediction?
Hi @eldhosemjoy. Yes, only looking for a script to predict using an already loaded model.
Hi @guptam, here is a suggestion for implementing the prediction script.
You will need to remove the dataset file creation, keeping the proto creation to build a collection of features. Convert those proto features using the already existing code in tapas/experiments/prediction_utils.py, removing the tf session from lines 57 to 68. Generate the features by converting them to int32 (also change this in the proto creation file, i.e. tf_example_utils.py). Take the first example and send it to the separated estimator model function (def model_fn) from tapas/models/tapas_classifier_model.py at line 919, which returns the predictions. Use a session and graph to validate the prediction, and adapt def write_predictions from tapas/experiments/prediction_utils.py to return a JSON.
I hope this would help you to build a prediction script and would definitely improve the performance.
As a further step, you could use tf.placeholders for your features and invoke the model_fn using a session and graph, along the lines of the sketch below.
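```python
# A minimal sketch of the session/graph route, assuming the model has been
# exported as a frozen GraphDef. The file path and the tensor names
# ("input_ids", "probabilities") are hypothetical placeholders for whatever
# the exported TAPAS graph actually uses.
import tensorflow.compat.v1 as tf

def load_frozen_graph(frozen_graph_path):
  """Loads the frozen GraphDef once so it can be reused for every query."""
  with tf.gfile.GFile(frozen_graph_path, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
  graph = tf.Graph()
  with graph.as_default():
    tf.import_graph_def(graph_def, name="")
  return graph

graph = load_frozen_graph("tapas_sqa_base.pb")  # hypothetical path
session = tf.Session(graph=graph)

def predict(feed_features):
  """Runs one forward pass; `feed_features` maps tensor names to arrays."""
  output = graph.get_tensor_by_name("probabilities:0")  # hypothetical name
  feed_dict = {
      graph.get_tensor_by_name(name + ":0"): value
      for name, value in feed_features.items()
  }
  return session.run(output, feed_dict=feed_dict)
```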
I believe there are even better ways and would love to know.
Thanks
To add a clarification, the current script as used in the notebook should work as-is to predict on a large number of examples, with only a minimal change: dump all of your examples into a single tf_example file. It is only when running it multiple times with just one example at a time that there is a lot of overhead, both because the model is loaded again every time run_task_main.py is run, and because size-1 batches waste GPU/TPU parallelism.
On the other hand, if you want to load the model in the notebook, the easiest way would be to copy the content of the run_task_main.py file into a notebook as a starting point. The estimator object defined there holds the model in memory and can be used to train and/or predict, along the lines of the sketch below.
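```python
# A minimal sketch of reusing the in-memory estimator, assuming `estimator`
# is the object built in run_task_main.py and that the examples to score have
# already been written to one TFRecord file. The path, the batch size, and
# `name_to_features` (the parsing spec, which must match the repo's input
# pipeline) are assumptions and are not reproduced here.
import tensorflow.compat.v1 as tf

def make_input_fn(tfrecord_path, name_to_features, batch_size=8):
  """Builds an input_fn that streams serialized examples from one file."""
  def input_fn(params):
    del params  # batch size is fixed here for simplicity
    dataset = tf.data.TFRecordDataset(tfrecord_path)
    dataset = dataset.map(
        lambda record: tf.parse_single_example(record, name_to_features))
    return dataset.batch(batch_size)
  return input_fn

# predictions = estimator.predict(
#     input_fn=make_input_fn("test.tfrecord", name_to_features))
# for prediction in predictions:
#   ...  # e.g. adapt prediction_utils.write_predictions to emit JSON
```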
I hope this helps, otherwise please give us more info to help us understand your use case.
@eisenjulian Absolutely. I had suggested the above option for single-example prediction in a realtime chat-based UI. To be more specific, I guess @guptam was looking at leveraging the repo and the model for single predictions, more towards a service/API-level adaptation of run_task_main.py, if I have caught it right.
Thanks for the clarification. If what you are looking for is to have a service to do predictions in real time, there are a few alternatives:
@eldhosemjoy
Can you please share your approach, as I am also trying to do the same. It would save a lot of time.
@monuminu, @guptam - I had tried to implement prediction as a service. You can have a look at it here - TAPAS Service Adaptation
Hi, @eldhosemjoy thanks for sharing your work, but while trying to use your code on Colab I am facing the issue below:
path of model:/content/temp/tapas/model/tapas_sqa_base.pb
---------------------------------------------------------------------------
DecodeError Traceback (most recent call last)
<ipython-input-14-c82d684a8d52> in <module>()
----> 1 tapaspredictor = TapasPredictor()
<ipython-input-13-5e395447c7d6> in load_frozen_graph(self, frozen_graph_filename)
80 with tf.gfile.GFile(frozen_graph_filename, "rb") as f:
81 graph_def = tf.GraphDef()
---> 82 graph_def.ParseFromString(f.read())
83
84 # Then, we import the graph_def into a new Graph and returns it
DecodeError: Error parsing message
@Akshaysharma29 you will need to do a git lfs pull or clone. The model is an LFS object.
@eldhosemjoy thanks for the quick response ok I will try it
git lfs clone https://github.com/eldhosemjoy/tapas.git - this will have the model pulled into the repository. @Akshaysharma29 You could take a look at this and run it directly from the directory - https://github.com/eldhosemjoy/tapas/blob/master/test/class_test.py
Thanks, @eldhosemjoy. It's working. Which version of the SQA model did you convert?
@Akshaysharma29 The SQA Base - https://storage.googleapis.com/tapas_models/2020_04_21/tapas_sqa_base.zip
Hi @eldhosemjoy, I'm also facing issues with the slow performance. In your repo TAPAS Service Adaptation, you have created a new class for prediction, where you load the saved model from the config.json file. Can you help me with where exactly in the repo you are saving the model?
Hi @eldhosemjoy, while running class_test.py I am getting this error: question_id = example["question_id"][0, 0].decode("utf-8") TypeError: 'Example' object is not subscriptable
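A likely cause, stated as an assumption since the surrounding code is not shown in the thread: if `example` here is a tf.train.Example proto rather than the dict of numpy arrays that estimator.predict yields, it cannot be indexed with [], and proto fields are read through the features map instead.

```python
# A minimal sketch, assuming `example` is a tf.train.Example proto and that
# question_id was stored as a bytes feature; protos are not subscriptable.
question_id = (
    example.features.feature["question_id"].bytes_list.value[0].decode("utf-8"))
```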
Hi,
Thanks for releasing this in open source. Wonderful concept, and very different from other seq2seq or ln2sql-like approaches.
I am facing performance issues when trying the sqa prediction notebook (using SQA Large). It takes more than 60 seconds on a dual-GPU machine to evaluate the model and respond to a query. Is this normal? How can we improve the prediction time?
Thanks, Manish