pstjohn opened 4 years ago
Started experimenting with tf_serving and batching (with ysi example so far).
```
SINGULARITYENV_MODEL_NAME=ysi_model singularity exec --nv \
    -B ./ysi_model:/models/ysi_model \
    -B ./batch.config:/models/batch.config \
    /projects/rlmolecule/pstjohn/containers/tensorflow-serving-gpu.simg \
    tf_serving_entrypoint.sh --enable_batching --batching_parameters_file=/models/batch.config
```
My batch.config looks like this (unoptimized; based on an example found online):

```
max_batch_size { value: 16 }
batch_timeout_micros { value: 100000 }
max_enqueued_batches { value: 1000000 }
num_batch_threads { value: 4 }
pad_variable_length_inputs: true
```
I'm still not sure about the last line in this config, but without it I get an error: "Tensors with name 'serving_default_atom:0' from different tasks have different shapes and padding is turned off.Set pad_variable_length_inputs to true, or ensure that all tensors with the same name have equal dimensions starting with the first dim."
When I run `run_tf_serving-gpu.sh`, I see the log message "Wrapping session to perform batch processing" -- an indication that it is indeed running in batching mode.
I am currently trying to figure out why the responses I get from tf_serving don't match the results from the local model. I'm not sure whether the padding is causing it or I'm missing something else. I'd like to take a closer look at it with somebody who has some available cycles.
Another observation: each time we call `tf.keras.models.save_model()`, we'll need to increment the model version so that tf_serving can unload the old model and load the new one (it monitors the model directory, where each version is a numbered subdirectory, and serves the highest-numbered one). How are we planning to "auto-increment" the model ID? Is it going to be literally: check what is in the directory now -> ID = highest number + 1 -> save the model with this ID?
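That "highest number + 1" scheme could be sketched like this (just an illustration; `next_version` and the directory layout assumptions are mine, not anything in our codebase):

```python
from pathlib import Path

def next_version(model_dir):
    """Return the next model version ID for a TF Serving-style model directory.

    Assumes each saved version lives in a purely numeric subdirectory
    (e.g. model_dir/1, model_dir/2, ...), which is the layout tf_serving
    monitors. Returns 1 if no numbered versions exist yet.
    """
    versions = [
        int(p.name)
        for p in Path(model_dir).iterdir()
        if p.is_dir() and p.name.isdigit()
    ]
    return max(versions, default=0) + 1
```

Then saving would be something like `tf.keras.models.save_model(model, f"{model_dir}/{next_version(model_dir)}")`. One caveat: if two jobs save concurrently, they could race to the same ID, so we may want a lock or a timestamp-based version instead.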
This page has info on optimizing batching: https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md
Another useful discussion with some examples: https://github.com/tensorflow/serving/issues/344
Looks like we'll need to either patch tf-serving or figure out a way to put a padded atom first in each input: tensorflow/serving#1279
Modifying our code, we could probably add a 'pad' atom, bond, and connectivity row to the beginning of each array. Then we'd increment all the connectivity values by 1.
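A rough NumPy sketch of that padding idea (the array names and shapes here are illustrative, not our actual preprocessing code):

```python
import numpy as np

# Hypothetical per-molecule inputs: one feature per atom, one per bond,
# and a (n_bonds, 2) connectivity array of indices into the atom array.
atom = np.array([6, 6, 8])
bond = np.array([1, 2])
connectivity = np.array([[0, 1],
                         [1, 2]])

# Prepend a 'pad' entry (class 0 here) to atom and bond, and shift all
# connectivity indices by 1 so they still point at the original atoms.
# The pad bond connects the pad atom to itself at index 0.
atom_padded = np.concatenate([[0], atom])
bond_padded = np.concatenate([[0], bond])
connectivity_padded = np.concatenate([[[0, 0]], connectivity + 1])
```

This way every input starts with the same pad row, which should sidestep the first-dimension mismatch that tf_serving's batching complains about, at the cost of teaching the model to ignore class-0 entries.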