allenai / bilm-tf

Tensorflow implementation of contextualized word representations from bi-directional language models

Exporting finetuned model to SavedModel format for Tensorflow Serving #238

Closed mohammedayub44 closed 3 years ago

mohammedayub44 commented 3 years ago

Hi,

Thanks for the excellent work on this repo. I was able to train and fine-tune a custom model with it, and I can test the model from checkpoints successfully. However, I need to serve the model with TensorFlow Serving, hence the requirement for the SavedModel format.

So far, I've managed to run inference from the checkpoint with the code below:

import os

import tensorflow as tf

from bilm import Batcher, BidirectionalLanguageModel, weight_layers

datadir = os.path.join('/home/ubuntu/mayub/lm_training/elmo', 'finetuned_model')
vocab_file = os.path.join(datadir, 'vocab-2016-09-10.txt')
options_file = os.path.join(datadir, 'options.json')
weight_file = os.path.join(datadir, 'finetune_model_weights.hdf5')

# Create a Batcher to map text to character ids.
batcher2 = Batcher(vocab_file, 50)

# Input placeholders to the biLM.
context_character_ids2 = tf.placeholder('int32', shape=(None, None, 50))

# Build the biLM graph.
bilm2 = BidirectionalLanguageModel(options_file, weight_file)

# Get ops to compute the LM embeddings.
context_embeddings_op2 = bilm2(context_character_ids2)

# Get an op to compute ELMo (weighted average of the internal biLM layers)
elmo_context_input2 = weight_layers('input', context_embeddings_op2, l2_coef=0.0)

## Run the Inference with TF Session
with tf.Session() as sess:
    # It is necessary to initialize variables once before running inference.
    sess.run(tf.global_variables_initializer())

    # Create batches of data. `tokenized_context` is a list of tokenized sentences
    # (a list of lists of string tokens).
    context_ids = batcher2.batch_sentences(tokenized_context)
    print("Shape of context ids = ", context_ids.shape)

    # Compute ELMo representations (here for the input only, for simplicity).
    elmo_context_input_ = sess.run(
        elmo_context_input2['weighted_op'],
        feed_dict={context_character_ids2: context_ids}
    )

print("Shape of generated embeddings = ",elmo_context_input_.shape)
# Output:
# Shape of context ids =  (3, 14, 50)
# Shape of generated embeddings =  (3, 12, 1024)

Basic outline of how I load the checkpoint and save it with a SavedModelBuilder, using the code below:

tf.reset_default_graph()
saver = tf.train.import_meta_graph('/path/to/checkpoint/meta_file.meta')
builder = tf.saved_model.builder.SavedModelBuilder('/path/to/output/dir/')
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess:
    # Restore variables from disk.
    saver.restore(sess, '/path/to/finetuned_model/dir/')
    print("Model restored.")
    builder.add_meta_graph_and_variables(sess, ['custom_tag'], strip_default_attrs=False)
builder.save()

I'm slightly confused about what my SignatureDef should look like and how to account for any other pre-processing operations, layers, etc.
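Something like the sketch below is what I'm imagining, reusing the placeholder and the weighted ELMo op from my first snippet (untested; the 'character_ids' / 'elmo_embeddings' keys are just my guesses):

# Rough sketch only (TF 1.x SavedModel APIs); signature keys are assumptions.
input_info = tf.saved_model.utils.build_tensor_info(context_character_ids2)
output_info = tf.saved_model.utils.build_tensor_info(elmo_context_input2['weighted_op'])

signature = tf.saved_model.signature_def_utils.build_signature_def(
    inputs={'character_ids': input_info},
    outputs={'elmo_embeddings': output_info},
    method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)

builder.add_meta_graph_and_variables(
    sess,
    [tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature
    },
    strip_default_attrs=False)

But I'm not sure this is right, given that the Batcher pre-processing runs outside the graph.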

I want it to be something similar to what TF Hub ELMo 3 provides, or at least support the following:

Any help appreciated. Thanks in advance!

@matt-peters

carolmanderson commented 3 years ago

Issue #193 contains an explanation of the outputs you can get from this implementation. Unlike the TF Hub implementation, the bilm-tf implementation can't directly give you a weighted sum of the three output layers. You can, however, weight the three output layers yourself, for example with a custom Keras WeightedAverage layer in the model that consumes the ELMo embeddings (see the sketch below). Note that the first of the three output layers from lm_embeddings contains the character-based representations you wanted.
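As a rough sketch of what such a layer could look like (illustrative only and untested; it assumes the stacked embeddings arrive with shape [batch, 3, num_tokens, 1024]):

import tensorflow as tf

class WeightedAverage(tf.keras.layers.Layer):
    """Learn a softmax-weighted average of the biLM layers, plus a global scale."""

    def build(self, input_shape):
        n_layers = int(input_shape[1])
        # one scalar weight per biLM layer
        self.layer_weights = self.add_weight(
            name="layer_weights", shape=(n_layers,),
            initializer="zeros", trainable=True)
        self.gamma = self.add_weight(
            name="gamma", shape=(), initializer="ones", trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        # inputs: [batch, n_layers, n_tokens, dim] -> [batch, n_tokens, dim]
        norm_weights = tf.nn.softmax(self.layer_weights)
        return self.gamma * tf.einsum("l,bltd->btd", norm_weights, inputs)

You would apply it to the stacked lm_embeddings output inside whatever downstream model consumes the ELMo representations.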

Here's a code snippet for getting the three output layers. Also note that per my comment on issue #107, this code requires the model saved in Step 1, not the final model with TF serving tags. When you're ready to deploy the model in TF serving, use the model saved in Step 2.

import tensorflow as tf
from bilm import Batcher, BidirectionalLanguageModel

# load the saved model
frozen_graph = '/path/to/my_saved_model.pb'
with tf.gfile.GFile(frozen_graph, "rb") as f:
    restored_graph_def = tf.GraphDef()
    restored_graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(
        restored_graph_def,
        input_map=None,
        return_elements=None,
        name="")

output_node = graph.get_tensor_by_name("concat_3:0")    # the three biLM output layers, stacked
input_node = graph.get_tensor_by_name("Placeholder:0")  # character-id input

# generate character ids for your input documents
vocab_file = '/path/to/my_vocab.txt'
batcher = Batcher(vocab_file, 50)
char_ids = batcher.batch_sentences([["Hello", "world"]])

# get embeddings
sess = tf.Session(graph=graph)
my_feed_dict = {input_node: char_ids}
embs = sess.run(output_node, feed_dict=my_feed_dict)

One more note: this model produces very large outputs. When you deploy it in TF Serving, the embeddings have to be serialized to be returned to you, and if you're then feeding them to another model, they have to be deserialized again. These steps are time-consuming, so it would be faster to deploy the models in native TensorFlow rather than via TF Serving, letting the embeddings pass directly to the downstream model as numpy arrays and skipping the serialization/deserialization steps entirely.
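For illustration, the round trip through TF Serving's REST API would look roughly like this (host, port, and model name are placeholders; parsing the large nested JSON in the response is where most of the overhead shows up):

# Illustrative sketch only; the endpoint and model name are placeholders.
import numpy as np
import requests

payload = {"instances": char_ids.tolist()}      # serialize the character ids to JSON
resp = requests.post("http://localhost:8501/v1/models/elmo:predict", json=payload)
embs = np.array(resp.json()["predictions"])     # deserialize the (large) embeddings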

mohammedayub44 commented 3 years ago

@carolmanderson Great, thanks for the detailed code snippet.

I'm using Streamlit to build my prototype app. All my other word-embedding models use TensorFlow 2 and are loaded natively from checkpoints. Since this repo doesn't support TF 2.0, I went down the route of exposing this model as a REST endpoint.

Good point about the output size. I'm passing independent sentences; does the size depend on the batch size or the number of sentences? My guess is that simple Python pickling should work?

carolmanderson commented 3 years ago

@mohammedayub44 Ah, OK. In that case, you can export the model as described in #107 and reload it in TensorFlow 2 within your Streamlit app. Here's sample code (caveat: I haven't run this in a Streamlit app, but I have confirmed it works in TensorFlow 2):

import tensorflow as tf

from bilm import Batcher

# reload the model
loaded = tf.saved_model.load("/path/to/saved/model") # this is a directory. Don't include the file itself in the path. 
infer = loaded.signatures["serving_default"]

# get the char ids for your documents
vocab_file = '/path/to/my_vocab.txt'
batcher = Batcher(vocab_file, 50)
char_ids = batcher.batch_sentences([["Hello", "world"]])
char_ids = char_ids.astype('int32')    # must be cast to int32 before feeding to model

# get embeddings
embs = infer(tf.constant(char_ids))['import/concat_3:0']

Don't be alarmed if you see this message: "INFO:tensorflow:Saver not created because there are no variables in the graph to restore." This is expected.

Regarding the output size, you'll get a 3 x 1024 tensor for every token in your input. So long documents or large batches can both cause large outputs.
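As a rough example, assuming float32 values, a single 500-token document already yields 500 x 3 x 1024 x 4 bytes, roughly 6 MB of embeddings, before any serialization overhead.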

mohammedayub44 commented 3 years ago

@carolmanderson Thanks. Could you verify the lines to be commented out in #107? Unfortunately, the link did not work.

carolmanderson commented 3 years ago

Sorry about that. The lines are:

https://github.com/allenai/bilm-tf/blob/7cffee2b0986be51f5e2a747244836e1047657f4/bilm/model.py#L587-L593

mohammedayub44 commented 3 years ago

No problem. It works smoothly in TensorFlow 2. I guess I'll skip the serving part for now, since loading the model natively works better for me with Streamlit.