problem with freezing graph #107

ohwe opened 6 years ago

ohwe commented 6 years ago

Hi, ELMo team. To deploy TF models in C++, I typically freeze the graph (via graph_util.convert_variables_to_constants) into a constant GraphDef, which gives me a single .pb GraphDef file for applying the model.

However, in this case a set of 'bilm/Variable_*' variables (assigned through tf.assign) prevents me from freezing the model. These variables arise as the init_states for the LSTMs that are passed between layers.

The question is: why do you avoid using tf.nn.rnn_cell.MultiRNNCell here, which seems to make things much simpler?

matt-peters commented 6 years ago

I haven't tried to use graph_util.convert_variables_to_constants so can't comment on the source of the error.

As for the particular implementation details, I honestly can't remember; I wrote much of this code nearly two years ago :-/

limohanlmh commented 6 years ago

Hi, I have also encountered a similar problem when freezing the graph for C++ deployment. To be specific, the ELMo feature is the output node of the computation graph and, after freezing, the token id is the entry node of the graph. However, when I called the TensorFlow C++ API to create a session, an error was triggered: "Invalid argument: Input 0 of node bilm/Assign_7 was passed float from bilm/Variable_7:0 incompatible with expected float_ref."
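One way to see which nodes would trigger this error is to scan the frozen GraphDef for any remaining Assign ops: freezing turns their variable inputs into constants, while Assign still expects a reference. The following is a minimal sketch, assuming only that each node exposes `name`, `op`, and `input` the way TensorFlow's NodeDef does. The helper name `find_assign_nodes` and the stand-in nodes are hypothetical, chosen so the snippet runs without TensorFlow installed.

```python
from types import SimpleNamespace


def find_assign_nodes(nodes):
    """Return (assign_node_name, target_input_name) for every Assign op.

    `nodes` is any iterable of objects with `.name`, `.op` and `.input`,
    for example `graph_def.node` from a TensorFlow GraphDef.
    """
    found = []
    for node in nodes:
        if node.op == "Assign":
            # The first input of an Assign op is the variable it writes to.
            target = node.input[0] if node.input else None
            found.append((node.name, target))
    return found


# Hypothetical stand-in nodes that mimic the situation described above:
# the variable has been frozen to a Const, but the Assign op is still there.
example_nodes = [
    SimpleNamespace(name="bilm/Variable_7", op="Const", input=[]),
    SimpleNamespace(name="bilm/concat", op="ConcatV2", input=["a", "b", "axis"]),
    SimpleNamespace(name="bilm/Assign_7", op="Assign",
                    input=["bilm/Variable_7", "bilm/concat"]),
]

for assign_name, target in find_assign_nodes(example_nodes):
    print(f"{assign_name} still writes to {target}")
```

If this list is non-empty for a frozen graph, the Assign ops (or the code that creates them) would need to be removed before the graph can be loaded.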

Rusiecki commented 5 years ago

Hello, I'm running into a similar problem. When running

python3.7 --input_meta_graph='/Users/banana/Downloads/2/model.ckpt-113750.meta' --input_checkpoint='/Users/banana/Downloads/2/model.ckpt-113750' --output_graph='/Users/banana/Downloads/2/frozengraph.pb' --output_node_names='bilm/Assign_7' --input_binary=True

I get an assertion error for any value I pass to output_node_names.

Error message:

Loaded meta graph file '/Users/master/Downloads/2/model.ckpt-113750.meta
2019-03-30 13:31:57.446975: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
  File "", line 495, in <module>
    run_main()
  File "", line 492, in run_main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/", line 125, in run
    _sys.exit(main(argv))
  File "", line 491, in <lambda>
    my_main = lambda unused_args: main(unused_args, flags)
  File "", line 385, in main
    flags.saved_model_tags, checkpoint_version)
  File "", line 367, in freeze_graph
    checkpoint_version=checkpoint_version)
  File "", line 229, in freeze_graph_with_def_protos
    variable_names_blacklist=variable_names_blacklist)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/", line 232, in convert_variables_to_constants
    inference_graph = extract_sub_graph(input_graph_def, output_node_names)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/", line 174, in extract_sub_graph
    _assert_nodes_are_present(name_to_node, dest_nodes)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/", line 133, in _assert_nodes_are_present
    assert d in name_to_node, "%s is not in graph" % d
AssertionError: bilm/Assign_7 is not in graph

The code I run is:

The files I have are:

-rwxrwxrwx@ 1 master staff 91 Mar 30 03:39 checkpoints
-rwxrwxrwx@ 1 master staff 34533780 Mar 27 20:53 events.out.tfevents.1553461572.elmo-
-rwxrwxrwx 1 master staff 1569619104 Mar 27 21:16
-rwxrwxrwx 1 master staff 1569619104 Mar 27 21:16
-rwxrwxrwx 1 master staff 3275 Mar 27 21:16 model.ckpt-113750.index
-rwxrwxrwx@ 1 master staff 7509658 Mar 27 20:21 model.ckpt-113750.meta

Is there any update on how to get a .pb file out of the ELMo training?
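The assertion in the traceback fires because every name passed in output_node_names has to match a node name in the graph exactly. Below is a small sketch of checking the requested names before freezing and suggesting near matches. `check_output_nodes` is a hypothetical helper, and the node names in the example are made up for illustration. With TensorFlow, the list of names would come from something like `[n.name for n in graph_def.node]`.

```python
import difflib


def check_output_nodes(graph_node_names, requested):
    """Map each requested name that is missing from the graph to close matches."""
    known = set(graph_node_names)
    missing = {}
    for name in requested:
        if name not in known:
            missing[name] = difflib.get_close_matches(name, graph_node_names, n=3)
    return missing


# Made-up node names, for illustration only.
graph_node_names = ["net/input_ids", "net/lstm/add", "net/output"]
requested = "bilm/Assign_7,net/output".split(",")

for name, suggestions in check_output_nodes(graph_node_names, requested).items():
    print(f"{name} is not in graph; close matches: {suggestions}")
```

Running a check like this against the real node list shows which names are available to pass as output nodes.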

carolmanderson commented 4 years ago

@Rusiecki did you ever find a solution for this?

mohammedayub44 commented 4 years ago

I wanted to create a SavedModel from the trained model so it can be served as a REST endpoint using TensorFlow Serving. Is there an easy way to do so?


carolmanderson commented 4 years ago

@mohammedayub44 Assuming what you want to do is use the trained model to generate embeddings: I figured out a way to do it by removing the ops that pass the state forward between batches. My model is now running on TF Serving in production. There doesn't seem to be an easy way to deploy the model in its stateful form. If you wanted to maintain state between batches, you would need to modify the graph to take the previous batch's states as input and produce the current states as output, and then explicitly pass them forward with each call to your endpoint.

In my case, I wanted to turn off statefulness anyway, because it causes non-deterministic behavior and makes testing of the deployed model difficult. Depending on the application, I found that turning off statefulness caused a 0-0.5% decrease in the F1 score of the model consuming the ELMo embeddings.
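To illustrate the difference, here is a toy sketch in plain Python, not the bilm-tf code. A model that keeps hidden state between calls returns different outputs for the same input, while a version that takes the previous state as an argument and returns the new state is reproducible, with the caller carrying the state forward. The class name, function name, and arithmetic are hypothetical stand-ins for the LSTM.

```python
class StatefulEncoder:
    """Keeps its state between calls, like init states updated via tf.assign."""

    def __init__(self):
        self.state = 0.0

    def __call__(self, batch):
        outputs = []
        for x in batch:
            self.state = 0.5 * self.state + x  # toy recurrence
            outputs.append(self.state)
        return outputs


def stateless_encode(batch, prev_state=0.0):
    """Same recurrence, but the state is an explicit input and output."""
    state = prev_state
    outputs = []
    for x in batch:
        state = 0.5 * state + x
        outputs.append(state)
    return outputs, state


batch = [1.0, 2.0]

stateful = StatefulEncoder()
first = stateful(batch)
second = stateful(batch)  # same input, but the result depends on the earlier call
print(first, second)

out_a, state_a = stateless_encode(batch)
out_b, _ = stateless_encode(batch)           # reproducible: equals out_a
out_c, _ = stateless_encode(batch, state_a)  # caller passes the state forward
print(out_a == out_b, out_c == second)
```

The explicit-state version is the shape an endpoint would need if state had to be preserved: the client sends the previous state with each request and stores the state it gets back.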

To turn off statefulness when computing embeddings, the lines that update the initial states (quoted in the reply further down this thread) should be removed or commented out.

This is the code I used to export the graph. I did it in two steps -- I found that after the first step, my model didn't have the TF serving tags, and the second step was necessary to add the tags. There's probably a more direct way of doing this.

Step 1:

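import tensorflow as tf

from bilm import BidirectionalLanguageModel  # Note: make sure to comment out the lines referenced above in your copy of bilm-tf before this import

elmo_weight_file = '/path/to/my_ckpt_weights.hd5'
elmo_options_file = '/path/to/my_options.json'
output_file = '/path/to/my_saved_model.pb'

model = BidirectionalLanguageModel(elmo_options_file, elmo_weight_file)

graph = tf.Graph()

with graph.as_default():
    ids_placeholder = tf.placeholder('int32', shape=(None, None, 50))
    ops = model(ids_placeholder)
    session = tf.Session()
    # Variables must hold their values before they can be frozen into constants.
    session.run(tf.global_variables_initializer())

# The value of output_node_names is not shown in this post. Assumption: use the
# op that produces the 'lm_embeddings' tensor returned by the model.
output_node_names = ops['lm_embeddings'].op.name

input_graph_def = session.graph.as_graph_def()
output_graph_def = tf.graph_util.convert_variables_to_constants(
    session,
    input_graph_def,
    output_node_names.split(","))

# Write the frozen GraphDef to disk.
with tf.gfile.GFile(output_file, "wb") as f:
    f.write(output_graph_def.SerializeToString())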

Step 2:

import tensorflow as tf
from tensorflow.saved_model import simple_save

def load_graph(model_file, returnElements=None):
    """Load a frozen GraphDef from model_file into a new tf.Graph."""
    graph = tf.Graph()
    graph_def = tf.GraphDef()
    with open(model_file, "rb") as f:
        graph_def.ParseFromString(f.read())
    returns = None
    with graph.as_default():
        returns = tf.import_graph_def(graph_def, return_elements=returnElements)
    if returnElements is None:
        return graph
    return graph, returns

old_graph = "/path/to/my_saved_model.pb"
new_graph = "/path/to/my_new_saved_model.pb"  # simple_save treats this path as the export directory

graph = load_graph(old_graph)
with tf.Session(graph=graph) as sess:
    with graph.as_default():
        # Assumes the first node is the input placeholder and the last node is the output op.
        layers = [n.name for n in graph.as_graph_def().node]
        output_node_name = layers.pop() + ":0"
        input_node_name = layers.pop(0) + ":0"
    output_node = tf.get_default_graph().get_tensor_by_name(output_node_name)
    input_node = tf.get_default_graph().get_tensor_by_name(input_node_name)

    # The original signature keys are missing from this post; "input" and "output" are placeholder names.
    inputs = {"input": input_node}
    outputs = {"output": output_node}
    simple_save(sess, new_graph, inputs, outputs)

And here's a code snippet to check whether your export worked. If the ops that pass the state forward haven't been removed, this will raise an error like ValueError: Input 0 of node bilm/Assign was passed float from bilm/Variable:0 incompatible with expected float_ref.

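frozen_graph = "/path/to/my_new_saved_model.pb"
with tf.gfile.GFile(frozen_graph, "rb") as f:
    restored_graph_def = tf.GraphDef()
    restored_graph_def.ParseFromString(f.read())
with tf.Graph().as_default():
    tf.import_graph_def(restored_graph_def, name="")  # raises the ValueError above if Assign ops remain

mohammedayub44 commented 4 years ago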

@carolmanderson Thanks for the detailed answer. I tried to open the link above, but it doesn't seem to work. If that is a direct clone of this repo, are you referring to these lines in

with tf.control_dependencies([layer_output]):
    # update the initial states
    for i in range(2):
        new_state = tf.concat(
            [final_state[i][:batch_size, :],
             init_states[i][batch_size:, :]], axis=0)
        state_update_op = tf.assign(init_states[i], new_state)

Thanks !

carolmanderson commented 4 years ago

Yes, sorry, I was signed into two different Github accounts at once and got confused. I've updated it above.
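For reference, the concatenation in the state update quoted above overwrites the first batch_size rows of the stored state with the final state from the current batch and keeps the remaining rows unchanged. Here is a plain-Python sketch of the same row-wise logic, using nested lists instead of tensors. `update_state` and the example values are illustrative, not part of bilm-tf.

```python
def update_state(init_state, final_state, batch_size):
    """Row-wise equivalent of
    tf.concat([final_state[:batch_size, :], init_state[batch_size:, :]], axis=0).
    """
    return final_state[:batch_size] + init_state[batch_size:]


# The stored state has room for 4 sequences; the current batch only has 2.
init_state = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
final_state = [[1.0, 1.5], [2.0, 2.5]]

new_state = update_state(init_state, final_state, batch_size=2)
print(new_state)
```

Removing the tf.assign that writes this value back means the stored state never changes, so every batch starts from the same initial state.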

mohammedayub44 commented 4 years ago

No problem. :) A couple of thoughts:

1) I got the ValueError... as you said. I therefore commented out the statefulness part (I had to comment out line 394 as well) and then ran Step 1, which ran fine. I checked the frozen graph exported by Step 1 with the check code, and it worked fine.

2) In Step 2, the simple_save() function doesn't output anything in the variables folder. I'm guessing that all variable data is already in the generated frozen graph, so a variables file is not required. Checking the graph exported by Step 2 gives me an error (tried in both TF1 and TF2).

However, in TF2 I tried your suggestion from #238 using tf.saved_model.load() and it works fine 👍


Cheers !