dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

System.InvalidOperationException when loading tensorflow model #3689

Closed carlosefrias closed 5 years ago

carlosefrias commented 5 years ago

System information

Issue

System.InvalidOperationException when calling LoadTensorFlowModel function

Source code / logs

at Microsoft.ML.Transforms.TensorFlow.TensorFlowUtils.LoadTFSession(IExceptionContext ectx, Byte[] modelBytes, String modelFile)
at Microsoft.ML.TensorflowCatalog.LoadTensorFlowModel(ModelOperationsCatalog catalog, String modelLocation)
at ImageClassification.Score.ModelScorer.TFModelScorer.LoadModel(String dataLocation, String imagesFolder, String modelLocation) in C:\Users\me1cme\repos\ml.net-learning\samples\csharp\getting-started\DeepLearning_ImageClassification_TensorFlow\ImageClassification\ModelScorer\TFModelScorer.cs:line 67
at ImageClassification.Score.ModelScorer.TFModelScorer.Score() in C:\Users\me1cme\repos\ml.net-learning\samples\csharp\getting-started\DeepLearning_ImageClassification_TensorFlow\ImageClassification\ModelScorer\TFModelScorer.cs:line 50
at ImageClassification.Program.Main() in C:\Users\me1cme\repos\ml.net-learning\samples\csharp\getting-started\DeepLearning_ImageClassification_TensorFlow\ImageClassification\Program.cs:line 27

Message: TensorFlow exception triggered while loading model from '../../../assets/inputs/final.pb'


abgoswam commented 5 years ago

Hi @carlosefrias, could you point us to the frozen model so we can repro this on our end? Could you also provide the sample data and code you are using?

abgoswam commented 5 years ago

@carlosefrias, I am closing this since there was no response. Please re-open (with sample data/code/model) if you are still facing this issue.

baruchiro commented 5 years ago

Hi, I'm using the sample here with this code to create the .pb file:

import tensorflow as tf

f_size = 15 # Number of features passed from ML.Net
num_output = 2 # Number of outputs
tf.set_random_seed(1)
X = tf.placeholder('float', [None, f_size], name="X")
Y = tf.placeholder('float', [None, num_output], name="Y")
lr = tf.placeholder(tf.float32, name = "learning_rate")

# Set model weights
W = tf.Variable(tf.random_normal([f_size,num_output], stddev=0.1), name = 'W')
b = tf.Variable(tf.zeros([num_output]), name = 'b')

l1 = 0
l2 = 0
RegScores = tf.add(tf.matmul(X, W), b, name='RegScores')
loss = tf.reduce_mean(tf.square(Y-tf.squeeze(RegScores))) / 2  + l2 * tf.nn.l2_loss(W) + l1 * tf.reduce_sum(tf.abs(W))
loss = tf.identity(loss, name="Loss")
optimizer = tf.train.MomentumOptimizer(lr, momentum=0.9, name='MomentumOptimizer').minimize(loss)

init = tf.global_variables_initializer()
# Launch the graph.
with tf.Session() as sess:
    sess.run(init)
    tf.saved_model.simple_save(sess, r'NYCTaxi/model', inputs={'X': X, 'Y': Y}, outputs={'RegScores': RegScores} )

And I get the error:

System.InvalidOperationException : TensorFlow exception triggered while loading model from 'Resources/saved_model.pb'

I think the real issue here is that ML.NET should provide more information when TensorFlow fails to load a model, rather than this particular failure itself.
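One possible cause, offered as an assumption rather than a confirmed diagnosis: tf.saved_model.simple_save writes a SavedModel directory, and the saved_model.pb inside it is a SavedModel protobuf, not a frozen GraphDef. As far as I know, LoadTensorFlowModel expects either a frozen graph file or the path of the SavedModel directory itself, so pointing it at 'Resources/saved_model.pb' may be what fails. Since the exception does not say which kind of file it was given, the sketch below guesses the kind from the first protobuf field tag, using only the standard library. The function names are hypothetical, and the check is a heuristic based on the published field numbers (SavedModel field 1 is an integer schema version, GraphDef field 1 is a repeated NodeDef).

```python
def read_varint(data: bytes, pos: int):
    """Decode one protobuf varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def guess_pb_kind(data: bytes) -> str:
    """Heuristically classify a serialized .pb payload by its first field tag.

    Assumption: fields are serialized in field-number order, which is the
    usual behaviour of the protobuf runtimes but is not guaranteed.
    """
    if not data:
        return "empty"
    tag, _ = read_varint(data, 0)
    field_number, wire_type = tag >> 3, tag & 0x07
    if field_number == 1 and wire_type == 0:
        # SavedModel: int64 saved_model_schema_version = 1 (varint)
        return "saved_model"
    if field_number == 1 and wire_type == 2:
        # GraphDef: repeated NodeDef node = 1 (length-delimited)
        return "graph_def"
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first bytes of a .pb file and guess what it contains."""
    with open(path, "rb") as handle:
        return guess_pb_kind(handle.read(16))
```

If classify_file reports "saved_model" for the file being loaded, it may be worth passing the containing directory to LoadTensorFlowModel instead of the .pb file, or freezing the graph first.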

TannerGilbert commented 5 years ago

I have the same issue. My code works when I'm using a pretrained MobileNet but fails when I try to run it with my own model:

{"TensorFlow exception triggered while loading model from 'xyz\\bin\\Debug\\netcoreapp2.1\\../../../assets\\inputs\\model\\model.pb'"}

For training the custom model I'm using

from keras.applications.mobilenet import MobileNet
from keras.preprocessing import image
from keras.models import Model, load_model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K
from keras.utils import to_categorical
from keras.callbacks import ModelCheckpoint
import tensorflow as tf
import numpy as np
import pandas as pd
from PIL import Image
import argparse

def data_gen(df, num_classes, batch_size=32, input_shape=(224, 224, 3)):
    """ Load in image data"""
    while True:
        idx = np.random.choice(a=np.arange(len(df['ImgPath'])), size=batch_size)
        batch_paths = df['ImgPath'][idx]
        images = []
        for img_path in batch_paths:
            image = Image.open(str(img_path))
            image = image.resize(input_shape[0:2], Image.ANTIALIAS)
            if input_shape[2] == 1:
                # 'L' is single-channel grayscale; 'LA' adds an alpha channel and breaks the reshape below
                image = image.convert('L')
            image = np.asarray(image)
            images.append(image)
        images = np.array(images)
        images = images.reshape(len(images), input_shape[0], input_shape[1], input_shape[2])
        labels = np.array(df['VG'][idx])

        labels = to_categorical(labels, num_classes=num_classes)
        yield (images, labels)

def freeze_session(session, keep_var_names=None, output_names=None, clear_devices=True):
    """
        Freezes the state of a session into a pruned computation graph.

        Creates a new computation graph where variable nodes are replaced by
        constants taking their current value in the session. The new graph will be
        pruned so subgraphs that are not necessary to compute the requested
        outputs are removed.
        @param session The TensorFlow session to be frozen.
        @param keep_var_names A list of variable names that should not be frozen,
                            or None to freeze all the variables in the graph.
        @param output_names Names of the relevant graph outputs.
        @param clear_devices Remove the device directives from the graph for better portability.
        @return The frozen graph definition.
    """
    graph = session.graph
    with graph.as_default():
        freeze_var_names = list(set(v.op.name for v in tf.global_variables()).difference(keep_var_names or []))
        output_names = list(output_names or [])  # copy so the caller's list is not mutated below
        output_names += [v.op.name for v in tf.global_variables()]
        input_graph_def = graph.as_graph_def()
        if clear_devices:
            for node in input_graph_def.node:
                node.device = ''
        frozen_graph = tf.compat.v1.graph_util.convert_variables_to_constants(
            session, input_graph_def, output_names, freeze_var_names)
        return frozen_graph

def create_model(num_classes, compile=True):
    base_model = MobileNet(weights='imagenet', include_top=False)

    x = base_model.output
    x = GlobalAveragePooling2D()(x)

    x = Dense(1024, activation='relu')(x)

    predictions = Dense(num_classes, activation='softmax')(x)

    model = Model(base_model.input, predictions)

    if compile:
        model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

    print(model.summary())

    return model

def get_model(filepath, num_classes):
    try:
        model = load_model(filepath)
        if len(model.predict(np.zeros((1, 224, 224, 3)))[0]) != num_classes:
            print('Replacing output layer')
            output = Dense(num_classes, activation='softmax', name='dense_2')(model.layers[-2].output)
            model = Model(model.input, output)
        model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
        print(model.summary())
        return model
    except Exception as e:
        print(e)
        print('Wrong model path. Creating new model.')
        model = create_model(num_classes)
        return model

def train_model(model, filepath, epochs, batch_size, num_classes, saving_directory, data_quality):
    #Prepare data
    df = pd.read_csv(filepath)
    df.dropna(inplace=True)
    df = df[(df['Q']>data_quality)]
    df.reset_index(drop=True, inplace=True)
    df['VG'] = df['VG'] - 1
    df = df[:50]

    # Training
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        checkpoint = ModelCheckpoint(saving_directory + 'model-{epoch:04d}.h5', monitor='loss', verbose=1, save_best_only=True)
        # steps_per_epoch should be a whole number of batches (at least one)
        steps_per_epoch = max(1, len(df) // batch_size)
        model.fit_generator(data_gen(df, num_classes, batch_size=batch_size, input_shape=(224, 224, 3)), epochs=epochs, steps_per_epoch=steps_per_epoch, callbacks=[checkpoint])

        print([out.op.name for out in model.outputs])

        frozen_graph = freeze_session(tf.keras.backend.get_session(), output_names=[out.op.name for out in model.outputs])

        tf.train.write_graph(frozen_graph, "./", "model.pb", as_text=False)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Train Vehicle Classification Network')
    parser.add_argument('-f', '--filename', type=str, required=True, help='Path to data csv file')
    parser.add_argument('-m', '--model_path', type=str, default='', help='Path to model file (h5)')
    parser.add_argument('-e', '--epochs', type=int, default=10, help='Number of epochs')
    parser.add_argument('-b', '--batch_size', type=int, default=32, help='Batch Size')
    parser.add_argument('-sd', '--saving_directory', type=str, default='models/', help='Model saving directory')
    parser.add_argument('-nc', '--num_classes', type=int, default=7, help='Number of classes')
    parser.add_argument('-q', '--data_quality', type=int, default=10, help='Min Q value')
    args = parser.parse_args()
    if args.model_path:
        model = get_model(args.model_path, args.num_classes)
    else:
        model = create_model(args.num_classes)
    train_model(model, args.filename, args.epochs, args.batch_size, args.num_classes, args.saving_directory, args.data_quality)
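
Since the exception gives no detail, one way to narrow things down is to check, outside ML.NET, that model.pb parses as a GraphDef and to see which node names it contains (the input and output names used on the ML.NET side have to match nodes in the graph). The following is a minimal sketch that walks the protobuf wire format with the standard library only, so it does not depend on the installed TensorFlow version. The helper names are hypothetical; the field numbers assumed are node = 1 in GraphDef and name = 1, op = 2 in NodeDef.

```python
def read_varint(data: bytes, pos: int):
    """Decode one protobuf varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def iter_fields(data: bytes):
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:        # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 1:      # fixed 64-bit
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:      # length-delimited (strings, sub-messages)
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
            if len(value) != length:
                raise ValueError("truncated length-delimited field")
        elif wire_type == 5:      # fixed 32-bit
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        yield field_number, wire_type, value


def list_nodes(graph_def_bytes: bytes):
    """Return [(name, op), ...] for every NodeDef in a serialized GraphDef."""
    nodes = []
    for field_number, wire_type, value in iter_fields(graph_def_bytes):
        if field_number != 1 or wire_type != 2:
            continue  # skip non-node fields such as versions or library
        name, op = "", ""
        for sub_number, sub_type, sub_value in iter_fields(value):
            if sub_type == 2 and sub_number == 1:
                name = sub_value.decode("utf-8")
            elif sub_type == 2 and sub_number == 2:
                op = sub_value.decode("utf-8")
        nodes.append((name, op))
    return nodes
```

Reading the file with open('model.pb', 'rb') and passing the bytes to list_nodes should either raise a ValueError, which suggests the file is truncated or not a GraphDef, or return the node list, in which case the Placeholder entries and the last few nodes are the likely input and output names.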