Tencent / PocketFlow

An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.
https://pocketflow.github.io

Accuracy dropped after using export utility #192

Open besherh opened 5 years ago

besherh commented 5 years ago

I am trying to manually calculate the accuracy of a model trained with the uniform-tf learner. After running the export_quant_tflite_model script, a .pb file was generated:

python ./tools/conversion/export_quant_tflite_model.py --model_dir ./models_uqtf_eval

I am trying to test the accuracy of model_original.pb.


import tensorflow as tf
import numpy as np
import keras
from keras.datasets import cifar10

def load_graph(frozen_graph_filename):
    # We load the protobuf file from the disk and parse it to retrieve the 
    # unserialized graph_def
    with tf.gfile.GFile(frozen_graph_filename, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    # Then, we import the graph_def into a new Graph and return it
    with tf.Graph().as_default() as graph:
        # The name var will prefix every op/nodes in your graph
        # Since we load everything in a new graph, this is not needed
        tf.import_graph_def(graph_def, name="prefix")
    return graph

graph = load_graph("./saved_model.pb")

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

num_classes = 10  # CIFAR-10 has 10 classes
y_test = keras.utils.to_categorical(y_test, num_classes)

labels=np.argmax(y_test, axis=1)
with tf.Session(graph=graph) as sess:
    # sess.run(tf.global_variables_initializer())
    x = graph.get_tensor_by_name('prefix/net_input:0')
    y = graph.get_tensor_by_name('prefix/net_output:0')
    feed_dict = {x: x_test}
    opt = sess.run(y, feed_dict)
    predictions = np.argmax(opt, axis=1)
    count = np.sum(labels == predictions)
    print("Number of correct prediction %d out of %d" % (count, len(predictions)))
    print("Accuracy is {:.3f}".format(float(count) / len(predictions)))

The result is:

Number of correct prediction 1063 out of 10000
Accuracy is 0.106

The accuracy is too low! So I ran it again with a model from a different learner: I tried channel pruning ("transformed_model.pb"), but hit the same issue (the accuracy is too low). Why has the accuracy dropped after freezing the graph? Are there any mistakes in my approach?

Another question: after calling

python ./tools/conversion/export_quant_tflite_model.py --model_dir ./models_uqtf_eval

why is there no transformed_model.pb? Only model_quantized.tflite is generated.
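
An accuracy of 0.106 on ten classes is close to chance level (0.1), which usually points to a mismatch between what the graph expects and what it is fed rather than to a small quantization loss. One quick diagnostic, sketched here with hypothetical helper names and an arbitrary cutoff (neither is part of PocketFlow), is to look at how the predicted classes are distributed:

```python
import numpy as np

def prediction_histogram(predictions, num_classes=10):
    """Count how often each class index is predicted."""
    predictions = np.asarray(predictions, dtype=np.int64)
    return np.bincount(predictions, minlength=num_classes)

def looks_collapsed(predictions, num_classes=10, threshold=0.9):
    """Return True if a single class receives more than `threshold` of all predictions.

    `threshold` is an illustrative cutoff chosen for this sketch, not a PocketFlow setting.
    """
    counts = prediction_histogram(predictions, num_classes)
    return bool(counts.max() / counts.sum() > threshold)
```

If nearly all test images land in one class, the input tensor is probably not in the range or layout the network was trained on.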

jiaxiang-wu commented 5 years ago

  1. Can you provide the *.ckpt files and necessary code, so that we can reproduce your issue?
  2. ./tools/conversion/export_quant_tflite_model.py only generates one *.pb file, named "model_original.pb". This is expected.

besherh commented 5 years ago

Please download the model files from the following link: https://drive.google.com/file/d/1Nya13flNGOUiXgPhkH2yToIJTOC_D4y-/view?usp=sharing

Here are the steps to generate the files in the link above:

1- Run this command: ./scripts/run_local.sh ./nets/resnet_at_cifar10_run.py --learner uniform-tf

2- Use the export utility: python ./tools/conversion/export_quant_tflite_model.py --model_dir ./models_uqtf_eval

3- Finally, run the code that I posted earlier to test the accuracy of model_original.pb (you can find it in the downloaded files, under models_uqtf_eval). The number of correct predictions is 2452 out of 10000.

jiaxiang-wu commented 5 years ago

Thanks for the detailed explanation. We will keep you informed if any potential bug is spotted.

besherh commented 5 years ago

Apart from that, could you verify the following:

1- The models_uqtf_eval folder contains the *.ckpt files for the quantized model after testing, right? I need to test the accuracy of the quantized model using the ckpt files. However, I am not able to, because when I load the graph from those files I cannot find the input/output tensors. I need to know their exact names to be able to run the session with feed_dict.

I tried to list all tensors like the following:

from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file
import os
model_dir = "./"

checkpoint_path = os.path.join(model_dir, "model.ckpt")

print_tensors_in_checkpoint_file(file_name=checkpoint_path,all_tensors=True, tensor_name='')

The result can be obtained by downloading this file; you will notice that there are no input/output tensors: https://drive.google.com/file/d/1CiH_DP3tLXGrhJ4Yy0OR4hFNBWlBg646/view?usp=sharing

jiaxiang-wu commented 5 years ago

@besherh For your last comment, can you find any tensors in these two collections, defined here? https://github.com/Tencent/PocketFlow/blob/master/learners/uniform_quantization_tf/learner.py#L276

besherh commented 5 years ago

No, I could not find any of them in the list of tensor names that I got from the *.ckpt files.

yuanyuanli85 commented 5 years ago

I am facing a similar problem. The validation score is close to zero if I use the frozen graph converted from the ckpt. It is very tricky: everything works well if I use the checkpoint's meta graph. I guess something goes wrong when freezing the graph or during inference. So far, I am totally lost.

besherh commented 5 years ago

@yuanyuanli85 Even with the checkpoint's meta files (ckpt), the accuracy is very low for me!

jiaxiang-wu commented 5 years ago

Maybe some recent updates (or even earlier) lead to this. We are looking into this now. @besherh @yuanyuanli85

yuanyuanli85 commented 5 years ago

@jiaxiang-wu Thank you for the quick response. For my issue, the root cause was in my code, not in PocketFlow: I did not pass the correct dataset iterator after loading the frozen graph. Sorry for the false alarm. PocketFlow is a wonderful tool!

jiaxiang-wu commented 5 years ago

@yuanyuanli85 Thanks for your response. We are planning to publish a benchmark tool for .pb and .tflite models, to test their classification accuracy and verify whether the model conversion is working as expected. This may be helpful for other users as well.

yuanyuanli85 commented 5 years ago

@jiaxiang-wu 💯 👍

besherh commented 5 years ago

@yuanyuanli85 Could you please share the code that solves your issue ? Thanks

yuanyuanli85 commented 5 years ago

@besherh The issue I ran into is not the same as yours. Please check the data preprocessing pipeline in your code and make sure it matches the parse_fn defined in cifar10_dataset.py: transposing dims, subtracting the mean, dividing by IMAGE_STD, etc. In general, make sure the data fed into the graph is preprocessed the same way as the data used to train the network.

def parse_fn(example_serialized, is_train):
  # data parsing
  record = tf.decode_raw(example_serialized, tf.uint8)
  label = tf.slice(record, [0], [LABEL_BYTES])
  label = tf.one_hot(tf.reshape(label, []), FLAGS.nb_classes)
  image = tf.slice(record, [LABEL_BYTES], [IMAGE_BYTES])
  image = tf.reshape(image, [IMAGE_CHN, IMAGE_HEI, IMAGE_WID])
  image = tf.cast(tf.transpose(image, [1, 2, 0]), tf.float32)
  image = (image - IMAGE_AVE) / IMAGE_STD

  # data augmentation
  if is_train:
    image = tf.image.resize_image_with_crop_or_pad(image, IMAGE_HEI + 8, IMAGE_WID + 8)
    image = tf.random_crop(image, [IMAGE_HEI, IMAGE_WID, IMAGE_CHN])
    image = tf.image.random_flip_left_right(image)

  return image, label
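
For the evaluation script posted at the top of this thread, the normalization step of parse_fn could be mirrored in NumPy roughly as follows. This is a sketch under assumptions: the function name is hypothetical, and the statistics below are placeholders chosen for illustration, so the real IMAGE_AVE and IMAGE_STD should be copied from cifar10_dataset.py. Images returned by keras.datasets.cifar10.load_data() are already in height-width-channel order, so the transpose that parse_fn applies to the raw binary records should not be needed, and the augmentation branch only runs for training.

```python
import numpy as np

def normalize_like_parse_fn(images_uint8, image_ave, image_std):
    """Apply (image - IMAGE_AVE) / IMAGE_STD to a batch of HWC uint8 images."""
    images = np.asarray(images_uint8).astype(np.float32)
    ave = np.asarray(image_ave, dtype=np.float32)
    std = np.asarray(image_std, dtype=np.float32)
    return (images - ave) / std

# Placeholder statistics for illustration only; replace them with the
# IMAGE_AVE and IMAGE_STD values defined in cifar10_dataset.py.
PLACEHOLDER_AVE = [128.0, 128.0, 128.0]
PLACEHOLDER_STD = [64.0, 64.0, 64.0]

batch = np.full((2, 32, 32, 3), 192, dtype=np.uint8)  # fake CIFAR-sized batch
normalized = normalize_like_parse_fn(batch, PLACEHOLDER_AVE, PLACEHOLDER_STD)
print(normalized.shape, normalized.dtype)
```

In the earlier script this would amount to feeding normalize_like_parse_fn(x_test, ...) instead of the raw x_test.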

besherh commented 5 years ago

@yuanyuanli85 Thanks for the hint. I will try to work on that

besherh commented 5 years ago

> Maybe some recent updates (or even earlier) lead to this. We are looking into this now. @besherh @yuanyuanli85

@jiaxiang-wu Could you please give us an update on this?