Model trained with Tensorflow 1.7.1 not working

aliBUT commented 6 years ago

Hi, I was checking your classify example, and it works perfectly. But I trained a model using Google Codelabs and followed these steps. I installed TensorFlow 1.7.1, the same version as the TensorFlowSharp plugin. Testing works perfectly at the Python level, but when I load retrained_graph.bytes in your code, it doesn't load this model. The architecture is mobilenet_0.50_224 and the image size is 224.

This is my retrained model and labels:

https://www.dropbox.com/s/vw34zpkg2a5klrw/retrained.zip?dl=0

I am not sure what I am doing wrong here.

Thanks

Syn-McJ commented 6 years ago

Hi, for your model did you use version 0.3 of the Unity plugin that I linked in the readme, or version 0.4?

aliBUT commented 6 years ago

I used version 0.4 of the Unity plugin, and I am using tensorflow-gpu 1.7.1 for training.

Syn-McJ commented 6 years ago

Ok, I'll check your project later today to see what could be the problem.

aliBUT commented 6 years ago

This is the null exception I am receiving when I try to load my own model. I hope this will help you debug more easily.

Syn-McJ commented 6 years ago

I've checked your models and haven't been able to figure out why the NRE occurs, unfortunately. If it's a critical issue for you, I suggest using TensorFlow 1.4 for now; otherwise we can wait and see if a solution is found, but I'm afraid my TF knowledge is hitting a wall here.

One more thing I would suggest is to check your model with the TensorFlowSharp examples and see if it works there.

aliBUT commented 6 years ago

I tried TF 1.4 and all the code stopped working and started throwing errors. So instead I am going to test my models with the TensorFlowSharp examples.

Syn-McJ commented 6 years ago

TF 1.4 should work fine; if you provide details on where the errors occur, I can take a look.

aliBUT commented 6 years ago

I mean errors in the Python scripts where I was doing my training. Sorry for the confusion.

When I downgraded from 1.7.1 to 1.4, it started giving me this error. Do you think this is because of the PATH variable?

Syn-McJ commented 6 years ago

I see, that's weird. How did you downgrade TensorFlow? I usually do pip uninstall tensorflow and then pip install tensorflow==1.4.

In any case, if the TensorFlowSharp examples have the same problems with the 1.7.1 model, then I think fixing your TF 1.4 installation will be your best bet to make it work in Unity.

aliBUT commented 6 years ago

I did pip install tensorflow==1.4, which caused that error. After doing pip uninstall tensorflow and then pip install tensorflow==1.4, the error was no longer there.

aliBUT commented 6 years ago

So I did all these steps:

1. Downgraded my TensorFlow to 1.4 and retrained the model.
2. Downgraded the TensorFlowSharp plugin in the Unity project to 0.3.
3. Renamed the .pb file to .bytes.
4. Placed the model and labels in Resources and set the references.
5. Made the Android build and got the same error as above.
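The rename-and-place step can be scripted so the original .pb stays untouched. This is only a sketch: the helper name `copy_graph_for_unity` and the example paths are hypothetical and not part of the project.

```python
import shutil
from pathlib import Path


def copy_graph_for_unity(pb_path, resources_dir):
    """Copy a frozen graph into resources_dir with a .bytes extension.

    Returns the path of the copy. The original .pb file is left in place.
    """
    src = Path(pb_path)
    dst_dir = Path(resources_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    # Same file name, different extension: retrained_graph.pb -> retrained_graph.bytes
    dst = dst_dir / (src.stem + ".bytes")
    shutil.copyfile(src, dst)
    return dst
```

For example, `copy_graph_for_unity("tf_files/retrained_graph.pb", "Assets/Resources")`, where the Resources path is an assumed Unity layout.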

The model trained with TF 1.4 is here.

I think the way the model is exported and the way it is used in C# do not match. What Python code do you use to export models for image classification?

Syn-McJ commented 6 years ago

Yes, that might be true. I can see your model is 5-something MB in size, while my mobilenet models end up around 17 MB.

I use the retrain.py script from the tensorflow repo:
for 1.4: https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/image_retraining/retrain.py
for 1.7: https://github.com/tensorflow/hub/blob/master/examples/image_retraining/retrain.py

aliBUT commented 6 years ago

I am training it for 5 different categories; I think that's why the size is that low.

I used https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/image_retraining/retrain.py to retrain and still get the null exception. Anything else I can try?

aliBUT commented 6 years ago

This is my retraining command:

python -m image_retraining.retrain --bottleneck_dir=tf_files/bottlenecks --how_many_training_steps=500 --model_dir=tf_files/models/ --summaries_dir=tf_files/training_summaries/mobilenet_0.50_224 --output_graph=tf_files/retrained_graph.pb --output_labels=tf_files/retrained_labels.txt --architecture=mobilenet_0.50_224 --image_dir=tf_files/DataSet

Syn-McJ commented 6 years ago

Size doesn't really depend that much on the number of categories, but I think I see why your model is smaller: you use the 0.50 mobilenet architecture. Can you try to train the full-size mobilenet model? --architecture=mobilenet_1.0_224

If it still doesn't work, please send me your trained full-size model and I'll check it again.

aliBUT commented 6 years ago

It did not work. Here is the full model.

I also noticed this error inside Unity when I run it in the editor.

aliBUT commented 6 years ago

So just to be sure, I set up a new PC and did everything there. Same null exception issue.

Syn-McJ commented 6 years ago

Hi @aliBUT, there is one thing from the readme that I think you didn't do: change your INPUT_NAME and OUTPUT_NAME. For the mobilenet architecture you need these values:

private static string INPUT_NAME = "input";
private static string OUTPUT_NAME = "final_result";

This was the cause of your NullReferenceException. After you fix the NRE, you might also hit an IndexOutOfRangeException; this one is caused by one extra line in the labels file.
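One way to get rid of the extra line is to rewrite the labels file with blank lines dropped. The sketch below assumes the labels file is plain text with one label per line; the function names are hypothetical, and how the Unity side splits the file is not shown in this thread.

```python
def clean_labels(text):
    """Return the labels in text, one per non-blank line, whitespace stripped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def rewrite_labels_file(path):
    """Rewrite the labels file at path without blank lines; return the labels."""
    with open(path, encoding="utf-8") as f:
        labels = clean_labels(f.read())
    # Join without a trailing newline so a naive split on "\n"
    # cannot produce an empty last entry.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(labels))
    return labels
```

Running `rewrite_labels_file("tf_files/retrained_labels.txt")` before copying the labels into the Unity project would cover both a trailing newline and stray blank lines in the middle of the file.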

I just checked your last model with the correct input/output names and the empty line removed from the labels, and it works fine. After confirming, you can also go back and try your previous models, including the 1.7.1 one.

Hope it helps.

Syn-McJ commented 6 years ago

So I just rechecked your original 1.7.1 model with the correct INPUT_NAME and OUTPUT_NAME, and it works fine with the 0.4 plugin. Sorry, it totally slipped my mind when I first checked it that I needed to change those.

I will close this issue as resolved for now, but feel free to reopen if you still have issues.

aliBUT commented 6 years ago

@Syn-McJ it works! How do you usually know what INPUT_NAME and OUTPUT_NAME are for all the model architectures?

Syn-McJ commented 6 years ago

@aliBUT usually you can infer that from the script or example you used. For instance, you can see in this guide that the layer names are "Placeholder" and "final_result".
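If the script or guide doesn't spell the names out, the frozen graph itself can be inspected. The following is a rough heuristic, not a guaranteed method: it treats nodes whose op is "Placeholder" as input candidates and nodes that no other node consumes as output candidates. The helper names are hypothetical, and `load_nodes` assumes a TensorFlow 1.x install, where `tf.GraphDef` is available.

```python
def guess_io_names(nodes):
    """Guess input/output node names from (name, op, inputs) tuples.

    Returns (placeholders, sinks): nodes with op "Placeholder", and nodes
    that are not consumed by any other node.
    """
    nodes = list(nodes)
    consumed = set()
    for _, _, inputs in nodes:
        for ref in inputs:
            # Input refs can look like "name:1" or "^name" (control dependency).
            consumed.add(ref.lstrip("^").split(":")[0])
    placeholders = [name for name, op, _ in nodes if op == "Placeholder"]
    sinks = [name for name, _, _ in nodes if name not in consumed]
    return placeholders, sinks


def load_nodes(pb_path):
    """Read (name, op, inputs) tuples from a frozen GraphDef file."""
    import tensorflow as tf  # lazy import: guess_io_names itself needs no TF

    graph_def = tf.GraphDef()
    with open(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return [(n.name, n.op, list(n.input)) for n in graph_def.node]
```

Printing `guess_io_names(load_nodes("tf_files/retrained_graph.pb"))` gives a shortlist to try for INPUT_NAME and OUTPUT_NAME. A graph may contain several placeholders or unused nodes, so the result still needs a sanity check against the training script.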

aliBUT commented 6 years ago

So Classify is working fine. I decided to try out object detection. Everything works fine as long as I use your model, so I trained my own model by following [this](https://blogs.msdn.microsoft.com/esmsdn/2018/04/09/post-invitado-part2-step-by-step-how-to-training-your-own-detector-classifier/) tutorial.

I am facing the same issue: my trained model doesn't work. I made sure the input is image tensor and used SSD mobilenet v2 lite for training. I am not sure what the issue is here.

This is my model and label map:

https://www.dropbox.com/s/sgtolycu2t0hx7b/graph.zip?dl=0

Any help will be appreciated

Syn-McJ commented 6 years ago

Hi @aliBUT, could you open a separate issue for that? I actually haven't tried object detection with any model other than the one in the tensorflow repo, so I'll need to investigate this thoroughly when I have time.

aliBUT commented 6 years ago

Opened #13

Syn-McJ / TFClassify-Unity

Model trained with Tensorflow 1.7.1 not working #11