Syn-McJ / TFClassify-Unity

An example of using Tensorflow with Unity for image classification and object detection.
MIT License
167 stars, 47 forks

Retrained image model doesn't work #10

Closed iiidefektiii closed 6 years ago

iiidefektiii commented 6 years ago

I went through the process of retraining my own model, and everything worked during creation. It output the .pb file, but it doesn't work when I run it. The one thing I noticed is that your .bytes file looks like this:

0a36 0a05 696e 7075 7412 0b50 6c61 6365 686f 6c64 6572 2206 2f63 7075 3a30 2a0b 0a05 6474 7970 6512 0230 012a 0b0a 0573 6861 7065 1202 3a00 0ad2 a602 0a09 636f 6e76 3264 305f 7712 0543 6f6e 7374 2206 2f63 7075 3a30 2a0b 0a05 6474 7970 6512 0230 012a a7a6 020a 0576 616c 7565 129c

while the one that retrain.py output for me looks like this all the way through (the squares are symbols that wouldn't display here): O Placeholder Placeholder dtype0& shape: ÿÿÿÿÿÿÿÿÿ«« å (module/InceptionV3/Conv2d_1a_3x3/weightsConst¤ valuešB— "€Û뾧½ áI»I!ʽù( ½<F=ÞŒm>K÷¥>á

Any ideas on how to get it into the format that you have? I'm thinking this may be the problem.
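The two views may be the same kind of data displayed differently: a hex viewer prints raw bytes as hex pairs, while a text editor tries to render them as characters. Below is a minimal sketch that decodes the start of the hex dump quoted above and pulls out the readable names. The helper name `readable_strings` and the choice of allowed characters are assumptions made for illustration.

```python
import re

# First bytes of the hex dump quoted above (cut off after "Const").
HEX_DUMP = (
    "0a36 0a05 696e 7075 7412 0b50 6c61 6365 686f 6c64 6572 2206 "
    "2f63 7075 3a30 2a0b 0a05 6474 7970 6512 0230 012a 0b0a 0573 "
    "6861 7065 1202 3a00 0ad2 a602 0a09 636f 6e76 3264 305f 7712 "
    "0543 6f6e 7374"
)


def readable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of name-like ASCII characters found in binary data."""
    pattern = rb"[A-Za-z0-9_/:]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


raw = bytes.fromhex(HEX_DUMP)  # whitespace between hex pairs is ignored
print(readable_strings(raw))
```

The decoded names include input, Placeholder and Const, the same kind of names that show up in the text-editor view, which suggests the difference is in how the file is displayed rather than in its format.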

iiidefektiii commented 6 years ago

I did run the freeze graph step, and it output a new frozen file that now looks like the one in your project, but it detects nothing. Any ideas, or can you point me in the right direction?

Syn-McJ commented 6 years ago

Could you send me your model and labels to check? Also, since your model is Inception you might have different Input and Output names.

Syn-McJ commented 6 years ago

It seems the problem might be a mismatch between the version of TensorFlow you used to train the model and the one used in the Unity plugin. You can check this issue for details: https://github.com/Syn-McJ/TFClassify-Unity/issues/6

iiidefektiii commented 6 years ago

I downgraded to 1.4, but now it won't retrain. Is there a different retrain script for 1.4?

iiidefektiii commented 6 years ago

This is what I get when I try to retrain now. If I upgrade to a higher version like 1.7 again, it works. Do you have a retrain script from 1.4 that I could try?

EDIT: It also retrains with 1.5, but it will not retrain with 1.4 without the errors below. I tried retraining and freezing using 1.5, with no luck.

INFO:tensorflow:Looking for images in 'daisy'
INFO:tensorflow:Looking for images in 'dandelion'
INFO:tensorflow:Looking for images in 'roses'
INFO:tensorflow:Looking for images in 'sunflowers'
INFO:tensorflow:Looking for images in 'tulips'
Traceback (most recent call last):
  File "retrain.py", line 1333, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "C:\Users\mf11\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "retrain.py", line 1017, in main
    module_spec = hub.load_module_spec(FLAGS.tfhub_module)
  File "C:\Users\mf11\Anaconda3\lib\site-packages\tensorflow_hub\native_module.py", line 99, in load_module_spec
    path = compressed_module_resolver.get_default().get_module_path(path)
  File "C:\Users\mf11\Anaconda3\lib\site-packages\tensorflow_hub\compressed_module_resolver.py", line 141, in get_default
    HttpCompressedFileResolver(),
  File "C:\Users\mf11\Anaconda3\lib\site-packages\tensorflow_hub\compressed_module_resolver.py", line 74, in __init__
    self._cache_dir = resolver.tfhub_cache_dir(cache_dir, use_temp=True)
  File "C:\Users\mf11\Anaconda3\lib\site-packages\tensorflow_hub\resolver.py", line 67, in tfhub_cache_dir
    os.getenv(_TFHUB_CACHE_DIR, "") or FLAGS["tfhub_cache_dir"].value or
TypeError: '_FlagValues' object is not subscriptable

iiidefektiii commented 6 years ago

I've also figured out my input and output names (Placeholder, final_result) through TensorBoard, and the input has a size of {"size":299},{"size":299},{"size":3}]}}

I am guessing the 299 goes in the "classifyImageSize" or "detectImageSize" variable in PhoneCamera.cs, depending on which one you are using? I cannot find private static int INPUT_SIZE anymore.

That leaves me with IMAGE_MEAN and IMAGE_STD. What are those?

Your readme also says that you didn't know whether you used 1.4 or 1.5. I cannot get the retrain script to work with 1.4, only with 1.5, and so far that hasn't worked on the Unity end. Any ideas? Maybe you could send me the retrain.py you were using for 1.4?

Thanks!

Syn-McJ commented 6 years ago

Yes, the input size goes in the classifyImageSize variable. For mean and std you might want to try 128, although the label_image.py script uses input_mean = 0 and input_std = 255, so try that as well.
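To make the two suggestions concrete, here is a rough sketch of what mean and std do to a pixel, assuming the usual (value - mean) / std scaling that label_image.py applies. Whether the Unity side scales pixels in exactly this way is an assumption, and `normalize_pixel` is a hypothetical helper name.

```python
def normalize_pixel(value: float, mean: float, std: float) -> float:
    """Scale a raw 0-255 channel value using the (value - mean) / std scheme."""
    return (value - mean) / std


# Compare the two candidate settings mentioned above.
for mean, std in [(128.0, 128.0), (0.0, 255.0)]:
    low = normalize_pixel(0, mean, std)
    high = normalize_pixel(255, mean, std)
    print(f"mean={mean}, std={std}: range [{low:.3f}, {high:.3f}]")
```

With mean = std = 128 the inputs land roughly in [-1, 1]; with mean = 0 and std = 255 they land in [0, 1]. A model trained with one range will likely give poor results when fed the other, so the values need to match what the retrain script used.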

If 1.5 doesn't work in Unity, then the plugin probably doesn't support it yet, so you're going to have to use 1.4. To be sure, though, it's better to check exactly what the problem is with adb logcat.

I see in the TensorFlow repo that they changed the retrain.py script in version 1.7, so you should try retraining with the script from the 1.4 release branch: https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/image_retraining/retrain.py

Syn-McJ commented 6 years ago

Also, I can see here that the latest version of the plugin (0.4) seems to use TensorFlow 1.7.1, so you can try installing that version (my readme links to 0.3). Be aware that there is a migration guide.

iiidefektiii commented 6 years ago

Getting closer. I used 1.4 with the 1.4 retrain script; it retrained on the images, and the output looks like yours. There were no checkpoints with this version, just output_graph and output_labels. I still can't get it to run in Unity, though. I ran adb logcat, and now I get this over and over until the app crashes. It may be the values. I am trying to get them, but this time TensorBoard won't run; it just says there are no active graphs when I know it's running. So I'm figuring that out now.

07-31 10:26:08.933 15494 15515 E Unity : [[Node: DecodeJpeg = DecodeJpegacceptable_fraction=1, channels=3, dct_method="", fancy_upscaling=true, ratio=1, try_recover_truncated=false]]
07-31 10:26:08.933 15494 15515 E Unity : at TensorFlow.TFStatus.CheckMaybeRaise (TensorFlow.TFStatus incomingStatus, System.Boolean last) [0x0004a] in <252020d87a4e4581ad2cfe3f9cc7a0ac>:0
07-31 10:26:08.933 15494 15515 E Unity : at TensorFlow.TFSession.Run (TensorFlow.TFOutput[] inputs, TensorFlow.TFTensor[] inputValues, TensorFlow.TFOutput[] outputs, TensorFlow.TFOperation[] targetOpers, TensorFlow.TFBuffer runMetadata, TensorFlow.TFBuffer runOptions, TensorFlow.TFStatus status) [0x00144] in <252020d87a4e4581ad2cfe3f9cc7a0ac>:0
07-31 10:26:08.933 15494 15515 E Unity : at TensorFlow.TFSession+Runner.Run (TensorFlow.TFStatus status) [0x00033] in <252020d87a4e4581ad2cfe3f9cc7a0ac>:0
07-31 10:26:08.933 15494 15515 E Unity : at TFClassify.Classifier+cAnonStorey0.<>m0 () [0x00084] in <4bbe071d8c97431dad031894333c811e>:0

Syn-McJ commented 6 years ago

The DecodeJpeg operation seems suspicious to me. It isn't supported on mobile, which is why I have the TransformInput method for transforming the image. Maybe tomorrow I'll check the example you sent me and see whether any problems arise from it.

iiidefektiii commented 6 years ago

I got rid of that error. I'm not sure what it was, but I rebuilt, and now I get a null reference exception. I need to find out the input mean, input std, size, and input/output names, but with the 1.4 version of the .pb file TensorBoard tells me there is no event data for the graph, which is weird. I am going to upgrade and migrate to 1.7 and see if it works. I'm running out of ideas.

I can send you a zip of the project if you want. All I've done at this point is retrain on the flower images from the example and put the results in the Resources folder.

Syn-McJ commented 6 years ago

Sure, send it. I'll try to check tomorrow, but no promises. Definitely will check before next week.

iiidefektiii commented 6 years ago

If not, no worries. I'm pulling my hair out trying to figure out why it won't run. Haha.

iiidefektiii commented 6 years ago

My Project: http://deadicatedgames.com/projects/TFImageDetection/TF-Image-Detection.zip

Syn-McJ commented 6 years ago

Hi @iiidefektiii,

I checked your model in my project and I still see the DecodeJpeg error. This is definitely a problem: the DecodeJpeg operation simply isn't supported on mobile.

I think the issue might be the Inception model architecture, which uses that operation. I have only tried my example with the Mobilenet architecture, and it would make sense if that one doesn't have the DecodeJpeg operation, since it was created specifically to work on mobile platforms.
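One rough way to check a model before deploying it is to search the frozen graph file for the operation name. The sketch below assumes that names are stored as plain bytes inside the serialized graph, so it is only a heuristic: a match could come from a node name rather than an operation type. The helper names `mentions_op` and `check_model_file` are hypothetical, and the byte strings are synthetic stand-ins rather than real model data.

```python
from pathlib import Path


def mentions_op(graph_bytes: bytes, op_name: str = "DecodeJpeg") -> bool:
    """Heuristic: report whether the op name occurs anywhere in the raw bytes."""
    return op_name.encode("ascii") in graph_bytes


def check_model_file(path: str, op_name: str = "DecodeJpeg") -> bool:
    """Read a frozen graph (.pb or .bytes) from disk and run the same check."""
    return mentions_op(Path(path).read_bytes(), op_name)


# Synthetic stand-ins for two serialized graphs, for demonstration only.
with_decode = b"\x0a\x0aDecodeJpeg\x12\x0aDecodeJpeg"
without_decode = b"\x0a\x05input\x12\x0bPlaceholder"

print(mentions_op(with_decode))
print(mentions_op(without_decode))
```

For a definitive answer, loading the graph with TensorFlow and listing the operation type of every node would be more reliable than a byte search.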

So please try training a Mobilenet model and check it with my example again. To be sure, use version 1.4 of both TensorFlow and the script.

You can train a Mobilenet model by specifying the architecture with the --architecture flag, for example: python retrain.py --image_dir ~/flower_photos --architecture mobilenet_1.0_224

Let me know how it goes.

Syn-McJ commented 6 years ago

Hi @iiidefektiii, I'm gonna close this issue for now. Feel free to reopen it if you try the Mobilenet model and still have problems.

iiidefektiii commented 6 years ago

Yeah, I tried Mobilenet and it still didn't work. At some point I will delete the project and start over. I know I was on TensorFlow 1.4 and got the input/output names and size from TensorBoard, but I don't know how to get the image mean and std. I think I also updated TensorFlowSharp to 1.7 at one point, when you said it was supported.

After I try all that I'll check back in. Thanks for the help.


iiidefektiii commented 6 years ago

I deleted the project and started from scratch.
Verified that TensorFlow 1.4 was installed.
Verified that TensorFlowSharp 0.3 was in the project.
Retrained using the mobilenet_1.0_224 architecture.
Verified in TensorBoard that the classify image size was 224.
Verified in TensorBoard that the input/output names were input/final_result.
Changed the .pb to .bytes and dropped it and the labels file into the project.
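The last step of the checklist above can be scripted. This is a minimal sketch, assuming Unity only needs the frozen graph copied under a .bytes extension alongside the labels file; the function name `export_for_unity` and the paths in the usage comment are hypothetical.

```python
import shutil
from pathlib import Path


def export_for_unity(pb_path, labels_path, resources_dir):
    """Copy a frozen graph as <name>.bytes, plus its labels, into a folder."""
    resources = Path(resources_dir)
    resources.mkdir(parents=True, exist_ok=True)

    # Same file contents, only the extension changes.
    model_target = resources / (Path(pb_path).stem + ".bytes")
    labels_target = resources / Path(labels_path).name

    shutil.copyfile(pb_path, model_target)
    shutil.copyfile(labels_path, labels_target)
    return model_target, labels_target


# Hypothetical usage; adjust the paths to your own retrain output and project:
# export_for_unity("output_graph.pb", "output_labels.txt", "Assets/Resources")
```

The copy leaves the original retrain output untouched, so the same .pb can still be opened in TensorBoard afterwards.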

Everything works now. Not sure where it was getting stuck but knowing that you used mobilenet_1.0_224 architecture was probably a HUGE help.

Thanks for the help.

Syn-McJ commented 6 years ago

Hi @iiidefektiii, that's awesome, glad you fixed it. I should probably check the Inception model again to confirm that it won't work, and then update the readme.