CSBDeep / CSBDeep_fiji

"Run your network" crashes #28

Closed guijacquemet closed 4 years ago

guijacquemet commented 5 years ago

I generated my own network using the Google Colab version of Noise2Void and tried to run it in the ImageJ plugin of CSBDeep. Unfortunately, it crashes. The demo networks (e.g. microtubule) provided with CSBDeep work perfectly.

I have not set up GPU support for TensorFlow yet.

I am using Fiji with a fresh install of the CSBDeep plugin (ImageJ 1.52n), Java 1.8.0.172 (64 bit). I have this issue on Ubuntu 18.04 and also on Windows. The network and the error message can be found below.

Any help would be much appreciated!

"[INFO] imagej-tensorflow version: 1.0.1
[INFO] tensorflow version: 1.12.0
[INFO] The current library path is: LD_LIBRARY_PATH=/home/guillaume/Fiji3/Fiji.app/lib/linux64:/home/guillaume/Fiji3/Fiji.app/mm/linux64
[INFO] Couldn't load tensorflow GPU support.
[INFO] If the problem is CUDA related, make sure CUDA and cuDNN are in the LD_LIBRARY_PATH.
[INFO] Using CPU version.
[INFO] Loading TensorFlow model GenericNetwork_00e6491331613fab9f1b57b0d9ddc109 from source file file:/home/guillaume/Desktop/TF_SavedModel.zip
[INFO] TensorFlow model cache: /home/guillaume/Fiji3/Fiji.app/models
org.tensorflow.TensorFlowException: Op type not registered 'FusedBatchNormV3' in binary running on guillaume-HP-EliteDesk-800-G1-TWR. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
    at org.tensorflow.SavedModelBundle.load(Native Method)
    at org.tensorflow.SavedModelBundle.access$000(SavedModelBundle.java:27)
    at org.tensorflow.SavedModelBundle$Loader.load(SavedModelBundle.java:32)
    at org.tensorflow.SavedModelBundle.load(SavedModelBundle.java:95)
    at net.imagej.tensorflow.DefaultTensorFlowService.loadModel(DefaultTensorFlowService.java:114)
    at de.csbdresden.csbdeep.network.model.tensorflow.TensorFlowNetwork.loadModel(TensorFlowNetwork.java:141)
    at de.csbdresden.csbdeep.network.model.DefaultNetwork.loadModel(DefaultNetwork.java:52)
    at de.csbdresden.csbdeep.network.DefaultModelLoader.loadNetwork(DefaultModelLoader.java:41)
    at de.csbdresden.csbdeep.network.DefaultModelLoader.run(DefaultModelLoader.java:20)
    at de.csbdresden.csbdeep.commands.GenericNetwork.tryToPrepareInputAndNetwork(GenericNetwork.java:494)
    at de.csbdresden.csbdeep.commands.GenericNetwork.initiateModelIfNeeded(GenericNetwork.java:291)
    at de.csbdresden.csbdeep.commands.GenericNetwork.mainThread(GenericNetwork.java:426)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
java.lang.NullPointerException
    at de.csbdresden.csbdeep.network.DefaultInputValidator.checkForTooManyDimensions(DefaultInputValidator.java:32)
    at de.csbdresden.csbdeep.network.DefaultInputValidator.run(DefaultInputValidator.java:18)
    at de.csbdresden.csbdeep.commands.GenericNetwork.tryToPrepareInputAndNetwork(GenericNetwork.java:497)
    at de.csbdresden.csbdeep.commands.GenericNetwork.initiateModelIfNeeded(GenericNetwork.java:291)
    at de.csbdresden.csbdeep.commands.GenericNetwork.mainThread(GenericNetwork.java:426)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[INFO] Plugin exit (took 442 milliseconds)"

TF_SavedModel.zip

guijacquemet commented 5 years ago

I do not get this error when I run one of the demo networks that I open using "Run your network".

guijacquemet commented 5 years ago

I also get this error when I run a network generated with the CARE training example "3D denoising of Tribolium castaneum". This network was again generated using Google Colab (see below).

Error is:
[INFO] Loading TensorFlow model GenericNetwork_f03766b50027386b972482b0880d5a22 from source file file:/home/guillaume/Desktop/denoising%20of%20Tribolium%20castaneum.zip
org.tensorflow.TensorFlowException: Op type not registered 'MulNoNan' in binary running on guillaume-HP-EliteDesk-800-G1-TWR. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
    at org.tensorflow.SavedModelBundle.load(Native Method)
    at org.tensorflow.SavedModelBundle.access$000(SavedModelBundle.java:27)
    at org.tensorflow.SavedModelBundle$Loader.load(SavedModelBundle.java:32)
    at org.tensorflow.SavedModelBundle.load(SavedModelBundle.java:95)
    at net.imagej.tensorflow.DefaultTensorFlowService.loadModel(DefaultTensorFlowService.java:114)
    at de.csbdresden.csbdeep.network.model.tensorflow.TensorFlowNetwork.loadModel(TensorFlowNetwork.java:141)
    at de.csbdresden.csbdeep.network.model.DefaultNetwork.loadModel(DefaultNetwork.java:52)
    at de.csbdresden.csbdeep.network.DefaultModelLoader.loadNetwork(DefaultModelLoader.java:41)
    at de.csbdresden.csbdeep.network.DefaultModelLoader.run(DefaultModelLoader.java:20)
    at de.csbdresden.csbdeep.commands.GenericNetwork.tryToPrepareInputAndNetwork(GenericNetwork.java:494)
    at de.csbdresden.csbdeep.commands.GenericNetwork.initiateModelIfNeeded(GenericNetwork.java:291)
    at de.csbdresden.csbdeep.commands.GenericNetwork.mainThread(GenericNetwork.java:426)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
java.lang.NullPointerException
    at de.csbdresden.csbdeep.network.DefaultInputValidator.checkForTooManyDimensions(DefaultInputValidator.java:32)
    at de.csbdresden.csbdeep.network.DefaultInputValidator.run(DefaultInputValidator.java:18)
    at de.csbdresden.csbdeep.commands.GenericNetwork.tryToPrepareInputAndNetwork(GenericNetwork.java:497)
    at de.csbdresden.csbdeep.commands.GenericNetwork.initiateModelIfNeeded(GenericNetwork.java:291)
    at de.csbdresden.csbdeep.commands.GenericNetwork.mainThread(GenericNetwork.java:426)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[INFO] Plugin exit (took 169 milliseconds)

denoising of Tribolium castaneum.zip

uschmidt83 commented 5 years ago

I haven’t looked into this, but my guess is that the version of tensorflow on Google colab is newer than the version used in the Fiji plugin. Hence, the exported model may use features that are not supported in tensorflow as bundled with the Fiji plugin.

guijacquemet commented 5 years ago

Thanks, that is very useful. Which version of tensorflow would be the one to use ?

uschmidt83 commented 5 years ago

The Fiji plugin uses tensorflow 1.12 at the moment. I don't know if it's possible to downgrade the version of tensorflow in Google colab.
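To make the suspected mismatch concrete, here is a minimal sketch that compares the TensorFlow version used for export with the one the plugin reports. The function names are hypothetical and not part of CSBDeep or TensorFlow, and it rests on the assumption (the guess discussed above, not a confirmed rule) that a model exported with a newer major.minor release may contain ops the older runtime has not registered.

```python
import re


def parse_version(text):
    """Return (major, minor, patch) from a string such as '1.14.0-rc1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def exported_with_newer_tf(export_version, runtime_version):
    """True if the exporting TensorFlow is newer (major.minor) than the runtime.

    Assumption: only major.minor matters for op availability; patch
    releases and pre-release suffixes are ignored.
    """
    return parse_version(export_version)[:2] > parse_version(runtime_version)[:2]


if __name__ == "__main__":
    # 1.14.0 is an example export version; 1.12.0 is what the Fiji console reports.
    print(exported_with_newer_tf("1.14.0", "1.12.0"))
```

On the Python side the export version can be read from `tf.VERSION` (TensorFlow 1.x); the runtime version is the one printed at the start of the Fiji console output.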

fjug commented 5 years ago

Hmmm... possible... but we really should not make people do that: https://stackoverflow.com/questions/51888118/how-to-downgrade-tensorflow-version-in-colab

@frauzufall @uschmidt83 Any good reason why we should not upgrade our Fiji plugin to 1.14.0-rc1? Or even better, couldn't we have multiple versions on multiple update sites? So everyone just picks what suits his/her needs...

uschmidt83 commented 5 years ago

Or even better, couldn't we have multiple versions on multiple update sites? So everyone just picks what suits his/her needs...

This, or even better, choose the version of TensorFlow in the plugin UI. I've already discussed this with @frauzufall. The problem is more that there are too many things to do and not enough time...

frauzufall commented 5 years ago

I am working on a command which helps download / install different versions. Without auto-detecting stuff I hope this will be done quickly, but can't promise anything yet.

guijacquemet commented 5 years ago

This would be very useful! Looking forward to testing it!

lacan commented 5 years ago

Hi all, especially @guijacquemet.

We are working with tensorflow 1.14 and it works. The only thing you have to do is download the right DLL file while waiting for @frauzufall to complete her command. https://www.tensorflow.org/install/lang_java#tensorflow_with_the_jdk

I downloaded libtensorflow_jni-cpu-windows-x86_64-1.14.0.zip and unzipped the DLL in my main Fiji folder (Fiji.app) and Zip files saved with TF 1.14 work. Didn't need to do anything else...

PS: I removed the other tensorflow_jni.dll that came with the update site that was in the jars folder

Fastander commented 5 years ago

I am having a similar issue with the Fiji app and I was really looking forward to using tf 1.14 zip models in Fiji but for some reason I keep crashing the plugin with the following error. Is there something simple I am missing? Perhaps it isn't related to the TF version or there is something I can check to troubleshoot?

java.lang.IllegalArgumentException: NodeDef mentions attr 'explicit_paddings' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT, DT_DOUBLE]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=dilations:list(int),default=[1, 1, 1, 1]>; NodeDef: {{node down_level_0_no_0/convolution}} = Conv2D[T=DT_FLOAT, _output_shapes=[[?,?,?,32]], data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](input, down_level_0_no_0/kernel/read). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
    at org.tensorflow.SavedModelBundle.load(Native Method)
    at org.tensorflow.SavedModelBundle.access$000(SavedModelBundle.java:27)
    at org.tensorflow.SavedModelBundle$Loader.load(SavedModelBundle.java:32)
    at org.tensorflow.SavedModelBundle.load(SavedModelBundle.java:95)
    at net.imagej.tensorflow.DefaultTensorFlowService.loadModel(DefaultTensorFlowService.java:114)
    at de.csbdresden.csbdeep.network.model.tensorflow.TensorFlowNetwork.loadModel(TensorFlowNetwork.java:141)
    at de.csbdresden.csbdeep.network.model.DefaultNetwork.loadModel(DefaultNetwork.java:52)
    at de.csbdresden.csbdeep.network.DefaultModelLoader.loadNetwork(DefaultModelLoader.java:41)
    at de.csbdresden.csbdeep.network.DefaultModelLoader.run(DefaultModelLoader.java:20)
    at de.csbdresden.csbdeep.commands.GenericNetwork.tryToPrepareInputAndNetwork(GenericNetwork.java:494)
    at de.csbdresden.csbdeep.commands.GenericNetwork.initiateModelIfNeeded(GenericNetwork.java:291)
    at de.csbdresden.csbdeep.commands.GenericNetwork.mainThread(GenericNetwork.java:426)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

frauzufall commented 5 years ago

@Fastander Sorry to read that. It seems you are not the only one with this issue: https://github.com/tensorflow/tensorflow/issues/22994 Does it work when you do the prediction in Python?

@guijacquemet @lacan Could anyone with a Windows machine build this imagej-tensorflow branch: https://github.com/imagej/imagej-tensorflow/pull/18, copy the result to their Fiji installation, and test the command in Edit > Options > TensorFlow... to switch TensorFlow versions? Let me know if it works. It would be highly appreciated!

Fastander commented 5 years ago

@frauzufall Everything works well in Python; only Fiji complains. The thread you pointed to ends very pessimistically. Do you think there is something that can be done to make this work? I am not sure I understand which version of what is a problem where...

frauzufall commented 5 years ago

@Fastander Could you post an exported model (the ZIP) and an example image so that I can reproduce the issue? You can also send it to me via mail (frauzufall@mpi-cbg.de) and I will have a look at it.

frauzufall commented 5 years ago

@Fastander Thanks for the data. I was able to reproduce the issue in Fiji on Linux and fix it by using TF 1.14.0 for Java. You can download the JNI for Windows here: https://www.tensorflow.org/install/lang_java#download and unpack it to Fiji.app/lib/win64/ and hopefully this will be sufficient. Check the version that is displayed at the beginning of the Console output. @lacan also described in a post above what he did to install TF 1.14 on Windows, maybe this helps as well, let me know! :)
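For checking the version in the Console output, a small sketch like the following could pull it out of a saved copy of the log. It assumes the log contains a line of the form `[INFO] tensorflow version: 1.12.0`, as in the first post of this thread; the function name is hypothetical.

```python
import re


def reported_tf_version(console_text):
    """Return the TensorFlow version printed in the Fiji console, or None.

    The lookbehind skips the 'imagej-tensorflow version:' line, which
    reports the version of the ImageJ wrapper rather than TensorFlow.
    """
    match = re.search(r"(?<![\w-])tensorflow version: (\S+)", console_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    log = "[INFO] imagej-tensorflow version: 1.0.1 [INFO] tensorflow version: 1.12.0"
    print(reported_tf_version(log))
```

If this still reports 1.12.0 after unpacking the 1.14.0 JNI, Fiji is most likely still loading an older native library from somewhere else in the installation.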

Fastander commented 5 years ago

I have tried doing that by extracting to the Fiji app folder, but I get the same error. Somehow the Fiji Console still states that I am using tensorflow 1.12.0, while in Python tf.VERSION gives '1.14.0'. Maybe it is worth deleting and reinstalling Fiji?

Alex

frauzufall commented 5 years ago

Can you try what @lacan suggested? He wrote he removed the tensorflow_jni.dll in the jars folder that came with the update site. I'll try to get a Windows system running so that I can better follow what's happening there.

frauzufall commented 5 years ago

@Fastander I installed Fiji on my Windows 10 VM, installed the CSBDeep update site, tried your model and got the same error. Next I downloaded the 1.14.0 CPU version of the TensorFlow JNI and unpacked it to Fiji.app/lib/win64, and then I was able to run your model. GPU should also work, but I did not try that yet. Maybe a new Fiji download would really help.

lacan commented 5 years ago

@Fastander, perhaps you have the old dll still in your Fiji. Look for 'tensorflow_jni.dll' in your Fiji folder and delete all the ones you find before adding the one downloaded from @frauzufall's link?
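A rough sketch of that search, in case it helps: it walks a Fiji folder and lists every TensorFlow JNI library it finds, so leftover copies (for example in the jars folder) are easy to spot. The function name and the example path are hypothetical; the non-Windows file names are included on the assumption that the same problem can occur on Linux and macOS. It only lists files and does not delete anything.

```python
from pathlib import Path

# File names of the TensorFlow JNI library on Windows, Linux and macOS.
JNI_NAMES = ("tensorflow_jni.dll", "libtensorflow_jni.so", "libtensorflow_jni.dylib")


def find_jni_libraries(fiji_root):
    """Return all TensorFlow JNI libraries found anywhere below fiji_root."""
    root = Path(fiji_root)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.name in JNI_NAMES)


if __name__ == "__main__":
    # Hypothetical install location; adjust to your own Fiji.app folder.
    for path in find_jni_libraries("C:/Fiji.app"):
        print(path)
```

If more than one file is listed, the copy that does not belong to the freshly downloaded version is the candidate for removal.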

Fastander commented 5 years ago

@frauzufall @lacan Thank you both very much, it works now! The issue was indeed the 'tensorflow_jni.dll' file in the jars folder; removing it solved the problem.