Hi,
I just installed deepinterpolation and want to run the test data set that you provide with the code. However, I get this error when I run the example Python script from the Anaconda console. I checked the .py script for anything wrong that I could have introduced but didn't find anything; in any case, I didn't make any changes to it, as the paths that the documentation says to change manually now seem to be generated automatically. Can you help?
Thanks, Friedrich
Hi Friedrich, this argument was moved to training_params, but the examples haven't been updated. Please delete the lines
generator_test_param["steps_per_epoch"] = -1
and
generator_param["steps_per_epoch"] = steps_per_epoch
and add the line
training_param['steps_per_epoch'] = steps_per_epoch
Please see #79
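In other words, the setting moves off the generator dicts and onto the trainer dict. A heavily abbreviated sketch of the corrected block (the real example script sets many more keys on each dict):

```python
# Sketch of the fix in cli_example_tiny_ephys_training.py (abbreviated).
steps_per_epoch = 10

generator_param = {}        # training-generator settings
generator_test_param = {}   # validation-generator settings
training_param = {}         # trainer settings

# Removed: generator_param["steps_per_epoch"] = steps_per_epoch
# Removed: generator_test_param["steps_per_epoch"] = -1
training_param["steps_per_epoch"] = steps_per_epoch
```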
Duplicate of #79
Thanks for the info! I have changed the file and the training works now. However, I ran into another issue: the console displays a broken-pipe error after the last batch has been processed. Can you tell me what the issue is?
(deepinterpolation) C:\Users\SunLab\Documents\FK\deepinterpolation\deepinterpolation_program_files\examples>python cli_example_tiny_ephys_training.py
2022-02-10 16:01:03.876726: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
WARNING:root:train_path has been deprecated and is to be replaced by data_path as generators can be used for training and inference. We are forwarding the value but please update your code.
WARNING:root:pre_post_frame has been deprecated and is to be replaced by pre_frame and post_frame. We are forwarding the value but please update your code.
WARNING:root:train_path has been deprecated and is to be replaced by data_path as generators can be used for training and inference. We are forwarding the value but please update your code.
WARNING:root:pre_post_frame has been deprecated and is to be replaced by pre_frame and post_frame. We are forwarding the value but please update your code.
INFO:Training:wrote C:\Users\SunLab\Documents\FK\deepinterpolation\deepinterpolation_program_files\examples\2022_02_10_16_01_training_full_args.json
INFO:Training:wrote C:\Users\SunLab\Documents\FK\deepinterpolation\deepinterpolation_program_files\examples\2022_02_10_16_01_training.json
INFO:Training:wrote C:\Users\SunLab\Documents\FK\deepinterpolation\deepinterpolation_program_files\examples\2022_02_10_16_01_generator.json
INFO:Training:wrote C:\Users\SunLab\Documents\FK\deepinterpolation\deepinterpolation_program_files\examples\2022_02_10_16_01_network.json
INFO:Training:wrote C:\Users\SunLab\Documents\FK\deepinterpolation\deepinterpolation_program_files\examples\2022_02_10_16_01_test_generator.json
2022-02-10 16:01:06.328756: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-02-10 16:01:06.329530: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2022-02-10 16:01:06.329596: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2022-02-10 16:01:06.331949: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: DESKTOP-Q7NC8E2
2022-02-10 16:01:06.332069: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-Q7NC8E2
2022-02-10 16:01:06.332351: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-02-10 16:01:06.332768: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
INFO:Training:created objects for training
2022-02-10 16:01:06.605828: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/4
WARNING:tensorflow:multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.
WARNING:tensorflow:multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.
2022-02-10 16:01:07.472070: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
[... the same cudart64_110.dll dso_loader line repeats every ~2 seconds ...]
2022-02-10 16:01:42.067839: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
10/10 [==============================] - ETA: 0s - loss: 0.5024
WARNING:tensorflow:multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.
WARNING:tensorflow:multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.
2022-02-10 16:03:00.258208: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
[... the same cudart64_110.dll dso_loader line repeats every ~2-3 seconds ...]
2022-02-10 16:03:20.355943: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
Exception in thread Thread-6:
Traceback (most recent call last):
  File "C:\Users\SunLab\.conda\envs\deepinterpolation\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "C:\Users\SunLab\.conda\envs\deepinterpolation\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\SunLab\.conda\envs\deepinterpolation\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 748, in _run
    with closing(self.executor_fn(_SHARED_SEQUENCES)) as executor:
  File "C:\Users\SunLab\.conda\envs\deepinterpolation\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 727, in pool_fn
    initargs=(seqs, None, get_worker_id_queue()))
  File "C:\Users\SunLab\.conda\envs\deepinterpolation\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "C:\Users\SunLab\.conda\envs\deepinterpolation\lib\multiprocessing\pool.py", line 176, in __init__
    self._repopulate_pool()
  File "C:\Users\SunLab\.conda\envs\deepinterpolation\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
    w.start()
  File "C:\Users\SunLab\.conda\envs\deepinterpolation\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\SunLab\.conda\envs\deepinterpolation\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\SunLab\.conda\envs\deepinterpolation\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\SunLab\.conda\envs\deepinterpolation\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
It looks like it lost access to the data midway. Is it possible that the files were inaccessible for a bit?
I wouldn't say so. I am working locally with an SSD and didn't have much else running at the time that could have clogged up the data connection. I don't know what the end of the output is supposed to look like, but the data for this epoch seem to have been processed in full. At least there is a 10/10 in the progress indicator.
This is the end of the first epoch? How is the validation data set provided?
Can you elaborate, please? I'm sorry, but I am not familiar with this. What I did is literally just follow the instructions on the main page of this repo, i.e. activate the environment, navigate to the folder with cli_example_tiny_ephys_training.py, and then run it in the terminal.
Sure. During the main part of the epoch, when the training happens, TensorFlow accesses the data provided by the training generator. When this is finished (showing 10/10 here), it jumps to the validation data to measure performance. So I was wondering if the issue could be related to the validation dataset, i.e. "generator_test_param".
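To see that flow in isolation, here is a minimal toy example (plain Keras, unrelated to the deepinterpolation generators; all data and sizes are made up):

```python
import numpy as np
import tensorflow as tf

# Toy model illustrating the epoch flow described above: fit() consumes the
# training batches first (the 10/10 progress bar), then pulls validation data
# at the end of the epoch to compute val_loss. A crash right after the bar
# fills therefore happens while the validation side is being set up.
x = np.random.rand(100, 4).astype("float32")
y = np.random.rand(100, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

# 100 samples / batch_size 10 -> a 10/10 progress bar, then validation.
model.fit(x, y, batch_size=10, validation_data=(x, y), epochs=1)
```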
Ah, I see, thanks. You can inspect the file that I use to run deepinterpolation: to_inspect_deepinterp_FK.txt. I doubt it's anything fishy, though. Could the CUDA version be an issue? I have the most recent version installed on this PC (11.6).
It could be CUDA. See here for the combinations tested by TensorFlow: https://www.tensorflow.org/install/source#gpu
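One quick way to compare the CUDA version your TensorFlow build expects against what it can actually see (tf.sysconfig.get_build_info is available in roughly TF 2.3+; treat this as a sketch):

```python
import tensorflow as tf

# Report the TF version, the CUDA/cuDNN versions this binary was built
# against, and which GPUs TensorFlow can actually see at runtime.
print("TF version:", tf.__version__)
build = tf.sysconfig.get_build_info()
print("built for CUDA:", build.get("cuda_version"))
print("built for cuDNN:", build.get("cudnn_version"))
print("visible GPUs:", tf.config.list_physical_devices("GPU"))
```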
Hmm, the PC I am currently working on doesn't have an Nvidia graphics card, only integrated Intel graphics. I tried to work around it but couldn't find a way. Is it pointless to keep trying to make DeepInterp run? Otherwise I'll look for another PC/graphics card.
Any kind of deep learning work is better run on GPUs; integrated cards probably only work for very small jobs. Some gaming cards are fairly inexpensive; see here: https://lambdalabs.com/gpu-benchmarks. The A100 is the Rolls-Royce right now, but there is a range of prices. Most of my training was done on much older cards; you can check the methods section of the paper.
Thanks for the info. I'll check some prices.
Hey, so I have luckily found a PC with a decent graphics card (GTX 1050) and could finally run my first deep interpolation on some calcium data (with the example data provided). However, when transitioning to my own data I get this error:
ValueError: A `Concatenate` layer requires inputs with matching shapes except for the
concatenation axis. Received: input_shape=[(None, 64, 98, 1024), (None, 64, 99, 512)]
Do you know what the issue could be? The error occurs in network_collection, in local_network_function. Regarding my data, the only difference from your sample is the resolution: 796x512. I have also checked whether the DPI settings of the data interfered with the function, and changed the metadata in FIJI. Thanks
Yes, changing the input size can have an impact on the merging layers. You could try feeding in a 1024x512 image instead of 796x512 (filling in with zeros). I think that should prevent this issue.
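For context on why 796 trips this up: each 2x downsampling floors odd sizes, so the encoder path goes 796 → 398 → 199 → 99 → 49; upsampling 49 gives 98, which cannot be concatenated with the 99-wide skip tensor, exactly the 98-vs-99 mismatch in the error. 1024 halves cleanly at every level. A minimal zero-padding sketch, assuming the movie is a NumPy array of shape (n_frames, 796, 512); the file name is made up:

```python
import numpy as np

# Hypothetical example: zero-pad a (n_frames, 796, 512) movie to
# (n_frames, 1024, 512) so every 2x downsampling halves cleanly.
movie = np.load("my_movie.npy")           # assumed shape (n_frames, 796, 512)
pad_rows = 1024 - movie.shape[1]          # 228 rows of zeros
movie_padded = np.pad(movie, ((0, 0), (0, pad_rows), (0, 0)), mode="constant")
print(movie_padded.shape)                 # (n_frames, 1024, 512)
```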