Open rmarabini opened 2 years ago
Hi @rmarabini ! Thanks for letting us know about this issue.
I just updated the filenames that the bash script assigns to the downloaded datasets; they were swapped previously. Unzipping the files should generate two directories called backbone_dataset/ and rcnn_dataset_full/, each of which contains training and validation folders.
The augment-images.py script assumes you are in the dataset folder and that the datasets were downloaded there. Sorry this was not explicit before; I have updated the instructions in the README. Downloading the datasets and augmenting the images should now work by running:
cd dataset/
bash download_dataset.sh
unzip backbone_dataset.zip
unzip rcnn_dataset_full.zip
python3 augment-images.py
Let me know if this helps.
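As a quick sanity check after unzipping, something like the following sketch can confirm that the expected folders exist before running the augmentation. It is not part of the repository: the helper name `missing_dataset_dirs` is hypothetical, and the split folder names `train` and `val` are assumptions based on paths that appear elsewhere in this thread.

```python
from pathlib import Path

# Directory and split names are assumptions based on this thread; adjust
# them if the archives unpack differently on your machine.
EXPECTED_DATASETS = ("backbone_dataset", "rcnn_dataset_full")
EXPECTED_SPLITS = ("train", "val")


def missing_dataset_dirs(root="."):
    """Return the expected dataset directories that are absent under root."""
    root = Path(root)
    missing = []
    for dataset in EXPECTED_DATASETS:
        for split in EXPECTED_SPLITS:
            candidate = root / dataset / split
            if not candidate.is_dir():
                missing.append(str(candidate))
    return missing


if __name__ == "__main__":
    # Run from inside the dataset/ folder, like the commands above.
    problems = missing_dataset_dirs(".")
    if problems:
        print("Missing directories:")
        for path in problems:
            print("  " + path)
    else:
        print("Dataset layout looks complete.")
```

If any directory is reported as missing, it is probably worth re-running the download and unzip steps before starting the augmentation.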
I'm looking into the libgc requirements for the GUI. Which OS are you using?
Thanks for the advice, I will try it and let you know the results.
Hi,
Thanks for the help. I am using "Debian GNU/Linux 10 (buster)" as my operating system.
Regards,
Roberto
Hi @jsreyl,
I have made progress running the augment-images.py script, but I get an error when executing these lines of code:
elif atype=='salt-pepper':
#Add gaussian noise of mean 0 and stddev 1
aug_image=image/255
noise = tf.random.normal(shape=tf.shape(image), mean=0.0, stddev=1.0, dtype=tf.float32)
aug_image = tf.add(image, noise)
aug_image=aug_image*255
aug_x, aug_y, aug_w, aug_h = x, y, w, h
the error states that in line
aug_image = tf.add(image, noise)
we are trying to add a uint8 array (image) and a float32 array (noise).
So the questions are:
1) I think this line is wrong
aug_image = tf.add(image, noise)
and should be
aug_image = tf.add(aug_image, noise)
note that both aug_image and noise are float arrays
2) After
aug_image=aug_image*255
the array aug_image should be converted to uint8, maybe with
aug_image = aug_image.astype(np.uint8)
Let me know if you think that these two modifications should be included in the code.
Thanks, I'll set up a virtual machine with Debian to investigate the library requirements.
Thank you for the report. I was able to reproduce this error in a Python environment with TensorFlow > 2.1. TensorFlow 2.1, which is currently required for the training pipeline, runs the augmentation without an explicit cast to uint8.
For compatibility with TensorFlow >= 2.1 (which can be used for inference or backbone pretraining), I just pushed changes to the code (https://github.com/Perilla-lab/TEMNet/commit/cc3ce970e333b5486aee954427318dbae492c04e) casting the image to uint8 as you suggested:
aug_image=tf.cast(aug_image*255,dtype=tf.uint8)
Let me know if this helps.
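For readers who want to see the dtype handling in isolation, here is a minimal NumPy sketch of the normalize, add noise, rescale, cast sequence discussed above. It is not the code from the commit, which uses TensorFlow ops. The function name `add_gaussian_noise`, the `seed` parameter, and the rounding and clipping steps are assumptions added for illustration; clipping avoids wrap-around when casting back to uint8.

```python
import numpy as np


def add_gaussian_noise(image, stddev=1.0, seed=None):
    """Add zero-mean Gaussian noise to a uint8 image and return uint8.

    Hypothetical helper for illustration; not part of TEMNet.
    """
    rng = np.random.default_rng(seed)
    # Work in float32 on a [0, 1] scale so image and noise share a dtype.
    scaled = image.astype(np.float32) / 255.0
    noise = rng.normal(loc=0.0, scale=stddev, size=image.shape).astype(np.float32)
    noisy = (scaled + noise) * 255.0
    # Round and clip before casting: out-of-range values would wrap in uint8.
    return np.clip(np.rint(noisy), 0.0, 255.0).astype(np.uint8)


if __name__ == "__main__":
    image = np.full((4, 4, 3), 128, dtype=np.uint8)
    noisy = add_gaussian_noise(image, stddev=0.1, seed=0)
    print(noisy.dtype, noisy.shape)
```

The key point is the same as in the fix above: the noise is added to the float copy of the image, not to the original uint8 array, and the result is cast back to uint8 only at the end.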
Hi @jsreyl,
Now the augment-images.py script runs successfully ;-) so I moved to the training procedure. I changed to the directory scripts/rcnn and executed the command
python3 train.py -b temnet -g 1
the output is the following error message.
`lr` argument is deprecated, use `learning_rate` instead.
super(SGD, self).__init__(name, **kwargs)
Traceback (most recent call last):
File "/home/roberto/scipion3/software/em/TEMNet/scripts/rcnn/train.py", line 70, in
Do you have any idea what is going on? Should I install Tensorflow version 2.1 instead of the version 2.9 that I am using right now?
thanks for the help
Roberto
Hi @rmarabini ,
Thanks for reporting this issue. I have indeed seen that error before. It is caused by differences in Keras's add_loss function across TensorFlow versions, and fixing it requires a major overhaul of the code. I've been working on it in the tf29 branch so that all the code can run on TensorFlow 2.9, but I have not managed to fix it yet. I believe I will be able to release a TF 2.9 compatible version of the code in one to two weeks. I'll keep you updated on my progress.
My suggestion is to install Tensorflow 2.1 using
pip install -r requirements_tf21.txt
to run the training pipeline.
Note, however, that older versions of TensorFlow supported Python 3.5 to 3.8, so you may have to create an environment with a Python version in that range (which is easy to do with conda https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands ).
Let me know if this helps.
Juan
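A small script along these lines could verify the environment before launching training. It is only a sketch: the function name `environment_problems` and the constants are hypothetical, and the version ranges are taken from the advice in this thread rather than from the repository itself.

```python
import sys

# Assumptions taken from this thread: the training pipeline targets
# TensorFlow 2.1, which is used here with Python 3.5 to 3.8.
SUPPORTED_PYTHON = ((3, 5), (3, 8))
REQUIRED_TF_PREFIX = "2.1."


def environment_problems(python_version, tf_version):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    low, high = SUPPORTED_PYTHON
    major_minor = tuple(python_version[:2])
    if not (low <= major_minor <= high):
        problems.append(
            "Python %d.%d is outside the supported range 3.5 to 3.8" % major_minor
        )
    # Appending "." keeps "2.10.0" from being mistaken for a 2.1 release.
    if not (tf_version + ".").startswith(REQUIRED_TF_PREFIX):
        problems.append("TensorFlow %s found, but 2.1.x is expected" % tf_version)
    return problems


if __name__ == "__main__":
    try:
        import tensorflow as tf
        tf_version = tf.__version__
    except ImportError:
        tf_version = "not installed"
    for problem in environment_problems(sys.version_info, tf_version):
        print(problem)
```

The script avoids f-strings on purpose so that it also runs on the older Python versions mentioned above.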
Hi @jsreyl,
Thanks for your help ;-). I installed python 3.7, protobuf 3.20 and tensorflow 2.1. This time, when I execute "python3 train.py -b temnet", I get a different error. In particular, the line
RuntimeError: /job:localhost/replica:0/task:0/device:GPU:0 unknown device.
may be relevant. Do I need a particular nvidia driver or cuda version? I attach the output of the command nvidia-smi, so you can see the driver and cuda library that I have.
cheers
Roberto
====
TensorFlow Version 2.1.0
Number of devices recognized by Mirror Strategy: 1
Training for RPN: False
classes:Dataset: reading data from /home/roberto/scipion3/software/em/TEMNet/dataset/rcnn_dataset_augmented/train
classes:Dataset: reading data from /home/roberto/scipion3/software/em/TEMNet/dataset/rcnn_dataset_augmented/val
Traceback (most recent call last):
File "train.py", line 70, in
=====
(temnet-env) roberto@clark7:~/scipion3/software/em/TEMNet/scripts/rcnn$ nvidia-smi
Fri Jul 8 15:25:51 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:5E:00.0 Off |                  N/A |
| 30%   39C    P8     8W / 350W |      4MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:AF:00.0 Off |                  N/A |
| 31%   42C    P8    31W / 350W |    116MiB / 24265MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
===
Hi @rmarabini ,
Yes, TensorFlow 2.1 requires CUDA 10.1 and cuDNN 7.6 to work; there should be no problem with the driver. Let me know if this helps :) .
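One way to check whether the CUDA and cuDNN runtime libraries are visible to the dynamic loader, independently of TensorFlow, is to try loading them with ctypes. This is a hedged sketch: the helper name `loadable_libraries` is hypothetical, and the library file names are passed on the command line because they depend on the versions you install (on Linux they typically look like libcudart.so.<version> and libcudnn.so.<version>).

```python
import ctypes
import sys


def loadable_libraries(names):
    """Map each shared-library name to True if the dynamic loader finds it."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the library is missing or cannot be loaded.
            results[name] = False
    return results


if __name__ == "__main__":
    # Usage (file names are up to you): python3 check_libs.py <lib> [<lib> ...]
    for name, found in loadable_libraries(sys.argv[1:]).items():
        print("%-24s %s" % (name, "found" if found else "NOT found"))
```

A library reported as not found usually means it is not installed or its directory is not on the loader path (for example, LD_LIBRARY_PATH on Linux).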
Hi @jsreyl,
I just want to thank you for your help. Unfortunately, after a couple of days of playing with virtual environments and different versions, I have decided to give up :-(.
Just for the record, here is the error message that I get when I try to train the network:
(temnet-env) roberto@clark7:~/scipion3/software/em/TEMNet/scripts/rcnn$ time python3 train.py -b temnet
TensorFlow Version 2.1.0
Number of devices recognized by Mirror Strategy: 2
Training for RPN: False
classes:Dataset: reading data from /home/roberto/scipion3/software/em/TEMNet/dataset/rcnn_dataset_augmented/train
classes:Dataset: reading data from /home/roberto/scipion3/software/em/TEMNet/dataset/rcnn_dataset_augmented/val
Traceback (most recent call last):
File "train.py", line 70, in <module>
train_model(args.backbone, args.weights, args.gpu)
File "train.py", line 43, in train_model
rcnn = RCNN(config, 'train')
File "/home/roberto/scipion3/software/em/TEMNet/scripts/rcnn/model.py", line 674, in __init__
self.compile()
File "/home/roberto/scipion3/software/em/TEMNet/scripts/rcnn/model.py", line 1669, in compile
self.keras_model.add_loss(loss)
File "/home/roberto/miniconda/envs/temnet-env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 1081, in add_loss
self._graph_network_add_loss(symbolic_loss)
File "/home/roberto/miniconda/envs/temnet-env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/network.py", line 1484, in _graph_network_add_loss
self._insert_layers(new_layers, new_nodes)
File "/home/roberto/miniconda/envs/temnet-env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/network.py", line 1439, in _insert_layers
layer_set = set(self._layers)
File "/home/roberto/miniconda/envs/temnet-env/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/data_structures.py", line 598, in __hash__
raise TypeError("unhashable type: 'ListWrapper'")
TypeError: unhashable type: 'ListWrapper'
Hi @rmarabini , Thanks for reporting the error. That should not be happening in a Tensorflow 2.1 environment. I just updated the code and tested it for training and prediction. It should work now.
We are working on containerizing the application so that the environment setup is not left to the user and the code can be run right away, without all the hassle you've been going through.
Let me know if this helps.
I tried to use the program without success. Below I describe what I did.
1) Clone the repository.
2) Create a virtual environment and install the Python modules (requirements.txt).
3) Download the databases (bash ./dataset/download_dataset.sh).
4) Unzip the downloaded databases. Both zip files create the same directory, 'backbone_dataset', and some files have the same name.
5) Execute the augment-images.py script. There are two copies of this script, one in the directory "etc" and the other in "dataset". Below I describe the errors that appear when using the script located in "dataset":
a) The script assumes the existence of a directory called rcnn_dataset_full, which does not exist. I linked rcnn_dataset_full to backbone_dataset.
b) The execution returns the message:
c) Indeed, the file './rcnn_dataset_full/train/mature/mature.png' does not exist.
Any help on how to proceed would be welcome.
By the way, I also tried to execute the GUI, but it seems that the version of my libgc is not compatible with the one assumed by the binary.