Doodleverse / segmentation_gym

A neural gym for training deep learning models to carry out geoscientific image segmentation. Works best with labels generated using https://github.com/Doodleverse/dash_doodler
MIT License

make_nd_datasets gives only 1 class #126

Closed: mvilar22 closed this issue 1 year ago

mvilar22 commented 1 year ago

From the Doodler results folders I ran gen_images_and_labels.py, which created 4 folders. In the label folder there are only black images; however, in the overlay folder I can clearly see the segmented image with the 2 classes separated. I moved the images and label folders into my Gym structure and ran make_nd_dataset.py. It populated my folders according to the tutorial, but it tells me "Found 769 images belonging to 1 classes."

My config file has "NCLASSES": 2, and the Doodler txt file also has 2 classes.

If I try to train the model, I get this error message:

A 'concatenate' layer requires inputs with matching shapes except for the concatenation axis. Received: input_shape=[(None,176,320,128), (None, 175,320,32)]

I have seen a similar tensor-shape issue that was due to images having only 1 class, which was solved using the balanced_labels script, but in my case, with only 1 class being reported, that does not seem to be a solution.

These are my labels and overlays directories:

[screenshot: Captura1]

[screenshot: Captura2]

And here is the '1 classes' message when running make_nd_dataset.py, and the error during training:

[screenshot: Sin título]

[screenshot: Sin título2]

I hope this was not too difficult to understand; my English is a bit rusty. Thanks!

ebgoldstein commented 1 year ago

Hi @mvilar22:

1) The all-black images in the label folder are expected: each pixel in the jpg only ranges from 0-1 (instead of 0-255 for normal jpegs). The overlays confirm that the model worked correctly. If you read a label jpg into Python and look at it as an array, you will see it contains 0's and 1's corresponding to each class (a quick check is sketched below).

2) When you run make_nd_dataset, you will always see this '1 class' message because of the way some of the Keras code works. Don't worry about it; it will still make the correct data outputs based on how many classes you define in the CONFIG file.

3) Tensor-shape issue: first, can you confirm that you are using the Images and Labels folders (the all-black files, not the overlays) when calling make_nd_dataset, and then using the output files (typically stored in 'npz4zoo') when training the model? Second, can you drop your CONFIG file here for us to look at?

(did I miss anything else?)
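A quick way to verify point 1 above is to read one of the all-black label jpgs and inspect its unique pixel values. The sketch below assumes numpy and imageio are available; the filename is hypothetical, and because jpg compression is lossy you may see values near, rather than exactly at, 0 and 1.

```python
# Minimal sketch: inspect a Doodler label jpg (filename is hypothetical)
import numpy as np
import imageio.v2 as imageio

label = imageio.imread("labels/example_label.jpg")  # looks all black in an image viewer
print(label.shape, label.dtype)
print(np.unique(label))  # expect values at (or near) 0 and 1 for a 2-class label
```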

mvilar22 commented 1 year ago

Hi, thanks for the answer and the explanation!

The config file (vanilla_unet_crops.zip) is basically the vanilla config with 2 classes and a different target size; I attached it as a zip anyway.

Concerning the tensor shape training error, I select npzForModel when prompted to select data files. In that folder I have the npz files and two subdirectories, named aug_sample and noaug_sample, that contain png images. I think this folder was populated by make_nd_dataset.

As for the inputs to make_nd_dataset: the labels are the all-black images and the images are the originals, without doodles or overlays.

Thanks again!
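As a sanity check on the npzForModel folder mentioned above, one way to see exactly what the training script will load is to open one of the generated .npz files and list its arrays. The filename below is hypothetical, and the array names depend on how make_nd_dataset packaged them, so treat this purely as an inspection sketch.

```python
# Sketch: list the arrays stored in one generated .npz file (filename is hypothetical)
import numpy as np

with np.load("npzForModel/sample_000.npz") as data:
    for key in data.files:
        print(key, data[key].shape, data[key].dtype)
```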

ebgoldstein commented 1 year ago

Hi @mvilar22 - thx for the config file.

The models in Gym are very sensitive to image size, and only certain shapes/sizes work...

Can you try to change the target size for me to:

"TARGET_SIZE": [768,1024], or : "TARGET_SIZE": [512,768], or even: "TARGET_SIZE": [512,512],

(I recommend 512,512 just to see if it works..)

let me know how it goes! -evan
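For context on why only certain sizes work: U-Net-style models halve the spatial dimensions at each downsampling step and then concatenate the upsampled features with the matching encoder features, so every TARGET_SIZE dimension needs to halve cleanly at each level or the concatenation shapes drift apart by a pixel, as in the error above. The sketch below is illustrative only: the depth of six levels is an assumption (the real depth depends on the model defined in the config), and it simply contrasts the suggested sizes with a 176-pixel dimension like the one appearing in the earlier error.

```python
# Illustrative only: the number of downsampling levels (6) is an assumption, not read from Gym
def halves_cleanly(dim: int, levels: int = 6) -> bool:
    """Return True if `dim` can be halved `levels` times without hitting an odd size."""
    for _ in range(levels):
        if dim % 2:
            return False
        dim //= 2
    return True

for h, w in [(176, 320), (768, 1024), (512, 768), (512, 512)]:
    print((h, w), halves_cleanly(h) and halves_cleanly(w))
# (176, 320) fails because 176 -> 88 -> 44 -> 22 -> 11, and 11 cannot be halved evenly
```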

mvilar22 commented 1 year ago

Hi again Evan, I gave it a try with "TARGET_SIZE": [512,512] and, running the training script, got further along (if that makes sense); however, an error showed up in the first epoch.

'All dimensions except 3 must match. Input 1 has shape [5 88 160 64] and doesn't match input 0 with shape [5 64 64 128].'

[screenshot: error3]

Not all my images have the same size; perhaps that's the problem? I thought the make_nd_dataset script resized the images.

Should I run make_nd_dataset again with the new target size in the config and then try training?

Thanks for the swift answers!

ebgoldstein commented 1 year ago

Yes - sorry, I forgot to mention that you need to rerun make_nd_dataset.

mvilar22 commented 1 year ago

Don't worry, I ran it again and now it seems to be working. I will check tomorrow to make sure everything is fine (although I already see the loss being NaN; I recall seeing some issues about it and will probably change the loss function) and close the issue if it's truly solved.

Thanks again for the help!

ebgoldstein commented 1 year ago

awesome - keep us informed!

mvilar22 commented 1 year ago

It's working just fine, so I am closing the issue.

Thanks again for the help!

ebgoldstein commented 1 year ago

Great news that it runs. The NaN loss issue/solution is here: https://github.com/Doodleverse/segmentation_gym/issues/113#issuecomment-1407089671 Let us know if we can help! -evan