layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape intf_segformer_for_semantic_segmentation/decode_head/dropout_24/dropout/SelectV2-2-TransposeNHWCToNCHW-LayoutOptimizer

Doodleverse / segmentation_gym

A neural gym for training deep learning models to carry out geoscientific image segmentation. Works best with labels generated using https://github.com/Doodleverse/dash_doodler

MIT License


Issue #147

Closed: bestplanetarian closed this issue 3 months ago

bestplanetarian commented 4 months ago

Describe the bug: When I train the SegFormer model, this error occurs. The program keeps running, but the error is printed for every epoch. I tried changing the tf.transpose call, but this does not solve the problem. I'm using Python 3.8. How can I solve it?

OS: Ubuntu 18.01
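The node name in the message ends in `TransposeNHWCToNCHW-LayoutOptimizer`, which suggests it comes from TensorFlow's Grappler layout optimizer (the graph pass that rewrites NHWC tensors to NCHW for GPU kernels) rather than from Segmentation Gym itself. When a Grappler pass fails, TensorFlow typically logs the error and runs the graph without that rewrite, which would fit the observation that training continues. A workaround often suggested online for this family of messages, not tested in this thread, is to switch that pass off before the model is built. The sketch uses `tf.config.optimizer.set_experimental_options`; the helper name `disable_layout_optimizer` is hypothetical, and where to call it in Segmentation Gym's training script is an assumption left to the reader.

```python
def disable_layout_optimizer():
    """Turn off Grappler's layout optimizer for the current process.

    Hypothetical helper, not part of Segmentation Gym. Call it before the
    model is built and compiled so the setting applies to graphs created
    afterwards. Returns the experimental optimizer options now in effect.
    """
    # Imported inside the function so this helper can be defined even where
    # TensorFlow is not installed.
    import tensorflow as tf

    tf.config.optimizer.set_experimental_options({"layout_optimizer": False})
    return tf.config.optimizer.get_experimental_options()


if __name__ == "__main__":
    # Expected to include {'layout_optimizer': False} among the printed options.
    print(disable_layout_optimizer())
```

Disabling the pass may change GPU throughput, so it is worth comparing epoch times with and without it before keeping the change.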

ebgoldstein commented 4 months ago

Hi @bestplanetarian:

Have you made any code changes to Segmentation Gym?
Does the model actually train? If it trains and produces reasonable results, then you can sometimes ignore the issue.
Have you checked all your training data to make sure it is good, e.g. that it has the correct number of channels and that the images are correct?
Searching Google, I found a few hits related to this error. Have you checked them out? Do any of the suggested fixes work for you?
If you still have an issue and want some help, we will need more details. Please give us an example training dataset where the error occurs (maybe 10 image-label pairs) and the config file you are using.
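The channel and size checks suggested above can be scripted. A minimal sketch, assuming each image and label has already been loaded as a NumPy array (the loading step and the function name `check_pair` are hypothetical, not part of Segmentation Gym) and that labels are 2-D arrays of integer class ids:

```python
import numpy as np


def check_pair(image, label, expected_bands):
    """Return a list of problems found in one image/label pair (empty if none).

    Hypothetical helper: `image` is an (H, W) or (H, W, bands) array and
    `label` an (H, W) array of integer class ids.
    """
    image = np.asarray(image)
    label = np.asarray(label)
    problems = []

    if image.ndim not in (2, 3):
        problems.append(f"image has {image.ndim} dims, expected 2 or 3")
    else:
        # A single-band image may be stored as (H, W) or (H, W, 1).
        bands = 1 if image.ndim == 2 else image.shape[-1]
        if bands != expected_bands:
            problems.append(f"image has {bands} bands, expected {expected_bands}")

    # Accept (H, W, 1) labels by dropping the trailing axis.
    if label.ndim == 3 and label.shape[-1] == 1:
        label = label[..., 0]
    if label.ndim != 2:
        problems.append(f"label has shape {label.shape}, expected (H, W)")
    elif image.shape[:2] != label.shape:
        problems.append(f"image size {image.shape[:2]} != label size {label.shape}")
    return problems


# Example: an RGB image paired with a config that expects one band.
img = np.zeros((512, 512, 3), dtype=np.uint8)
lab = np.zeros((512, 512), dtype=np.uint8)
print(check_pair(img, lab, expected_bands=1))  # ['image has 3 bands, expected 1']
```

Running this over every pair and printing only non-empty results gives a quick list of files worth opening by hand.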

bestplanetarian commented 3 months ago

Thanks for your comments, and sorry, I'm on a trip these days.

No, I have made no changes to Segmentation Gym.

The model does train. The SegFormer model only trains for 20 epochs, reaching 50% mean IoU and 76% overall accuracy on the test dataset. The training dataset contains 144 images and the test dataset contains 48 images, all single-channel.
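Overall accuracy and mean IoU can differ this much because overall accuracy is dominated by the most common class, while mean IoU weights every class equally. A generic sketch of both metrics computed from a confusion matrix follows; it is not Segmentation Gym's own metric code, and the 2-class matrix in the example is made up for illustration.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Pixel counts: rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true).ravel().astype(np.int64)
    y_pred = np.asarray(y_pred).ravel().astype(np.int64)
    # Encode each (true, pred) pair as a single index, then count occurrences.
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def overall_accuracy(cm):
    """Fraction of all pixels that were classified correctly."""
    return np.trace(cm) / cm.sum()


def per_class_iou(cm):
    """IoU_c = TP / (TP + FP + FN) for each class c; NaN if the class never appears."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as c, but actually another class
    fn = cm.sum(axis=1) - tp  # actually c, but predicted as another class
    denom = tp + fp + fn
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, tp / denom, np.nan)


def mean_iou(cm):
    """Unweighted mean of per-class IoU, ignoring classes that never appear."""
    return np.nanmean(per_class_iou(cm))


# Made-up 2-class confusion matrix: class 1 is rare and mostly missed.
cm = np.array([[70, 10],
               [14, 6]])
print(round(float(overall_accuracy(cm)), 2))  # 0.76
print(round(float(mean_iou(cm)), 3))          # 0.472
```

In this made-up case the rare class has an IoU of only 0.2, which pulls mean IoU well below overall accuracy even though most pixels are correct.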

I found a suggested solution that involves changing the tf.transpose call in `train_model`, but it does not work.

Here are 10 images with their corresponding masks, plus the configuration file:

https://drive.google.com/drive/folders/16SALyvJ0NgDptXyLrBCoo2OowXSUNn_D?usp=drive_link

dbuscombe-usgs commented 3 months ago

Hi @bestplanetarian, I can take a look at this for you. I just tried to download your files from Google Drive, but I don't have access. Could you please make this folder accessible? Then I will try again.

dbuscombe-usgs commented 3 months ago

One concern I have is your OS, which is now 6+ years old. The other concern is that you are using Python 3.8. We recommend 3.10; see https://github.com/Doodleverse/segmentation_gym?tab=readme-ov-file#windows

bestplanetarian commented 3 months ago

https://drive.google.com/drive/folders/16SALyvJ0NgDptXyLrBCoo2OowXSUNn_D?usp=sharing. I think that this one will work.

I had some trouble installing Segmentation Gym's dependencies in a Python 3.10 environment, which is why I am using Python 3.8.

ebgoldstein commented 3 months ago

Hi @bestplanetarian: I am able to train this model using the data you provided. I am not exactly sure why this error occurs, but it does not prevent training. I am not able to debug your model further to help improve its performance. Because the model trains, I am going to close the issue now.

As for improving performance on a specific dataset, we can't help with that; it is beyond the scope of what we can do in an issue. My suggestion is to first look closely at the training data and your config file, and then, if the data look OK, try adjusting the hyperparameters in your config file. But I would definitely start by really inspecting the training data and the labels. When I look at the examples that are printed during `make_dataset`, the colors and shapes of the labels look a bit strange. I will attach a few below.
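One quick way to follow up on labels that look strange is to list the distinct values in each label array and flag any that fall outside the class range. Stray values (for example, from a label image saved with lossy compression or resampled with interpolation) would show up here, and the per-class fractions also show how imbalanced the classes are. A minimal sketch, assuming labels are loaded as integer NumPy arrays with classes numbered `0` to `num_classes - 1`; `label_value_report` is a hypothetical name, not a Segmentation Gym function.

```python
import numpy as np


def label_value_report(label, num_classes):
    """Return (fraction of pixels per value, list of out-of-range values).

    Hypothetical helper: assumes `label` holds integer class ids in
    the range 0 .. num_classes - 1.
    """
    values, counts = np.unique(np.asarray(label), return_counts=True)
    total = float(counts.sum())
    fractions = {int(v): float(c) / total for v, c in zip(values, counts)}
    # Anything outside 0 .. num_classes - 1 cannot be a valid class id.
    unexpected = [v for v in fractions if v < 0 or v >= num_classes]
    return fractions, unexpected


# Made-up 2 x 3 label with one stray value (7) for a 2-class problem.
lab = np.array([[0, 0, 1], [1, 1, 7]])
fractions, unexpected = label_value_report(lab, num_classes=2)
print(unexpected)  # [7]
```

A very small fraction for one class across the whole dataset is also a hint that mean IoU will be hard to raise without more examples of that class.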

ebgoldstein commented 3 months ago

[Attached images: Lamine_waterrmnoaug_ex2, Lamine_waterrmnoaug_ex4, Lamine_waterrmnoaug_ex7]