aurora95 / Keras-FCN

Keras-tensorflow implementation of Fully Convolutional Networks for Semantic Segmentation (Unfinished)

inference.py: ValueError: Dimension 0 in both shapes must be equal, but are 1 and 2 for 'Assign_318' #37

Closed. bgr33r closed this issue 7 years ago.

bgr33r commented 7 years ago

Hello,

I have successfully trained the AtrousFCN_Resnet50_16s model with my own data, but am having difficulty getting the inference.py module to run. I am hoping that you might have some insight into how I can get the model up and running. Can you help? Thanks! See below:

First, a little about the model: my training data has three values. 0 corresponds to the VOID layer in the VOC data and marks the outlines of my objects, 1 corresponds to the segmentation pixels I am looking for, and 2 corresponds to background. When I trained the model, I set the number of classes to 2, with the expectation that the model would learn to identify categories 1 and 2 in the same fashion as the VOC2011 data. I used the loss function "softmax_sparse_crossentropy_ignoring_last_label" that is included with your code. Here are my results after 250 epochs:

[training/validation plots: grape_all_2017_08_25_2_classes_plots]

From the shapes of the plots, I believe my model is training correctly. As output, I get a new file in the AtrousFCN_Resnet50_16s folder, 'checkpoint_weights.hdf5', timestamped at model completion, which I understand to contain the weights/biases of the trained model.
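For completeness, this is my rough understanding of how a loss that "ignores the last label" works. It is only a sketch, not the exact code from this repo, and the shapes and the choice of ignore index are my assumptions:

```python
import tensorflow as tf
from keras import backend as K

def sparse_crossentropy_ignoring_last_label(y_true, y_pred):
    # Sketch only: y_pred holds per-pixel logits with nb_classes channels,
    # y_true holds integer labels, where the value nb_classes marks pixels
    # to ignore (e.g. a VOID outline).
    nb_classes = K.int_shape(y_pred)[-1]
    logits = K.reshape(y_pred, (-1, nb_classes))
    log_softmax = tf.nn.log_softmax(logits)

    # One-hot encode with one extra channel for the ignore label, then drop
    # that channel so ignored pixels contribute a zero row to the loss.
    labels = K.one_hot(tf.cast(K.flatten(y_true), tf.int32), nb_classes + 1)
    labels = labels[:, :nb_classes]

    return -K.sum(labels * log_softmax, axis=-1)
```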

I am now evaluating the model by running evaluation.py with the AtrousFCN_Resnet50_16s model and the new checkpoint file. I have set nb_classes to 2 and pointed the code at the new checkpoint file for the weights. When I run the code, I receive the following error:

ValueError: Dimension 0 in both shapes must be equal, but are 1 and 2 for 'Assign_318' (op: 'Assign') with input shapes: [1,1,2048,21], [2,2048,1,1].

This error traces back to line 32 of inference.py: `model.load_weights(checkpoint_path, by_name=True)`

Interestingly, when I instead point to the weight file I used when I initialized training ('fcn_resnet50_weights_tf_dim_ordering_tf_kernels.h5'), I do not get this error. I think the problem is either with how inference.py interprets the new model weights file or with the model itself.

Will you verify that I am pointing to the correct model weights file for evaluation?

I also see that line 31 of inference.py in my code reads `model = globals()[model_name](batch_shape=batch_shape, input_shape=(512, 512, 3))`, and line 21 reads `batch_shape = (1, ) + image_size + (3,)`.

When I look at my error again (ValueError: Dimension 0 in both shapes must be equal, but are 1 and 2 for 'Assign_318' (op: 'Assign') with input shapes: [1,1,2048,21], [2,2048,1,1]), I notice that the two shapes look transposed relative to each other, and that the model is expecting 21 classes instead of 2 (the last dimension of the first shape versus the first dimension of the second). What do I need to adjust in my code to fix this error? Do I need to change the batch shape? Is there a spot during model initialization where I am loading the weights incorrectly? Do I need to adjust something in SegDataGenerator to get it to work? Thank you for your input.
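For reference, here is roughly what I believe those lines amount to, and where I suspect the class count would need to be passed through. This is a sketch only; the `classes` keyword is my assumption about the constructors in models.py, and `model_name` / `checkpoint_path` are set earlier in inference.py.

```python
image_size = (512, 512)
batch_shape = (1,) + image_size + (3,)   # my line 21

nb_classes = 2                           # what I trained with

# My lines 31-32. If the constructor is never told about nb_classes, it
# presumably builds the final 1x1 convolution with its own default (21
# filters), and load_weights() then fails comparing the graph's
# [1, 1, 2048, 21] kernel against the checkpoint's 2-class kernel.
model = globals()[model_name](batch_shape=batch_shape,
                              input_shape=(512, 512, 3),
                              classes=nb_classes)   # assumed keyword
model.load_weights(checkpoint_path, by_name=True)
```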

bgr33r commented 7 years ago

I solved this. The problem was with the model: it was built with 21 classes instead of 2.

simonsayshi commented 6 years ago

Hi, how did you solve this issue? I changed the number of classes in train.py from 21 to 5 (my data has 5 classes) and training works well, but when I run evaluate.py (where I also changed nb_classes to 5), this error occurs:

ValueError: Dimension 0 in both shapes must be equal, but are 1 and 5. Shapes are [1,1,4096,21] and [5,4096,1,1]. for 'Assign_30' (op: 'Assign') with input shapes: [1,1,4096,21], [5,4096,1,1].

This is really frustrating, since I matched the parameters between train and evaluate, but the error keeps appearing. How did you solve it? Thanks!

simonsayshi commented 6 years ago

@bgr33r

bgr33r commented 6 years ago

@simonsayshi I would recommend digging through your code and making sure that the variables relating to class number are set correctly. Also, you should make sure you're loading the correct model file (.hdf5). I "solved" the problem when I realized I had made a mistake with the number of classes denoted in the model (doh!).
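One quick way to see what class count a checkpoint was actually saved with is to inspect the hdf5 file directly. A sketch, assuming h5py is installed and using an example path:

```python
import h5py

# Print the shape of every 4-D kernel stored in the checkpoint; the last
# dimension of the final classification layer's kernel is the class count
# the model was trained with (e.g. (1, 1, 2048, 2) for 2 classes).
with h5py.File('AtrousFCN_Resnet50_16s/checkpoint_weights.hdf5', 'r') as f:
    def show(name, obj):
        if hasattr(obj, 'shape') and len(obj.shape) == 4:
            print(name, obj.shape)
    f.visititems(show)
```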

simonsayshi commented 6 years ago

Thanks for the reply! I still want to make this clear: for my dataset, I only changed the class count in train.py and nb_classes in evaluate.py. When I use the original number, 21 classes, both train.py and evaluate.py work without any problems, but when I change it to any other number, like 4 or 5 (in both scripts), the error occurs.

I only have one model file, which is generated by train.py, so I don't think I loaded the wrong model. So what did you do? Did you just set 'classes' in train.py and 'nb_classes' in evaluate.py to the same number and everything worked? That is what I did, but the issue remains! :(

DianaLi96 commented 6 years ago

@simonsayshi Maybe you can look at the default number of classes in models.py and change that default to what you want. The default may be what produces a model whose last layer has dimension 21 instead of matching your hdf5 file; that may be the reason for the problem.
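In other words, the pattern would be something like the sketch below (the signature is illustrative; check the actual one in models.py):

```python
# Hypothetical sketch of the pattern in models.py: the builder has a
# default class count that silently wins if the caller does not pass one.
def AtrousFCN_Resnet50_16s(weight_decay=0., input_shape=None,
                           batch_momentum=0.9, classes=21):
    # ... would build the backbone plus a final 1x1 conv with `classes`
    # filters; body omitted in this sketch ...
    pass

model = AtrousFCN_Resnet50_16s()           # classes silently defaults to 21
model = AtrousFCN_Resnet50_16s(classes=5)  # matches a 5-class checkpoint
```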