Bartzi / see

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"
GNU General Public License v3.0

Is there a bug in the class 'SVHNSoftmaxMetrics'? #18

Closed mzhaoshuai closed 6 years ago

mzhaoshuai commented 6 years ago

When I try to train a model from scratch on the centered dataset (downloaded from the address you provided) by running train_svhn.py, I get the following error. (P.S. The first line of the CSV file is 1 2.)

    │The timestep in the metric is 1
    │The timestep of the LSTM in localization net is 1
    │The timestep of the LSTM in the recognition net is 1
    │The label length of the recognition net is 2
    │calc_loss:The number of characters is 1
    │calc_loss:The number of timesteps is 1
    │Exception in main training loop:
    │Invalid operation is performed in: SoftmaxCrossEntropy (Forward)
    ...
    ...

    │Expect: in_types[0].shape[0] == in_types[1].shape[0]
    │Actual: 200 != 400

I added the following code to the calc_loss method of the SVHNSoftmaxMetrics class to print the relevant information:

    batch_size, num_chars, num_classes = predictions.shape
    print("calc_loss:The number of characters is {0}".format(num_chars))
    print("calc_loss:The number of timesteps is {0}".format(self.num_timesteps))

The expected output would be:

    │calc_loss:The number of characters is 2
    │calc_loss:The number of timesteps is 1

These values do not match, and given the error message (200 != 400), I suspected a problem with the shape of the predictions. So I looked at the __call__ method of the SVHNRecognitionNet class in svhn.py:

        # for each timestep of the localization net do the 'classification'
        h = F.reshape(h, (self.num_timesteps, -1, self.fc1.out_size))
        overall_predictions = []
        # num_timesteps
        for timestep in F.separate(h, axis=0):
            ....
            # predict the label separately
            for _ in range(self.num_labels):
            ....
            # concat the predictions finally
            final_lstm_predictions = F.concat(final_lstm_predictions, axis=0)
            # append to the overall list
            overall_predictions.append(final_lstm_predictions)

The predictions are concatenated along axis=0, so each entry of the list overall_predictions presumably has shape [num_chars, batch_size, num_classes], and the list holds num_timesteps entries.

In the calc_loss method of the SVHNSoftmaxMetrics class in svhn_softmax_metrics.py, the code looks like this:

    def calc_loss(self, x, t):
        # batch_predictions is a list of the predicted results;
        # each entry has shape [num_chars, batch_size, num_classes]
        batch_predictions, _, _ = x

        # concat all individual predictions and slice for each time step

        # actual shape: [num_chars, batch_size, num_timesteps, num_classes]
        batch_predictions = F.concat([F.expand_dims(p, axis=2) for p in batch_predictions], axis=2)

This code applies expand_dims and then concatenates along axis=2, so the resulting batch_predictions has shape [num_chars, batch_size, num_timesteps, num_classes]. From the surrounding code, however, the expected shape is [num_timesteps, batch_size, num_chars, num_classes]. So I added one line:

    # expected shape: [num_timesteps, batch_size, num_chars, num_classes]
    batch_predictions = F.transpose(batch_predictions, axes=(2, 1, 0, 3))

With this change it runs without errors, and I trained a good model.
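The shape bookkeeping in this report can be checked outside Chainer. Below is a minimal NumPy sketch, not code from the repository: np.expand_dims, np.concatenate and np.transpose stand in for F.expand_dims, F.concat and F.transpose; the sizes (batch_size=200, num_chars=2, num_timesteps=1, num_classes=11) are illustrative; and make_predictions is a hypothetical helper that returns num_timesteps arrays of shape [num_chars, batch_size, num_classes], as assumed above. It also assumes calc_loss slices the 4-D array along axis 0, one timestep at a time, which is consistent with the printed num_chars of 1. If predictions and labels are then flattened to one row per (image, character) pair, this yields the 200 vs. 400 counts from the error message.

```python
import numpy as np


def make_predictions(num_timesteps, num_chars, batch_size, num_classes):
    # Hypothetical stand-in for overall_predictions: a list of num_timesteps
    # arrays, each shaped [num_chars, batch_size, num_classes]
    return [np.zeros((num_chars, batch_size, num_classes), dtype=np.float32)
            for _ in range(num_timesteps)]


def stack_timesteps(predictions):
    # Pre-fix calc_loss: add an axis at position 2 and concatenate on it
    # -> [num_chars, batch_size, num_timesteps, num_classes]
    return np.concatenate([np.expand_dims(p, axis=2) for p in predictions], axis=2)


preds = make_predictions(num_timesteps=1, num_chars=2, batch_size=200, num_classes=11)
buggy = stack_timesteps(preds)
# The fix: swap axes 0 and 2
# -> [num_timesteps, batch_size, num_chars, num_classes]
fixed = np.transpose(buggy, axes=(2, 1, 0, 3))

print(buggy.shape)  # (2, 200, 1, 11)
print(fixed.shape)  # (1, 200, 2, 11)

# Slicing along axis 0 (assumed to mean "one timestep") gives:
buggy_step = buggy[0]  # (200, 1, 11): unpacked as num_chars == 1
fixed_step = fixed[0]  # (200, 2, 11): num_chars == 2, as expected
batch_size, num_chars, num_classes = buggy_step.shape
# 200 prediction rows vs. 200 * 2 = 400 labels (assuming one label per image and character)
print(batch_size * num_chars)  # 200
```

np.stack(predictions, axis=2) would build the same pre-fix array in one call; the expand_dims/concatenate pair is kept only to mirror the original Chainer line.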

When the first line of the CSV file contains two equal numbers (like a a), everything works fine.

Is this really a bug?

Finally, thanks for your work! It is a good paper.

Bartzi commented 6 years ago

Yes, you are right. That is a bug :sweat_smile:. I've fixed it in the code. Thanks for pointing it out!

What do you mean with

the two number is same, it is all right

? Did you set the first line to 2 2 or 1 1 and it worked?

mzhaoshuai commented 6 years ago

Before the bug was fixed, the code ran successfully whenever the first line of the CSV file had two equal numbers, i.e., when the number of regions equals the maximum number of characters. This is an interesting coincidence: when num_chars == num_timesteps, the order of those two axes does not change the shape, so no error is raised.

This is what I meant. Please pardon my unclear wording; I am not a native English speaker, so it is sometimes hard to express myself properly.

Thanks for your work again!
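The coincidence described in the previous comment can be illustrated with a small NumPy sketch. It is not repository code: NumPy calls stand in for the Chainer functions, the sizes are illustrative, and make_predictions is a hypothetical helper that fills each [num_chars, batch_size, num_classes] entry with the value 10 * timestep + char so the two axes can be told apart. With num_chars == num_timesteps the pre-fix and fixed layouts have identical shapes, so no shape check fails, but the timestep and character axes are still swapped in the pre-fix version.

```python
import numpy as np


def make_predictions(num_timesteps, num_chars, batch_size, num_classes):
    # Hypothetical stand-in for overall_predictions: one array per timestep,
    # shaped [num_chars, batch_size, num_classes], filled with 10 * t + c
    return [
        np.stack([np.full((batch_size, num_classes), 10 * t + c, dtype=np.float32)
                  for c in range(num_chars)])
        for t in range(num_timesteps)
    ]


def stack_timesteps(predictions):
    # Pre-fix calc_loss layout: [num_chars, batch_size, num_timesteps, num_classes]
    return np.concatenate([np.expand_dims(p, axis=2) for p in predictions], axis=2)


preds = make_predictions(num_timesteps=2, num_chars=2, batch_size=4, num_classes=11)
buggy = stack_timesteps(preds)
# Fixed layout: [num_timesteps, batch_size, num_chars, num_classes]
fixed = np.transpose(buggy, axes=(2, 1, 0, 3))

print(buggy.shape == fixed.shape)  # True: both (2, 4, 2, 11)
# Index [0, :, 1] should mean "timestep 0, character 1" (value 1.0) ...
print(fixed[0, 0, 1, 0])  # 1.0
# ... but in the pre-fix layout it holds timestep 1, character 0 (value 10.0)
print(buggy[0, 0, 1, 0])  # 10.0
```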

Bartzi commented 6 years ago

Ah, okay now I get it :smile:! Thanks again for pointing out the bug!