Problem in evaluation after training the model

Hi, I have trained the model on the SVHN dataset in a Google Colab notebook. After training, the following directory was created.

NOTE: I trained the model with batch size 8 because of memory limitations in Google Colab.

--->2018-04-12T05_08_15.296708_training
            |-------------->cg.dot
            |-------------->log
            |-------------->model_20000.npz
            |-------------->model_40000.npz
            |-------------->train_svhn.py
            |-------------->trainer_snapshot

But when I run evaluate.py with the command below, it looks for svhn.py in the model directory and fails with the following error, even though the file that was created is train_svhn.py.

python SEE_unziped/SEE/see-master/chainer/evaluate.py --gpu 0 'svhn' "/content/SEE_unziped/SEE/2018-04-12T05:08:15.296708_training" "trainer_snapshot" "/content/SEE_unziped/SEE/test.tar/test/test/cropped/test.csv" "/content/SEE_unziped/SEE/see-master/datasets/svhn/svhn_char_map.json" 2

/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "SEE_unziped/SEE/see-master/chainer/evaluate.py", line 37, in <module>
    evaluator = args.evaluator(args)
  File "/content/SEE_unziped/SEE/see-master/chainer/evaluation/evaluator.py", line 39, in __init__
    module = self.load_module(os.path.abspath(os.path.join(args.model_dir, localization_module_name)))
  File "/content/SEE_unziped/SEE/see-master/chainer/evaluation/evaluator.py", line 134, in load_module
    module_spec.loader.exec_module(module)
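  File "<frozen importlib._bootstrap_external>", line 674, in exec_module
  File "<frozen importlib._bootstrap_external>", line 780, in get_code
  File "<frozen importlib._bootstrap_external>", line 832, in get_data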
FileNotFoundError: [Errno 2] No such file or directory: '/content/SEE_unziped/SEE/2018-04-12T05:08:15.296708_training/svhn.py'

So I renamed train_svhn.py to svhn.py. After renaming the file, the above error is gone, but now this new error occurs.

python SEE_unziped/SEE/see-master/chainer/evaluate.py --gpu 0 'svhn' "/content/SEE_unziped/SEE/2018-04-12T05:08:15.296708_training" "trainer_snapshot" "/content/SEE_unziped/SEE/test.tar/test/test/cropped/test.csv" "/content/SEE_unziped/SEE/see-master/datasets/svhn/svhn_char_map.json" 2

/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "SEE_unziped/SEE/see-master/chainer/evaluate.py", line 37, in <module>
    evaluator = args.evaluator(args)
  File "/content/SEE_unziped/SEE/see-master/chainer/evaluation/evaluator.py", line 58, in __init__
    chainer.serializers.NpzDeserializer(f).load(self.net)
  File "/usr/local/lib/python3.6/dist-packages/chainer/serializer.py", line 83, in load
    obj.serialize(self)
  File "/usr/local/lib/python3.6/dist-packages/chainer/link.py", line 817, in serialize
    d[name].serialize(serializer[name])
  File "/usr/local/lib/python3.6/dist-packages/chainer/link.py", line 817, in serialize
    d[name].serialize(serializer[name])
  File "/usr/local/lib/python3.6/dist-packages/chainer/link.py", line 817, in serialize
    d[name].serialize(serializer[name])
  File "/usr/local/lib/python3.6/dist-packages/chainer/link.py", line 566, in serialize
    data = serializer(name, param.data)
  File "/usr/local/lib/python3.6/dist-packages/chainer/serializers/npz.py", line 143, in __call__
    dataset = self.npz[key]
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py", line 239, in __getitem__
    raise KeyError("%s is not a file in the archive" % key)
KeyError: 'localization_net/lstm/lateral/W is not a file in the archive'

This is interesting. The code should actually create a file called svhn.py in your log folder. This file is just a copy of svhn.py from the models folder; it is there to ensure that the correct model definition is available when evaluating the model. So, to fix your first problem, you can simply copy svhn.py from that directory into your log_dir. I'm not sure why it is not working for you, but you could try to debug the creation of the Logger object in utils/train_utils.py.
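A minimal sketch of that manual workaround. It assumes the repository layout visible in the tracebacks in this thread (`see-master/chainer/models/svhn.py`). The helper name `copy_model_definition` is hypothetical and not part of the repository:

```python
import os
import shutil


def copy_model_definition(chainer_dir, log_dir, module_name="svhn.py"):
    """Copy models/<module_name> into log_dir so evaluate.py can import it.

    chainer_dir is the repository's chainer/ folder (assumed layout:
    chainer/models/svhn.py). Returns the path of the copied file.
    """
    source = os.path.join(chainer_dir, "models", module_name)
    if not os.path.isfile(source):
        raise FileNotFoundError(source)
    target = os.path.join(log_dir, module_name)
    # copy2 also keeps the file's timestamps, which makes the copy easy to trace
    shutil.copy2(source, target)
    return target
```

For the Colab paths above, this would be `copy_model_definition("/content/SEE_unziped/SEE/see-master/chainer", "/content/SEE_unziped/SEE/2018-04-12T05:08:15.296708_training")`. Copying the file is preferable to renaming train_svhn.py, because evaluate.py expects the model definition, not the training script.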

The second error you get is because you are trying to load the wrong model. Instead of trainer_snapshot you should use model_20000.npz or model_40000.npz.
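One way to see the difference between the two files is to list the archive keys that end in the parameter name from the KeyError above. The sketch below uses only NumPy, and the helper name is hypothetical. The idea that a trainer snapshot stores the network under an extra prefix (in standard Chainer trainers typically something like `updater/model:main/`) is an assumption, not something confirmed in this thread:

```python
import numpy as np


def find_parameter_keys(npz_path, suffix="localization_net/lstm/lateral/W"):
    """List the keys in an .npz archive that end with the given parameter name.

    If the bare name itself is returned, the file stores the network at the
    top level (what the evaluator loads). If only prefixed names come back,
    the file is probably a nested snapshot (e.g. of the whole trainer).
    """
    with np.load(npz_path) as archive:
        return [key for key in archive.files if key.endswith(suffix)]
```

Running it on both `trainer_snapshot` and `model_20000.npz` should show which of them contains `localization_net/lstm/lateral/W` without a prefix.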

Thanks for the quick reply @Bartzi. Following your instructions, the above problem is solved. But now the following error occurs. This error might come from Chainer.

python SEE_unziped/SEE/see-master/chainer/evaluate.py --gpu 0 'svhn' "/content/SEE_unziped/SEE/2018-04-12T05:08:15.296708_training" "trainer_snapshot" "/content/SEE_unziped/SEE/test.tar/test/test/cropped/test.csv" "/content/SEE_unziped/SEE/see-master/datasets/svhn/svhn_char_map.json" 2

/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
  0%|                                                 | 0/13068 [00:00
    evaluator.evaluate()
  File "/content/SEE_unziped/SEE/see-master/chainer/evaluation/evaluator.py", line 114, in evaluate
    predictions, crops, grids = self.net(image[self.xp.newaxis, ...])
  File "/content/SEE_unziped/SEE/see-master/chainer/models/svhn.py", line 209, in __call__
    h = self.localization_net(images)
  File "/content/SEE_unziped/SEE/see-master/chainer/models/svhn.py", line 60, in __call__
    lstm_prediction = F.relu(self.lstm(in_feature))
  File "/usr/local/lib/python3.6/dist-packages/chainer/links/connection/lstm.py", line 309, in __call__
    lstm_in = self.upward(x)
  File "/usr/local/lib/python3.6/dist-packages/chainer/links/connection/linear.py", line 129, in __call__
    return linear.linear(x, self.W, self.b)
  File "/usr/local/lib/python3.6/dist-packages/chainer/functions/connection/linear.py", line 234, in linear
    y, = LinearFunction().apply(args)
  File "/usr/local/lib/python3.6/dist-packages/chainer/function_node.py", line 240, in apply
    self._check_data_type_forward(in_data)
  File "/usr/local/lib/python3.6/dist-packages/chainer/function_node.py", line 321, in _check_data_type_forward
    self.check_type_forward(in_type)
  File "/usr/local/lib/python3.6/dist-packages/chainer/functions/connection/linear.py", line 23, in check_type_forward
    x_type.shape[1] == w_type.shape[1],
  File "/usr/local/lib/python3.6/dist-packages/chainer/utils/type_check.py", line 524, in expect
    expr.expect()
  File "/usr/local/lib/python3.6/dist-packages/chainer/utils/type_check.py", line 482, in expect
    '{0} {1} {2}'.format(left, self.inv, right))
chainer.utils.type_check.InvalidType: 
Invalid operation is performed in: LinearFunction (Forward)

Expect: in_types[0].shape[1] == in_types[1].shape[1]
Actual: 192 != 5808

I need to clarify one thing: what does num_labels signify? Is it the maximum number of digits in a house number detected in the image? What is a good value for it, or can we give any value?

I think you receive this error because the dimensions of your input images are not correct; at the very least, they are not resized to the size the model expects. You could try checking that.
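The traceback shows the failure in the LSTM's `upward` linear layer. That layer is stored in the saved model, so the feature size it expects (5808 here) can be read from the .npz file directly. This is a minimal sketch. The key `localization_net/lstm/upward/W` is inferred from the `localization_net/lstm/lateral/W` key in the earlier KeyError and the `self.upward(x)` frame in this traceback, so treat it as an assumption. The helper name is hypothetical:

```python
import numpy as np


def expected_lstm_input_size(npz_path, key="localization_net/lstm/upward/W"):
    """Return the flattened feature size the saved LSTM input projection expects.

    Chainer's Linear links store W with shape (out_size, in_size), so
    W.shape[1] is the input size checked in LinearFunction.check_type_forward.
    """
    with np.load(npz_path) as archive:
        if key not in archive.files:
            raise KeyError("{} not found; first keys: {}".format(key, sorted(archive.files)[:10]))
        return int(archive[key].shape[1])
```

Suppose `expected_lstm_input_size("model_20000.npz")` reports 5808 while evaluation produces 192 features. In that case, the images reaching the localization network during evaluation are smaller than the ones it was trained on.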

Regarding num_labels: you are right, it tells the network how many digits it should search for at most. A good value is the same one you used in training. You specified this value in the first line of your training groundtruth file, and you should use that value.
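To reuse the training value at evaluation time, one can read the first line of the training groundtruth file. This is a sketch under stated assumptions. The thread only says that the value lives in the first line; one user later writes "4 4" there, but which position corresponds to which parameter is not stated here, so check the README. The tokens are assumed to be separated by tabs, spaces, or commas, and the helper name is hypothetical:

```python
import re


def read_groundtruth_header(csv_path):
    """Return the integers on the first line of a groundtruth CSV file."""
    with open(csv_path) as f:
        first_line = f.readline().strip()
    # Assumed separators: tabs, spaces or commas.
    return [int(token) for token in re.split(r"[\s,]+", first_line) if token]
```

For a file whose first line is `4<TAB>4`, this returns `[4, 4]`.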

Thanks @Bartzi, the evaluation is running fine now. But it predicts label = 1 for every digit, and I only get one label per image; e.g. for an image showing 210, I get word = 1 and gt_word = 2. Is it because I used --timesteps 1? What does --timesteps signify?

--timesteps tells the localization network the maximum number of text regions you expect in the image. Increasing this number should give you longer predictions.

But I get word = 1 for every image. I used --timesteps 1.

Hi @Bartzi, sorry for all the issues. I have trained the SVHN model, but evaluation fails because of the parameters. Could you please share the parameter values you chose for training and evaluation on the SVHN dataset? It would be a great favor. Thanks!

Sure, no problem, but you already said that you trained an SVHN model. Could you post the command line you used to train the model and also the log file from the log folder? With that information, I can help you find the correct parameters.

Hi @Bartzi, I followed the README and tried to build an SVHN model. I downloaded the datasets here and trained the model on the datasets in the generated/easy folder. But when I evaluated the trained model on the datasets in the evaluation/test folder, I ran into the same problem mentioned above. The error message is:

Invalid operation is performed in: LinearFunction (Forward)

Expect: in_types[0].shape[1] == in_types[1].shape[1]
Actual: 192 != 5808

I saw your comment that this may be because the dimensions of the input images are not correct, and that I should check them. Sorry for the dumb question... but how do I know which size the model expects? Where can I find that information? I checked the log file in the folder "2018-08-31T13:42:20.124446_training" (which is generated when the model is trained). It shows "image_size": [ 200, 200 ] and "target_size": [ 50, 50 ]. But I tried resizing the input images to either 200x200 or 50x50 before running evaluate.py, and still got the same error message. So maybe those are not the correct sizes either. I have been stuck here for many days :( Sorry for bothering you, and please help me with my question. Thank you very much~
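The size settings mentioned above can be pulled out of the training log programmatically. This is a minimal sketch that assumes the log file is valid JSON, as the quoted `"image_size": [ 200, 200 ]` fragment suggests. Because the exact nesting of the log is not shown in this thread, it searches dictionaries and lists recursively. The helper name is hypothetical:

```python
import json


def find_settings(log_path, keys=("image_size", "target_size")):
    """Collect a value for each requested key found anywhere in a JSON log."""
    with open(log_path) as f:
        data = json.load(f)
    found = {}
    stack = [data]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            for key, value in node.items():
                if key in keys and key not in found:
                    found[key] = value
                stack.append(value)
        elif isinstance(node, list):
            stack.extend(node)
    return found
```

The values it returns (for example `target_size`) can then be compared with the `--target-shape` passed to evaluate.py.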

Hmm, okay...

Could you provide the exact commands you used to start the training and evaluation scripts? Maybe I can use them to help you with your problem =)

My command for training is:

python train_svhn.py --char-map ../datasets/svhn_char_map.json --blank-label 10 -b 50 -g 0 --timesteps 4 specify.json ../datasets

And my command for evaluation is:

python ../../chainer/evaluate.py --timesteps 1 --gpu 0 svhn ../2018-08-31T13\:42\:20.124446_training/ model_20000.npz test.csv ../svhn_char_map.json 3

Hello @Bartzi @saq1410, sorry to bother both of you. I ran into a problem I can't understand when trying to evaluate the SVHN model. Here are the details:

The model I obtained is model_20000.npz, trained with this command:

python3 train_svhn.py curriculum.json ../logs --char-map ../datasets/svhn/svhn_char_map.json -b 4 -lr 0.0001 --blank-label 10 -g 0 1

The content of curriculum.json is:

[
    {
        "train": "../svhn_dataset_and_models/generated/easy/train.csv",
        "validation": "../svhn_dataset_and_models/generated/easy/valid.csv"
    }
]

The dataset I used is from #6.
Training goes well, but when I try to evaluate the SVHN model as described in the Evaluation section of the README, I get the following error:

  0%|| 0/12920 [00:00<?, ?it/s]data value is ['5', '0', '0']
Traceback (most recent call last):
  File "evaluate.py", line 38, in <module>
    evaluator.evaluate()
  File "/app/chainer/evaluation/evaluator.py", line 109, in evaluate
    labels = self.prepare_label(line[1:])
  File "/app/chainer/evaluation/evaluator.py", line 151, in prepare_label
    return data.reshape((-1, self.args.num_labels))
AttributeError: 'list' object has no attribute 'reshape'

The command I ran is:

python3 evaluate.py svhn ../logs/2018-09-11T04\:50\:36.443108_training/ model_20000.npz ../svhn_dataset_and_models/evaluation/test.csv ../svhn_dataset_and_models/svhn_char_map.json --target-shape 64,64 2 --gpu 0

Also, the test dataset I used is the evaluation part of svhn_dataset_and_models.zip, which I also downloaded from here. Why do I get this strange 'reshape' error? I'd appreciate your reply.

@wyh410 you are using --timesteps 1 while evaluating, but you trained with --timesteps 4; this could be the problem, I guess.
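A mismatch like this is easy to catch before launching evaluation by comparing the two command lines. This is a minimal sketch using Python's argparse. It only looks at `--timesteps` and ignores every other argument. The default of 3 is the value a user reports later in this thread, and the helper names are hypothetical:

```python
import argparse
import shlex


def timesteps_of(command, default=3):
    """Extract the --timesteps value from a train/evaluate command line."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--timesteps", type=int, default=default)
    # Unknown flags and positionals (paths, -b, --gpu, ...) land in the ignored list.
    known, _ignored = parser.parse_known_args(shlex.split(command))
    return known.timesteps


def check_timesteps(train_command, eval_command, default=3):
    """Raise ValueError if training and evaluation use different --timesteps."""
    trained = timesteps_of(train_command, default)
    evaluated = timesteps_of(eval_command, default)
    if trained != evaluated:
        raise ValueError(
            "--timesteps mismatch: trained with {}, evaluating with {}".format(trained, evaluated))
    return trained
```

With the two commands quoted above, `check_timesteps` raises because training used 4 and evaluation used 1.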

@Jacoobr hmm, looks like you changed something in the evaluation code? In your code, the wrong evaluator is called. You seem to be using the standard evaluator instead of the SVHNEvaluator, although the SVHNEvaluator should be used automatically.

Hi @Bartzi, thanks for your reply. I changed nothing in the evaluation code. I tried to figure out which evaluator is called when I run the evaluation command, and printing the value of evaluator at line 73 of evaluate.py gives: <evaluation.evaluator.SVHNEvaluator object at 0x7fca3c197160>. So I guess I'm using the SVHNEvaluator. When the SVHNEvaluator object calls its evaluate function (line 38 of evaluate.py), the program goes to line 104 of evaluator.py, and the error occurs. Why, do you think, is the SVHNEvaluator not actually used when I run this evaluation command?

python3 evaluate.py svhn ../logs/2018-09-11T04\:50\:36.443108_training/ model_20000.npz ../svhn_dataset_and_models/evaluation/test.csv ../svhn_dataset_and_models/svhn_char_map.json --target-shape 64,64 2 --gpu 0

How can I solve the 'reshape' problem?

That's very interesting. You seem to have code that is different from the code on GitHub, because the line numbers do not match when I check the code. If you look at the error, it clearly tells you that a list is not a numpy array. But this list should be a numpy array by the time this line of code is reached (see the code for that).

That's all I can tell you right now.
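The error arises because `prepare_label` receives a plain list of strings (`['5', '0', '0']` in the log above), and a list has no `reshape`. The conversion the method expects can be illustrated as a standalone NumPy sketch. It is not the repository's code, and the function only mirrors the name `prepare_label` for readability. It also makes explicit that the number of labels must be a multiple of num_labels, since `reshape((-1, num_labels))` fails otherwise:

```python
import numpy as np


def prepare_label(data, num_labels):
    """Turn a list of label strings into an int32 array of shape (-1, num_labels)."""
    labels = np.asarray([int(value) for value in data], dtype=np.int32)
    if labels.size % num_labels != 0:
        raise ValueError(
            "got {} labels, which is not a multiple of num_labels={}".format(labels.size, num_labels))
    return labels.reshape((-1, num_labels))
```

For example, `prepare_label(['5', '0', '0'], 3)` gives a `(1, 3)` array. The same three values with num_labels 2 raise a ValueError instead of producing a silently wrong shape.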

Hello @Bartzi, thanks for your reply. I went through the code and found that the data variable (line 150) is still unchanged when that line is executed (code), so I changed line 149 to:

data = self.xp.array(data, dtype=self.xp.int32)

After that, the 'reshape' error was gone. Unfortunately, I now get another error. Here are the details:

Traceback (most recent call last):
  File "evaluate.py", line 39, in <module>
    evaluator.evaluate()
  File "/app/chainer/evaluation/evaluator.py", line 112, in evaluate
    predictions, crops, grids = self.net(image[self.xp.newaxis, ...])
  File "/app/logs/2018-09-11T04:50:36.443108_training/svhn.py", line 209, in __call__
    h = self.localization_net(images)
  File "/app/logs/2018-09-11T04:50:36.443108_training/svhn.py", line 60, in __call__
    lstm_prediction = F.relu(self.lstm(in_feature))
  File "/usr/local/lib/python3.5/dist-packages/chainer/links/connection/lstm.py", line 309, in __call__
    lstm_in = self.upward(x)
  File "/usr/local/lib/python3.5/dist-packages/chainer/links/connection/linear.py", line 129, in __call__
    return linear.linear(x, self.W, self.b)
  File "/usr/local/lib/python3.5/dist-packages/chainer/functions/connection/linear.py", line 118, in linear
    y, = LinearFunction().apply(args)
  File "/usr/local/lib/python3.5/dist-packages/chainer/function_node.py", line 230, in apply
    self._check_data_type_forward(in_data)
  File "/usr/local/lib/python3.5/dist-packages/chainer/function_node.py", line 298, in _check_data_type_forward
    self.check_type_forward(in_type)
  File "/usr/local/lib/python3.5/dist-packages/chainer/functions/connection/linear.py", line 20, in check_type_forward
    x_type.shape[1] == w_type.shape[1],
  File "/usr/local/lib/python3.5/dist-packages/chainer/utils/type_check.py", line 524, in expect
    expr.expect()
  File "/usr/local/lib/python3.5/dist-packages/chainer/utils/type_check.py", line 482, in expect
    '{0} {1} {2}'.format(left, self.inv, right))
chainer.utils.type_check.InvalidType:
Invalid operation is performed in: LinearFunction (Forward)

Expect: in_types[0].shape[1] == in_types[1].shape[1]
Actual: 192 != 5808

Can you help me with this problem? Also, the num_labels parameter when training the model was 4 (I set it in the first line of the gt file), and --timesteps was left at its default value, 3. The evaluation command is:

python3 evaluate.py svhn ../logs/2018-09-11T04\:50\:36.443108_training/ model_20000.npz ../svhn_dataset_and_models/evaluation/test.csv ../svhn_dataset_and_models/svhn_char_map.json --target-shape 64,64 4 --gpu 0 --timesteps 3

I'm sorry to bother you again.

Hmm, --timesteps should be 4 if you trained on the easy dataset (because these images always contain four different views, so we need to localize four text regions).

Furthermore, did you change this line? You have to set --target-shape to the value on this line (I know it's not the best way of doing it; it should rather be a command-line argument...).

Both changes affect the size of the arrays you are using and should therefore help you get the shape the code is expecting.

Hi @Bartzi, I didn't change this line of code, and I changed nothing else in this script. Now I set --timesteps 4 and trained on the easy dataset (I set 4 4 in the first line of the gt file). But I still get the same error when I evaluate the SVHN model, which seems very strange to me. I'm so sorry to bother you, and I hope you can help. Here are the training and evaluation commands:

Training command:
python3 train_svhn.py curriculum.json ../logs --char-map ../datasets/svhn/svhn_char_map.json -b 8 -lr 0.0001 --blank-label 10 -e 4 --timesteps 4 -g 0 1

Evaluation command:
python3 evaluate.py svhn ../logs/2018-09-11T04:50:36.443108_training/ model_20000.npz ../svhn_dataset_and_models/evaluation/test.csv ../svhn_dataset_and_models/svhn_char_map.json --target-shape 50,50 4 --gpu 0 --timesteps 4

Hmm, there is one more thing I can think of. Looking at the code, I see that the images used for training the network are not automatically resized to 200 x 200. This might be a problem: check which image size is used during training, and make sure the same size is used during evaluation as well... As I said, the problem is that the sizes of the arrays do not match, but I'm not 100% sure what the root cause of your problem is.
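To check which image sizes actually occur in a training or evaluation CSV, one can tally the sizes of the listed images. This is a sketch under several assumptions, not the project's own tooling:

- The image path is in the first column and the first line is the header with the numeric settings, consistent with `prepare_label(line[1:])` in the traceback above.
- Columns are tab-separated, and relative paths are resolved against the CSV's folder.
- The images are PNG files. Their width and height are read from the PNG header with the standard library; Pillow's `Image.open(path).size` would work equally well if it is installed.

The helper names are hypothetical:

```python
import csv
import os
import struct
from collections import Counter

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) from a PNG file's IHDR chunk."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("{} does not look like a PNG file".format(path))
    # IHDR stores width and height as big-endian unsigned 32-bit integers.
    return struct.unpack(">II", header[16:24])


def image_sizes(csv_path, delimiter="\t", skip_header=True):
    """Count how often each (width, height) occurs among the images in a CSV."""
    base = os.path.dirname(os.path.abspath(csv_path))
    sizes = Counter()
    with open(csv_path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        if skip_header:
            next(reader, None)
        for row in reader:
            if row:
                sizes[png_size(os.path.join(base, row[0]))] += 1
    return sizes
```

Running `image_sizes` on both the training CSV and the evaluation CSV shows at a glance whether the two sets of images share the same dimensions.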

Bartzi / see

Problem in evaluation after training the model #24