Open rezha130 opened 6 years ago
@rezha130 I think your problem is this line: you should replace 52 with 72. Your char_map is different from the one I've been using. This problem could be fixed in the same way as was done in PR #41.
@Bartzi thanks for the quick reply, now I can test my model's result.
But another problem appears:
python text_recognition_demo.py mytrain model_42000.npz mytrain/image/00001.jpg mytrain/ctc_char_map.json --gpu 0
gives this result:
OrderedDict([('Numbers',
[OrderedDict([('top_left', (10.153651237487793, 0.0)),
('bottom_right', (188.42269897460938, 64.0))]),
OrderedDict([('top_left', (9.257012367248535, 0.0)),
('bottom_right', (188.95077514648438, 64.0))]),
OrderedDict([('top_left', (9.751701354980469, 0.0)),
('bottom_right', (189.06959533691406, 64.0))]),
OrderedDict([('top_left', (16.02237892150879, 0.0)),
('bottom_right', (188.70294189453125, 64.0))]),
OrderedDict([('top_left', (23.43842315673828, 0.0)),
('bottom_right', (188.17893981933594, 64.0))]),
OrderedDict([('top_left', (30.188858032226562, 0.0)),
('bottom_right', (187.6661376953125, 64.0))]),
OrderedDict([('top_left', (35.84349822998047, 0.0)),
('bottom_right', (187.2195281982422, 64.0))]),
OrderedDict([('top_left', (40.32756805419922, 0.0)),
('bottom_right', (186.85638427734375, 64.0))]),
OrderedDict([('top_left', (43.758575439453125, 0.0)),
('bottom_right', (186.5736083984375, 64.0))]),
OrderedDict([('top_left', (46.3254280090332, 0.0)),
('bottom_right', (186.35931396484375, 64.0))]),
OrderedDict([('top_left', (48.2197265625, 0.0)),
('bottom_right', (186.19967651367188, 64.0))]),
OrderedDict([('top_left', (49.60652542114258, 0.0)),
('bottom_right', (186.08200073242188, 64.0))]),
OrderedDict([('top_left', (50.614906311035156, 0.0)),
('bottom_right', (185.99632263183594, 64.0))]),
OrderedDict([('top_left', (51.347171783447266, 0.0)),
('bottom_right', (185.93399047851562, 64.0))]),
OrderedDict([('top_left', (51.879066467285156, 0.0)),
('bottom_right', (185.8885955810547, 64.0))])])])
I expected to get more words, not just the first word.
Did you check those 2 lines? And adjust them to your case?
Your groundtruth is not necessary for using the demo script, but it looks okay to me.
Your problem is that you are using a script that is designed for printing only one word.
I'm not 100% sure, but I think that this line could be the solution. Remove the [0].
@Bartzi
I removed [0] and got this error:
Traceback (most recent call last):
File "text_recognition_demo.py", line 181, in <module>
word = "".join(map(lambda x: chr(char_map[str(x)]), word))
File "text_recognition_ktp.py", line 181, in <lambda>
word = "".join(map(lambda x: chr(char_map[str(x)]), word))
KeyError: '[33 28 30]'
@Bartzi Can we create a word-based ground truth file? As @rezha130 has mentioned, until now I have been following the CSV data structure. Is that not the only way of providing ground truth to the network?
@rezha130 I am really confused. Can you tell me the steps you followed to build your custom dataset? I would be very grateful.
If you use train_text_recognition you can use a word-based ground truth file... oops, yeah, that is a little different from the other scripts... hmm, I'm sorry for that...
Some differences from my previous training, with time_step = 15 and max_char = 16, are in these lines of my create_train.py script:
max_bound_box = "15"
max_chars = "16"
for row in result:
    image_name = row[0]
    label = row[1]
    file.write(os.path.join(train_dir, image_name) + "\t" + label.replace(" ", "\t") + "\n")
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz:()[];&+-/'.,0123456789"
without white space
I remove [0] and get this error:
Traceback (most recent call last):
File "text_recognition_demo.py", line 181, in <module>
word = "".join(map(lambda x: chr(char_map[str(x)]), word))
File "text_recognition_ktp.py", line 181, in <lambda>
word = "".join(map(lambda x: chr(char_map[str(x)]), word))
KeyError: '[33 28 30]'
You did expect this, didn't you? Once you remove [0] you will of course get an iterable where there was none before... so you'll need to add a loop to the code.
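A minimal sketch of such a loop, not the demo script's actual code. It assumes the prediction is an iterable with one sequence of class indices per word, and that char_map maps str(class index) to a Unicode code point, as the lambda in the traceback suggests. The name decode_words is made up for illustration.

```python
def decode_words(predicted_words, char_map):
    """Decode every predicted word, not only the first one.

    predicted_words: iterable of sequences of class indices (assumed layout).
    char_map: dict mapping str(class index) -> Unicode code point.
    """
    decoded = []
    for word in predicted_words:  # one entry per predicted word
        # int(x) also accepts NumPy integer scalars, so str() yields "33", not "[33 28 30]"
        decoded.append("".join(chr(char_map[str(int(x))]) for x in word))
    return decoded


# Small char_map excerpt in the same style as the one posted in this thread
char_map = {"0": 9250, "1": 48, "2": 49, "3": 50}
words = decode_words([[1, 2], [3, 0]], char_map)
print(words)
```

The KeyError '[33 28 30]' appears because a whole array of indices is passed to str() at once. Looping over the words first means each lookup sees a single index.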
Yes @Bartzi, you're right. But I'm pretty sure that the model only predicts the first word and neglects all following words (word-based gt, tab-delimited). This can be seen in the rendered bbox images in the log/boxes folder: the result is just the first word.
Since you said that the train_text_recognition script is designed to recognize only one word, I tried to adjust my custom ground truth files and training approach to that constraint. Now the rendered bbox images show that the model has learnt to recognize all characters defined in char_map in the image (FYI, the first word is like a title for specific data values and is repeated in every training image, so the model can predict it easily), but it looks like the training process needs more epochs to improve. I can wait for that, since the loss tends to decrease slowly.
First thing I see is that the predicted bboxes don't look good at all. They should change positions after a while see the text recognition video from this file.
Furthermore, did you have a close look at the implementation of the dataset loader (here)?
Delimiting with tab does not make sense. Sorry if I misunderstood one of your posts regarding the layout of your groundtruth file.
If you struggle with the groundtruth format, you can also create your own dataset loader!
The only thing you need to make sure is that it returns the right data and is a subclass of DatasetMixin.
The expected return value is a tuple with the loaded image and the label converted from characters to classes, using the char_map. Remember to pad each word according to your maximum of characters per word.
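As a rough sketch of the label half of that return value (the image loading is left out), the conversion and padding could look like the following. The function name encode_words and the blank_label default are assumptions for illustration, not part of the repository. char_map is assumed to map str(class index) to a Unicode code point, as in the JSON files posted in this thread.

```python
def encode_words(words, char_map, max_chars, blank_label=0):
    """Convert words to class indices and pad each word to max_chars."""
    # Invert the map: character -> class index
    char_to_class = {chr(code_point): int(cls) for cls, code_point in char_map.items()}
    labels = []
    for word in words:
        if len(word) > max_chars:
            raise ValueError("word %r is longer than max_chars=%d" % (word, max_chars))
        classes = [char_to_class[char] for char in word]
        # Pad with the blank label up to the fixed length
        classes += [blank_label] * (max_chars - len(classes))
        labels.append(classes)
    return labels


char_map = {"0": 9250, "1": 48, "2": 49, "3": 50}
labels = encode_words(["12", "0"], char_map, max_chars=4)
print(labels)
```

A get_example method of a DatasetMixin subclass would then return the loaded image together with these labels as an array.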
OK @Bartzi. After checking TextRecFileDataset, I think my ground truth file is still not correct for a multi-word detector.
Can you please send an example of the ground truth files that you used for the videos, especially the gt_word.csv files (with an example of how to write num_timesteps, num_labels, file_name and labels) that you used in Text Recognition.mp4 (one word) and FSNS.mp4 (max two words and max three words) with TextRecFileDataset?
My case is basically the same as FSNS (detect 2 or 3 text regions, then recognize the characters in every detected bounding box).
Thank you
Okay,
TextRecFileDataset is not used for training FSNS data.
23 1
/data/text_recognition/samples/9999/9999026_]kinkiness_-5_DonegalOne-Regular.jpeg ]kinkiness
3 21
/mnt/ssd/christian/data/fsns/images/train/00000/0.png 67 12 11 1 5 26 20 21 23 0 0 0 0 0 0 0 0 0 0 0 0 23 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 73 11 7 5 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
You should have a look at the FSNS examples; train_text_recognition does some things differently!
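To check a groundtruth file against the FSNS-style layout in the example above, a small parser like the following might help. It is a sketch based only on what the example shows: a header line with num_timesteps and num_labels, then one line per image with the path followed by num_timesteps * num_labels class indices. Splitting on whitespace is an assumption (it breaks for paths containing spaces), and the name parse_fsns_groundtruth is made up.

```python
def parse_fsns_groundtruth(lines):
    """Parse FSNS-style groundtruth lines into (file_name, labels per timestep)."""
    header = lines[0].split()
    num_timesteps, num_labels = int(header[0]), int(header[1])
    samples = []
    for line in lines[1:]:
        fields = line.split()
        if not fields:  # skip empty lines
            continue
        file_name = fields[0]
        flat = [int(value) for value in fields[1:]]
        if len(flat) != num_timesteps * num_labels:
            raise ValueError("%s: expected %d labels, got %d"
                             % (file_name, num_timesteps * num_labels, len(flat)))
        # Cut the flat list into one row of num_labels entries per timestep
        rows = [flat[t * num_labels:(t + 1) * num_labels] for t in range(num_timesteps)]
        samples.append((file_name, rows))
    return num_timesteps, num_labels, samples


example = ["2 3", "images/0.png 5 1 0 7 0 0"]
num_timesteps, num_labels, samples = parse_fsns_groundtruth(example)
print(samples)
```

A line whose label count does not match the header raises a ValueError, which catches padding mistakes before training starts.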
Hi @Bartzi
Now I train with train_fsns.py, but I got this error:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
What does it mean?
Hmm, hard to say without the stack trace.
But it basically says that some of the arrays being concatenated do not have the correct shape. Could be because of your input data. Did you make sure to input images that have these dimensions: 600x150?
Hi @Bartzi
The input image size is not fixed in my training data set. I am using the same image data set as when training with train_text_recognition, which didn't produce this kind of error message and ran until the last epoch.
This is my ground truth file in FSNS style:
2 16
mytrain/images/0001.jpg 13 11 12 0 0 0 0 0 0 0 0 0 0 0 0 0 4 2 8 3 1 4 7 4 1 3 9 10 1 1 1 7
and char_map.json
{
"0": 9250,
"1": 48,
"2": 49,
"3": 50,
"4": 51,
"5": 52,
"6": 53,
"7": 54,
"8": 55,
"9": 56,
"10": 57,
"11": 73,
"12": 75,
"13": 78
}
I train with this command:
python train_fsns.py curriculum.json log \
--blank-label 0 \
--batch-size 32 \
--is-trainer-snapshot \
--use-dropout \
--char-map char_map.json \
--gpu 0 \
--snapshot-interval 1000 \
--dropout-ratio 0.2 \
--epoch 100
Please check the full stack trace of the error below:
/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:150: UserWarning: optimizer.eps is changed to 1e-08 by MultiprocessParallelUpdater for new batch size.
format(optimizer.eps))
Exception in main training loop: all the input array dimensions except for the concatenation axis must match exactly
Traceback (most recent call last):
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 306, in run
update()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 149, in update
self.update_core()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 229, in update_core
batch = self.converter(batch, self._devices[0])
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/dataset/convert.py", line 133, in concat_examples
[example[i] for example in batch], padding[i])))
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/dataset/convert.py", line 163, in _concat_arrays
return xp.concatenate([array[None] for array in arrays])
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "train_fsns.py", line 292, in <module>
trainer.run()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 320, in run
six.reraise(*sys.exc_info())
File "/home/rezha/miniconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 306, in run
update()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 149, in update
self.update_core()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 229, in update_core
batch = self.converter(batch, self._devices[0])
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/dataset/convert.py", line 133, in concat_examples
[example[i] for example in batch], padding[i])))
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/dataset/convert.py", line 163, in _concat_arrays
return xp.concatenate([array[None] for array in arrays])
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Input images size is not fixed in train data set.
That does not work, because the network is not fully convolutional and because it is not possible to create a batch out of images with different sizes. It worked with train_text_recognition.py because there the input images are resized prior to being fed to the network.
The FSNS network expects the images to be of shape 600x150. If that is not the shape your data has, you have to adjust the data loading code (and also the network, as your data is likely to be very different from the original FSNS dataset)!
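A quick way to find the offending images before training is to compare every example's shape against the expected one. This is only a sketch: the helper name find_shape_mismatches is made up, and reading 600x150 as width x height (so arrays of height 150 and width 600) is an assumption.

```python
def find_shape_mismatches(shapes, expected):
    """Return the indices of all examples whose shape differs from expected."""
    return [index for index, shape in enumerate(shapes)
            if tuple(shape) != tuple(expected)]


# Shapes given as (height, width, channels); the last one cannot be batched with the others
shapes = [(150, 600, 3), (150, 600, 3), (64, 200, 3)]
mismatches = find_shape_mismatches(shapes, expected=(150, 600, 3))
print(mismatches)
```

Any index reported here would trigger the concatenation error once that image ends up in a batch.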
OK @Bartzi, you're right. I resized all my training images to 600x150 pixels.
But now I get IndexError: list index out of range in calc_loss:
/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:150: UserWarning: optimizer.eps is changed to 1e-08 by MultiprocessParallelUpdater for new batch size.
format(optimizer.eps))
Exception in main training loop: list index out of range
Traceback (most recent call last):
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 306, in run
update()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 149, in update
self.update_core()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 231, in update_core
loss = _calc_loss(self._master, batch)
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 262, in _calc_loss
return model(*in_arrays)
File "/home/rezha/dataloka/see/chainer/utils/multi_accuracy_classifier.py", line 45, in __call__
self.loss = self.lossfun(self.y, t)
File "/home/rezha/dataloka/see/chainer/metrics/loss_metrics.py", line 211, in calc_loss
overall_loss_weight = loss_weights[i - 1]
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "train_fsns.py", line 292, in <module>
trainer.run()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 320, in run
six.reraise(*sys.exc_info())
File "/home/rezha/miniconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 306, in run
update()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 149, in update
self.update_core()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 231, in update_core
loss = _calc_loss(self._master, batch)
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 262, in _calc_loss
return model(*in_arrays)
File "/home/rezha/dataloka/see/chainer/utils/multi_accuracy_classifier.py", line 45, in __call__
self.loss = self.lossfun(self.y, t)
File "/home/rezha/dataloka/see/chainer/metrics/loss_metrics.py", line 211, in calc_loss
overall_loss_weight = loss_weights[i - 1]
IndexError: list index out of range
What happened?
@rezha130 I think this is because you have more than 3 timesteps in your training set?
Hi @mit456, thanks for helping.
Yes, the previous IndexError: list index out of range in calc_loss happened when I was using this ground truth file in FSNS style:
6 22
mytrain/images/01179.jpg 18 31 43 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 25 22 13 29 5 18 24 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Is there any maximum limit for the number of timesteps and num_labels in an FSNS-like experiment?
@Bartzi & @mit456
Can I use my custom my_char_map.json for FSNS-like training on my custom training data set? Or must I use the fsns_char_map.json that is already provided?
@rezha130 Before you resized your images to 600x150, did you check that they have the same semantics as the images of the FSNS dataset? This is important!!
Forget about the loss_weights in loss_metrics.py; they are not useful for your training. I just used them to make it possible to put some emphasis on certain timesteps of the optimization.
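To illustrate why a fixed-length weight list breaks with more timesteps, and what dropping it amounts to, here is a simplified sketch. It is not the code from loss_metrics.py; the function combine_losses and its arguments are hypothetical, and plain floats stand in for the per-timestep losses.

```python
def combine_losses(timestep_losses, loss_weights=None):
    """Sum per-timestep losses, optionally weighting each timestep."""
    if loss_weights is None:
        # No emphasis on any timestep: every loss counts equally
        loss_weights = [1.0] * len(timestep_losses)
    if len(loss_weights) < len(timestep_losses):
        # A list written for 3 timesteps cannot cover 6 of them
        raise ValueError("got %d weights for %d timesteps"
                         % (len(loss_weights), len(timestep_losses)))
    return sum(weight * loss for weight, loss in zip(loss_weights, timestep_losses))


total = combine_losses([0.5, 0.25, 0.25, 1.0])
print(total)
```

Without weights, the number of timesteps no longer has to match any hard-coded list.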
Technically there is no limit for num_timesteps and num_labels. You can of course use your custom char_map, but you will need to adapt this line and change the number of classes you want to distinguish.
After I added label_size as a parameter in self.classifier = L.Linear(None, label_size), the model can be trained.
num_timesteps = 2
num_labels = 16
main/accuracy = 0.5
until the last epoch (100 epochs)
@Bartzi ...something is strange in the bounding box result. What is happening?
Hi @Bartzi
As mentioned previously, I added label_size as a parameter in self.classifier = L.Linear(None, label_size), so the model can be trained. But only if num_timesteps is 2 or 3!
I'm using this script to get label_size:
with open(args.char_map, 'r') as fp:
    char_map = json.load(fp)
label_size = len(char_map)
But if I try to train with another custom training data set which has num_timesteps greater than 3, I still get this error:
/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:150: UserWarning: optimizer.eps is changed to 1e-08 by MultiprocessParallelUpdater for new batch size.
format(optimizer.eps))
Exception in main training loop: list index out of range
Traceback (most recent call last):
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 306, in run
update()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 149, in update
self.update_core()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 231, in update_core
loss = _calc_loss(self._master, batch)
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 262, in _calc_loss
return model(*in_arrays)
File "/home/rezha/dataloka/see/chainer/utils/multi_accuracy_classifier.py", line 45, in __call__
self.loss = self.lossfun(self.y, t)
File "/home/rezha/dataloka/see/chainer/metrics/loss_metrics.py", line 211, in calc_loss
overall_loss_weight = loss_weights[i - 1]
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "train_fsns.py", line 292, in <module>
trainer.run()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 320, in run
six.reraise(*sys.exc_info())
File "/home/rezha/miniconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 306, in run
update()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 149, in update
self.update_core()
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 231, in update_core
loss = _calc_loss(self._master, batch)
File "/home/rezha/miniconda3/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 262, in _calc_loss
return model(*in_arrays)
File "/home/rezha/dataloka/see/chainer/utils/multi_accuracy_classifier.py", line 45, in __call__
self.loss = self.lossfun(self.y, t)
File "/home/rezha/dataloka/see/chainer/metrics/loss_metrics.py", line 211, in calc_loss
overall_loss_weight = loss_weights[i - 1]
IndexError: list index out of range
@mit456 have you experienced the same error with num_timesteps greater than 3?
@Bartzi please help..
I told you to delete the loss_weights; this will fix your problem, but it still won't help you that much.
I ask again: did you have a look at the FSNS dataset? Did you see that there are always 4 views of the same street name sign that are shown at the same time? Your data does not have this property, so you cannot use this script without modifications! The fact that your predictions already look quite good is, from my point of view, a hint that the network memorizes your data and that your data has few variations, making it easy for the recognition network to memorize (i.e. overfit).
OK @Bartzi, would you please give me the list of .py files that I need to modify? Then at least I can focus on debugging some of your scripts, not all of your .py files.
First, you will need to change the network definition and the way predictions are made (1, 2). You will also need to change the way metrics/loss are calculated (1, 2). Furthermore, you will need to think about whether you want to use curriculum learning or not, and whether you want to plot the current state of the network for each iteration (if you want to do this, you might need to make changes in the bbox plotter too, or check whether there is one that is already able to work with your way of making predictions and your way of training).
OK @Bartzi
So there are 4 .py files that I will try to modify: 2 files for the network definition + 2 files for the loss/metric calculation. I also need to modify the train script for that.
Now for the bbox plotter: which script in your code should I use if I just want a view of a SINGLE image with a maximum of 2, 3, or more than 4 words/timesteps? So that I get a plot like this (screenshot from your video), NOT 4 views of the same street name sign shown at the same time (as in FSNS images):
Sounds good so far :sweat_smile:. You could have a look at all the bbox plotters here; you will see that all special classes inherit from the class BBOXPlotter. A good example could be the SVHN BBOXPlotter.
Hi @Bartzi
What if I try to set this for calc_loss:
loss_weights = [1, 1.25, 2, 1.25, 1, 1.25, 2, 1.25, 1, 1.25, 2, 1.25, 1, 1.25, 2, 1.25]
#16 initial loss weights for max 14 timesteps
specifically for the longest sample in my custom training dataset, with max num_timesteps = 14. Am I correct? How do you adjust the loss_weights values?
And one more thing, would you please explain the difference in purpose between image_size and target_shape in your train script? Why do the recognition network and the BBox plotter use target_shape, while the loss metric calculation uses image_size?
Is it OK if I set them to the same value (also using image_size for resizing the image)?
image_size = Size(width=200, height=40)
target_shape = Size(width=200, height=40)
Btw, for bbox plotter..i'm just using the basic one:
bbox_plotter_class = BBOXPlotter
I suggest that you just delete loss_weights from the code; they might come in handy if you need to get more accuracy out of the model.
The difference between image_size and target_shape is the following:
image_size: this is the size of the input image for the localization network
target_shape: this is the size of the input image for the recognition network
I hope this also answers your question why one value is used in one place and another value somewhere else. So it is not advisable to set them to the same value.
@Bartzi now I can train with a variety of my custom data sets after some modifications to the train and inference scripts, with no errors. I started from the FSNS examples, but set args.is_original_fsns = False and deleted loss_weights.
However, the recognition result is still not as good as expected. For example, in this bbox image evaluation result from the last epoch, the recognition result looks good:
The bboxes do not look so good with the standard bbox_plotter_class = BBOXPlotter, but the log looks impressive at the last epoch:
{
"main/loss": 0.28168749809265137,
"main/accuracy": 0.9703125,
"validation/main/loss": 0.25905805826187134,
"validation/main/accuracy": 0.9751243781094526,
"lr": 9.999999999987483e-05,
"epoch": 400,
"iteration": 26700,
"elapsed_time": 50062.12740638801
}
But when I run inference on the same image as above, I get the recognition result NIK 3175610990006, which has only 13 numeric characters and differs from the bbox text (16 numeric characters in total). I tried inference on different images and always get 13 numeric characters. I set num_labels = 16 for training. Please help me with this.
Also, how should I set target_shape for the recognition network input? For example, I set these input sizes for the image above. Is this correct? If not, what is the correct size for target_shape?
image_size = Size(width=200, height=40)
target_shape = Size(width=120, height=30)
I set timestep = 2 because I want 2 bboxes: one for the sentence on the left and one for the numeric sequence on the right.
Please correct me if I'm wrong.
I think it's working very well for you because your dataset is too easy. You said that each transcription starts with the same characters, NIK. It is very easy for the network to memorize this, hence it does not need to locate these characters in order to predict them correctly.
The same could be true for your numbers. I think if you increased the number of training images and added more variety, the network would not be able to memorize the numbers just based on some easy features that it has extracted. You could also try to decrease the capacity of your network (i.e. use a network with fewer parameters).
For your inference problem: Did you check whether the network predicts the correct number of labels, while the blank tokens are not stripped out, yet?
Your target_shape looks good. The number of timesteps also seems reasonable.
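For the inference check, it can help to look at the raw prediction of one timestep next to the version with blanks removed. The sketch below assumes the prediction is a flat list of class indices and that 0 is the blank label (as in --blank-label 0). The helper strip_blanks is a made-up name, not a function from the repository.

```python
def strip_blanks(labels, blank_label=0):
    """Remove all blank labels from one timestep's prediction."""
    return [label for label in labels if label != blank_label]


# One timestep with num_labels = 16: three characters followed by padding
raw = [13, 11, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
stripped = strip_blanks(raw)
print(len(raw), stripped)
```

If the raw prediction already contains fewer entries than num_labels, the problem lies before the stripping step. If it has the right length, the stripping or the char_map lookup is the place to look.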
Thanks @Bartzi. I checked my inference script again and found that the mistake was mine. Now I get NIK plus 16 numeric characters in the inference result.
But I'm still curious: how can I draw the bbox image correctly for evaluation purposes?
As I said, the problem is that the network is able to memorize all data based on easy-to-extract features. That's why your network does not learn to localize the characters: it is lazy and does not need to. Think of a human who can do a task very easily but does not do it the way you want; since he still succeeds, you'll likely have to make the task harder for him!
I think that this is the reason why the BBoxes are not on the characters.
@Bartzi
Maybe it is relatively easy for the network to memorize the word NIK, but I don't think that holds for the following sequence of 16 numeric characters. The training data set has 1300 images (is that enough?), where every image has a unique number sequence. Those images show ID numbers, and there are no duplicate ID number values among the records of the ground truth file.
FYI, I also tried another deep learning architecture, CRNN (Convolutional Recurrent Neural Network), also with CTC loss but without an STN grid. The CRNN works well for recognition when there are some identical ground truth values for different images. But the CRNN was very hard to get to converge (high loss, near-zero accuracy after a thousand epochs, even when trying many optimizers and learning rates) when the ground truth value is unique in every record of the training data set. The SEE network is better than CRNN for this case.
PS: CRNN is easier to use, because we don't need to adjust additional hyperparameters like the input size of the recognition network after localization, the maximum number of localization bboxes, or the maximum number of characters per word (although the CRNN's network capacity still cannot handle more than 26 characters).
If you take a closer look at the network architecture of SEE, you will see (no pun intended^^) that the network will only achieve good results if, and only if the localization network is able to provide enough information for the recognition network to succeed.
Now take a closer look at the image you provided some posts ago. We can see that the localization network did not localize NIK (the blue bbox), because everything starts with NIK, which is very easy for the network to learn. The second localization spans some of the numbers, and it seems that this information is enough for the network to correctly identify the rest of the number sequence. This shows us that the task is too easy for the network, mostly because your training set is not large enough (did you try to generate similar-looking images with unique sequences?) or because the network as such has too many parameters and hence is highly overfitting to your data.
Hi @Bartzi
I have already successfully trained on my custom data set (loss below 0.01) with this command until the last epoch:
then I copied all result files from log to the mytrain folder.
But when I try a specific npz model file with this command:
python text_recognition_demo.py mytrain model_42000.npz mytrain/image/00001.jpg mytrain/ctc_char_map.json --gpu 0
the command fails when loading the weights file.
FYI, before the training process I upgraded to chainer 4.2.0 to enable cuDNN with cupy-cuda90 4.2.0. Is that a problem?
Please help.