kbrajwani closed this issue 3 years ago.
Why do you need to detect spaces? I don't think it is meaningful for this task. For colons, you can refer to small_satrn.
I need spaces because sometimes the ROI I get contains a word followed by a single separate character, like "word w". When I run your model, it outputs "wordw". How can I make this correct?
Sorry, I don't understand what you mean. Please give me some example images.
For example, I get an image of "INSURER A:" and the model predicts "insurera".
By small_satrn, do you mean I have to train a new model with sensitive=True? Or do you have a case-sensitive model, so I can just change the parameter and it will work?
I think a space is an ill-defined target in this task. How do you define a space in an image? For example, if you resize an image to a larger size, how many spaces will it have?
OK, I don't have the whole picture, but you can refer to https://github.com/clovaai/deep-text-recognition-benchmark/issues/97 and https://github.com/zzzDavid/ICDAR-2019-SROIE/tree/master/task2, where they say: "We add the blank space between words to the alphabet for LSTM prediction and thus improve the network from single word recognition to multiple words recognition". So I think it can be done.
If possible, you can add it to the character set.
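Adding the space to the character set just makes ' ' one more class for the model to predict. A minimal sketch of what that means for label encoding and greedy decoding, assuming a CTC-style head where index 0 is the blank; `SimpleCTCConverter` is a hypothetical helper, not the converter used by vedastr or deep-text-recognition-benchmark:

```python
from __future__ import annotations


class SimpleCTCConverter:
    """Hypothetical CTC label converter: index 0 is reserved for the blank."""

    def __init__(self, character: str):
        # Every symbol in `character`, including ' ', gets its own index >= 1.
        self.character = ["[blank]"] + list(character)
        self.index = {ch: i for i, ch in enumerate(self.character) if i > 0}

    def encode(self, text: str) -> list[int]:
        # Raises KeyError for any symbol missing from the character set.
        return [self.index[ch] for ch in text]

    def decode_greedy(self, indices: list[int]) -> str:
        # Standard greedy CTC decoding: merge repeated indices, then drop blanks.
        out = []
        prev = None
        for i in indices:
            if i != prev and i != 0:
                out.append(self.character[i])
            prev = i
        return "".join(out)


converter = SimpleCTCConverter("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ: ")
print(converter.encode("INSURER A:"))  # [19, 24, 29, 31, 28, 15, 28, 38, 11, 37]
# Frame-level output: repeats merge and blanks vanish, but index 38 (' ') survives.
print(converter.decode_greedy([19, 19, 0, 24, 38, 38, 0, 11, 37]))  # IN A:
```

Because the space now has its own output class, the prediction layer changes size, which is why a model trained without spaces has to be trained or fine-tuned again.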
What about my small_satrn question: do I have to train a new model with sensitive=True, or is there a case-sensitive model where I can just change the parameter?
This model was trained without spaces. If you want it to recognize spaces, you have to train or fine-tune the model.
@ChaseMonsterAway I tried adding a space to the character set by changing it as follows: ('--character', default='0123456789,.:(%$!^&-/);<~|`>?+=_[]{}"\'@#*ABCDEFGHIJKLMNOPQRSTUVWXYZ\ '), but the code still does not take the space into account.
@Aishwarya-0606 Can you make sure that you added the space correctly? Could you please show me your config file?
@ChaseMonsterAway There is no config file as such. I put the required character list, including the space, in train.py; the space is at the end, escaped with a backslash (\ ). Please find train.py attached (train.txt):
parser.add_argument('--character', type=str, default='0123456789,.:(%$!^&-/);<~|`>?+=_[]{}"\'@#*ABCDEFGHIJKLMNOPQRSTUVWXYZ\ ', help='character label')
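A quick way to confirm what the parser actually hands to the rest of the code is to parse the defaults and test for the space directly. This sketch uses a shortened character set for brevity (an assumption, not the string above). In a regular Python string `\ ` is not a recognized escape sequence, so the default keeps a literal backslash in front of the space, and recent Python versions warn about it; a plain `' '` at the end of the string is enough to add the space.

```python
import argparse

parser = argparse.ArgumentParser()
# '\\ ' spells the same two characters (backslash + space) as '\ ' in train.py,
# without the invalid-escape warning. The character set is shortened here.
parser.add_argument('--character', type=str,
                    default='0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ\\ ',
                    help='character label')
opt = parser.parse_args([])  # [] = ignore sys.argv and use the defaults

print(' ' in opt.character)   # True: the space made it into the set
print('\\' in opt.character)  # True: so did the stray backslash
```

If both checks pass but training still ignores spaces, the value is probably being changed somewhere after parsing.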
@Aishwarya-0606 The code you are using is deep-text-recognition-benchmark, not vedastr. Which dataset are you using? I'm also not sure what 'not considering the space' means: is it the encoding step, or is the performance unchanged? Here is the encoding result from vedastr: the space is encoded successfully.
@ChaseMonsterAway thanks for the quick reply.
I am using the ICDAR 2019 SROIE dataset. By 'not considering the space' I mean the encoding step. I provided 51,000 images for training, but training runs on only 26,000 of them because the code filters out the images containing spaces.
I'm not sure what the reason is. You should check the filter step in the dataset code of their repo. It is unrelated to encoding: they filter the samples in the dataset phase. By the way, you can try our repo; I have tested that the space is encoded.
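To see why samples disappear before encoding, here is a minimal sketch of a dataset-phase filter of the kind described above, assuming it drops labels longer than `batch_max_length` and labels containing symbols outside the character set. `filter_samples` and its reason strings are hypothetical, not code from either repository. Returning a reason for each dropped sample makes it easy to test with a handful of labels.

```python
import re


def filter_samples(labels, character, batch_max_length):
    """Hypothetical filter: return (kept, dropped), where dropped is a list
    of (label, reason) pairs."""
    # re.escape stops ']', '\\', '^' and '-' in `character` from breaking the class.
    out_of_char = re.compile(f'[^{re.escape(character)}]')
    kept, dropped = [], []
    for label in labels:
        if len(label) > batch_max_length:  # spaces count toward the length
            dropped.append((label, 'too long'))
        elif out_of_char.search(label):
            dropped.append((label, 'out-of-charset symbol'))
        else:
            kept.append(label)
    return kept, dropped


labels = ['TOTAL', 'CASH CHANGE', 'TOTAL INCLUSIVE GST']
kept, dropped = filter_samples(labels, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ', 25)
print(kept)  # ['TOTAL']
for label, reason in dropped:
    print(f'{label!r}: {reason}')
```

Appending ' ' to the character string keeps all three labels, which is a quick way to check whether the space is really part of the set the filter sees.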
@Aishwarya-0606 Have you enlarged the batch_max_length?
@Aishwarya-0606 You should check the character string. I think there is a problem with it.
Yes, I have enlarged batch_max_length to 75, according to my dataset.
@ChaseMonsterAway I will try implementing this using your repo. Thanks :)
@ChaseMonsterAway All the characters (including brackets) are being trained. I get the parameter list below, but the space is not included in it.
character: 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[]^_`{|}~
sensitive: True
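The printed list looks like Python's `string.printable` with its trailing whitespace cut off. If the training script replaces `opt.character` with `string.printable[:-6]` whenever `sensitive` is True (my reading of deep-text-recognition-benchmark's train.py; please verify it in your copy), any custom `--character`, including the added space, is silently discarded. A quick check of what that slice contains:

```python
import string

# string.printable ends with six whitespace characters, the space among them.
print(repr(string.printable[-6:]))  # ' \t\n\r\x0b\x0c'

ascii_94 = string.printable[:-6]
print(len(ascii_94))    # 94
print(' ' in ascii_94)  # False: the space is cut off with the other whitespace

# Hypothetical fix: keep the 94 symbols and append the space explicitly.
character = ascii_94 + ' '
print(' ' in character)  # True
```

If that override is the cause, the space has to be added after it runs (or the override disabled) for it to reach the filter and the converter.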
Does the batch_max_length of 75 include spaces? If so, samples will not be filtered by length, right? I also checked your character string, and it has no effect in the filter. For me:
import re
character = '0123456789,.:(%$!^&-/);<~|`>?+=_[]{}"\'@#*ABCDEFGHIJKLMNOPQRSTUVWXYZ\\ '
re.search(f'[^{character}]', ' 我')  # returns None: the unescaped ']' in '[]' closes the character class early, so check the regex syntax
# change the character as follows
character = '0123456789,.:(%$!^&-/);<~`>?+=_[\\]{|}"\'@#*ABCDEFGHIJKLMNOPQRSTUVWXYZ\\ '
re.search(f'[^{character}]', ' 我')  # matches '我', the only character outside the set
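Rather than reordering the brackets by hand, the character string can be passed through `re.escape` before it is placed inside `[^...]`; that escapes `]`, `\`, `^`, `-` and the other regex metacharacters. A minimal sketch, assuming the filter builds its pattern as `f'[^{character}]'` as in the example above:

```python
import re

# The original string from train.py; '\\ ' is the same backslash + space as '\ '.
character = '0123456789,.:(%$!^&-/);<~|`>?+=_[]{}"\'@#*ABCDEFGHIJKLMNOPQRSTUVWXYZ\\ '

# Unescaped: the first ']' closes the class early, so this almost never matches.
print(re.search(f'[^{character}]', ' 我'))  # None

# Escaped: every symbol in `character` is treated literally inside the class.
safe = f'[^{re.escape(character)}]'
print(re.search(safe, ' 我').group())   # 我
print(re.search(safe, 'TOTAL: 12.50'))  # None: every symbol is in the set
```

This keeps the original character order intact, so the indices a converter assigns to each symbol do not change just to satisfy the regex.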
Yes, it will not filter by length. I also changed the character list as you suggested, but the model is still not being trained on spaces.
@Aishwarya-0606 I mean the character-based filter has no effect with your string, so it should not filter out any samples. You can debug the dataset with a few samples.
@ChaseMonsterAway I tried debugging the dataset. It filters out all the images that contain spaces.
@Aishwarya-0606 Where's the code segment?
@ChaseMonsterAway For debugging, I took 10 training images, 4 of which contain spaces. The code filtered out those 4 images and trained on only the other 6. Please find the log file attached (log_dataset.txt).
@Aishwarya-0606 I mean: which code filters these samples? Have you found it?
@ChaseMonsterAway No, I am still looking for this code segment.
@ChaseMonsterAway Now I am able to train my model on the entire dataset with spaces (i.e. all 51,000 images). But the problem now is the labels: the code removes the spaces from them (e.g. 'the label' is changed to 'thelabel'). I tried debugging the code, and the label is being changed in dataset.py:
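for i, data_loader_iter in enumerate(self.dataloader_iter_list):
    try:
        image, text = next(data_loader_iter)  # next() works on all PyTorch versions; .next() was removed in newer ones
        print(text)  # debug: print the labels of this batch
        balanced_batch_images.append(image)
        balanced_batch_texts += text
    # (except branches not included in the pasted excerpt)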
CASHCHANGE (should be: cash change)
TOTALINCLUSIVEGST (should be: total inclusive gst)
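The printed labels are what you would expect if a label-cleaning step deletes every symbol outside the effective character set, e.g. something like `re.sub(f'[^{character}]', '', label)`. I believe deep-text-recognition-benchmark's dataset.py does this when loading a sample, but treat that as an assumption and check your copy. If ' ' is missing from the set at that point, spaces are stripped even though the sample passed the filter. `clean_label` below is a hypothetical helper:

```python
import re


def clean_label(label: str, character: str) -> str:
    # Hypothetical helper: delete every symbol that is not in `character`.
    return re.sub(f'[^{re.escape(character)}]', '', label)


without_space = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
with_space = without_space + ' '

print(clean_label('CASH CHANGE', without_space))       # CASHCHANGE
print(clean_label('TOTAL INCLUSIVE GST', with_space))  # TOTAL INCLUSIVE GST
```

So the same character string has to contain the space in every place it is used: the filter, the label cleaning, and the converter.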
@aish0606 Did you solve this issue? Could you please share your solution?
Hey, is your model able to detect spaces and colons? If not, how can I make that possible?