Bartzi / see

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"
GNU General Public License v3.0
574 stars 147 forks

text detection #16

Open ink1 opened 6 years ago

ink1 commented 6 years ago

Hello, I'm looking for open source solutions sufficiently close to the state of the art for text detection in medical images for the purposes of image anonymization. The extracted text could be useful but not essential at this stage. We just want to block out any text in an image with sufficiently high recall and not too bad precision. Do you think your code can be adapted for that?

I'm trying your "FSNS Demo" but it gives me

Traceback (most recent call last):
  File "fsns_demo.py", line 140, in <module>
    network = create_network(args, log_data)
  File "fsns_demo.py", line 60, in create_network
    localization_net = build_localization_net(localization_net_class, args)
  File "fsns_demo.py", line 48, in build_localization_net
    return localization_net_class(args.dropout_ratio, args.timesteps)
TypeError: __init__() missing 2 required positional arguments: 'num_refinement_steps' and 'target_shape'

Any advice on that?

I've tried running a quick test just now following your "Text Recognition Demo" but it seems to return the same number of boxes.

Thanks in advance!

Bartzi commented 6 years ago

Hi,

in its current state, I don't think you can use the code in this repository to do text detection at large scale. We need to do more research before real unconstrained text detection is possible with this or a similar method.

Regarding your error:
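For readers hitting the same TypeError: it fires when the demo resolves a localization net class whose constructor expects more positional arguments than the script passes. A minimal sketch of the mechanism (class names hypothetical, not the repository's actual signatures):

```python
class TwoArgNet:
    """Stands in for a localization net that takes two constructor args."""
    def __init__(self, dropout_ratio, timesteps):
        self.dropout_ratio = dropout_ratio
        self.timesteps = timesteps


class FourArgNet:
    """Stands in for a net that additionally needs refinement parameters."""
    def __init__(self, dropout_ratio, timesteps, num_refinement_steps, target_shape):
        self.num_refinement_steps = num_refinement_steps
        self.target_shape = target_shape


# The demo always calls the resolved class with two arguments, so it only
# works if the resolved class actually takes two:
net = TwoArgNet(0.5, 3)

try:
    FourArgNet(0.5, 3)  # the same call against the four-argument class
except TypeError as e:
    print(e)  # complains about 'num_refinement_steps' and 'target_shape'
```

So if this error appears, the script is constructing a different localization net class than the one the two-argument call was written for.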

ink1 commented 6 years ago

Yes, I'm running fsns_demo.py the way it is suggested in the README:

python fsns_demo.py <path to dataset directory> model_35000.npz \
<path to example image> ../datasets/fsns/fsns_char_map.json
Bartzi commented 6 years ago

Which model are you using? FSNS or text recognition?

ink1 commented 6 years ago

I assume FSNS. I downloaded fsns_model.zip which contained model_35000.npz which I'm trying to use.

ink1 commented 6 years ago

I unpacked fsns_model.zip into the model directory and then ran:

cd chainer
python fsns_demo.py ../model model_35000.npz \
  ../file.jpg ../datasets/fsns/fsns_char_map.json --gpu 0
Bartzi commented 6 years ago

Hmm, sounds good so far; the error you get doesn't make any sense to me. Did you try debugging? I don't know what's causing the error right now.

Did you change anything in fsns_demo.py?

ink1 commented 6 years ago

No, the code is the same. I did a bit of debugging and found that the network being initialised is InverseCompositionalLocalizationNet. I had actually unpacked both fsns_model.zip and text_recognition_model.zip; this must have caused the confusion. I deleted the "model" folder and unpacked only fsns_model.zip. After that I see that FSNSSingleSTNLocalizationNet is being initialised. However, I'm hitting some other issues. I figured the input must be RGB. That's fine, but then:

  File "fsns_demo.py", line 154, in <module>
    predictions, crops, grids = network(image[xp.newaxis, ...])
  File "/see/model/fsns.py", line 521, in __call__
    h = self.localization_net(images)
  File "/see/model/fsns.py", line 206, in __call__
    lstm_prediction = F.relu(self.lstm(in_feature))
  File "/miniconda2/envs/chainer/lib/python3.6/site-packages/chainer/links/connection/lstm.py", line 309, in __call__
    lstm_in = self.upward(x)
  File "/miniconda2/envs/chainer/lib/python3.6/site-packages/chainer/links/connection/linear.py", line 129, in __call__
    return linear.linear(x, self.W, self.b)
  File "/miniconda2/envs/chainer/lib/python3.6/site-packages/chainer/functions/connection/linear.py", line 118, in linear
    y, = LinearFunction().apply(args)
  File "/miniconda2/envs/chainer/lib/python3.6/site-packages/chainer/function_node.py", line 230, in apply
    self._check_data_type_forward(in_data)
  File "/miniconda2/envs/chainer/lib/python3.6/site-packages/chainer/function_node.py", line 298, in _check_data_type_forward
    self.check_type_forward(in_type)
  File "/miniconda2/envs/chainer/lib/python3.6/site-packages/chainer/functions/connection/linear.py", line 20, in check_type_forward
    x_type.shape[1] == w_type.shape[1],
  File "/miniconda2/envs/chainer/lib/python3.6/site-packages/chainer/utils/type_check.py", line 524, in expect
    expr.expect()
  File "/miniconda2/envs/chainer/lib/python3.6/site-packages/chainer/utils/type_check.py", line 482, in expect
    '{0} {1} {2}'.format(left, self.inv, right))
chainer.utils.type_check.InvalidType: 
Invalid operation is performed in: LinearFunction (Forward)

Expect: in_types[0].shape[1] == in_types[1].shape[1]
Actual: 8640 != 3072

Not familiar with Chainer. Any clues?
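For anyone else seeing this: Chainer's LinearFunction type check compares the number of input features against the width of the pretrained weight matrix, and the feature count scales with the input image size. A sketch of the equivalent check in plain numpy (layer sizes illustrative, not the model's real dimensions):

```python
import numpy as np

# A fully connected layer has a fixed weight shape; the flattened feature
# vector it receives must match that shape exactly.
in_features = 3072                 # what the pretrained weights expect
W = np.zeros((256, in_features))   # (out_features, in_features)


def forward(x_flat):
    # Equivalent to Chainer's check: x.shape[1] must equal W.shape[1]
    assert x_flat.shape[1] == W.shape[1], (
        f"{x_flat.shape[1]} != {W.shape[1]}")
    return x_flat @ W.T


forward(np.zeros((1, 3072)))       # matches: fine

try:
    # Features from a differently sized input image flatten to a
    # different length, so the multiply is rejected before it runs.
    forward(np.zeros((1, 8640)))
except AssertionError as e:
    print(e)  # 8640 != 3072
```

So "8640 != 3072" means the image fed to the network has different dimensions than the ones the model was trained on.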

ink1 commented 6 years ago

My major deviation from your requirements is TF 1.5.0. Is that potentially a problem? Edit: reverted to 1.4.1, to no avail.

Bartzi commented 6 years ago

Yeah, I do have an idea about your problem: what does your input image look like? Are you using an image from the FSNS dataset?

The Tensorflow requirement is only there to extract the data for the FSNS dataset from the original files this dataset comes in, so that is definitely not the problem.

ink1 commented 6 years ago

You are right, it's the image size. As long as the image is 150x600 RGB, the FSNS Demo works fine. I've got some FSNS images and the output looks correct. However, when I created a pseudo-FSNS image (a 150x150 crop containing similarly sized text, repeated four times), it was not able to detect much at all.

Is this what you meant when you said this code is not suitable? It is a bit surprising to me that text detection with a model trained on naturally occurring images would be so poor just because I'm feeding in a grey image; I was expecting it to generalise a bit better.

If the only way forward for me is to train a new model taking in, say, 32x32 single-character images (e.g. Google street numbers plus some letters plus some of our own text) and apply it as a sliding multi-scale window, then what model would you suggest? Should it be text/no-text or specific character detection during training? Would recursion be necessary (I hope not)? Thank you for your insight!
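A practical note for others landing here: the pretrained model expects inputs of the FSNS shape (reading the thread's "150x600" as 150 pixels high by 600 wide) with three channels, so converting arbitrary images before running the demo avoids the shape-mismatch error. A minimal numpy-only sketch (nearest-neighbour resize, purely illustrative; the demo itself may preprocess differently):

```python
import numpy as np


def prepare_fsns_input(img):
    """Convert an arbitrary (H, W) or (H, W, C) uint8 array to the
    150-high by 600-wide RGB layout the pretrained FSNS model expects.
    Nearest-neighbour resampling; good enough for a sketch."""
    if img.ndim == 2:                       # grayscale -> replicate channels
        img = np.stack([img] * 3, axis=-1)
    h, w = img.shape[:2]
    rows = np.arange(150) * h // 150        # nearest source row per target row
    cols = np.arange(600) * w // 600        # nearest source col per target col
    return img[rows][:, cols, :3]           # drop any alpha channel


gray = np.zeros((200, 200), dtype=np.uint8)  # e.g. a grey medical-image crop
out = prepare_fsns_input(gray)
print(out.shape)  # (150, 600, 3)
```

Note that stretching a non-FSNS crop to this shape makes it feedable, but, as the thread shows, it does not make the model's detections on such images reliable.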

Bartzi commented 6 years ago

> Is this what you meant when you said this code is not suitable?

Yeah, kind of. I don't know what your images look like, but if they differ significantly from FSNS images (meaning the text is all over the image), the FSNS model won't work. The problem is not necessarily the grayscale input, but that could contribute, too.

Did you already have a look at other recent text detection papers? They mostly use a Faster R-CNN/YOLO/SSD-like approach for text detection. Those approaches seem to work very well.

ink1 commented 6 years ago

Besides MATLAB OCR (and related) and text spotting by Jaderberg et al. (2014), I've only tried CRNN (https://github.com/meijieru/crnn.pytorch). I've seen a few other papers, but they had no open-source code or models. There is a lot of open source for object detection, but not so much for text.

Bartzi commented 6 years ago

Oh yeah, that's right... most of the time there is no code. Well, I hope you can find a suitable project! In case you have time and ideas, you could of course also try to do some research using our idea :wink:

skrish13 commented 6 years ago

Have you tried Tesseract? It's pretty good @ink1