OCR-D / ocrd_anybaseocr

DFKI Layout Detection for OCR-D
Apache License 2.0

Fix tiseg #79

Closed bertsky closed 3 years ago

bertsky commented 3 years ago

This brings the bare minimum of functionality to this processor.

Results are not that good, but that's another story. (The legacy text/non-text segmentation does not detect enough non-text, so recall is bad but precision is good. Deep ML segmentation, however, fails miserably: it does not even preserve most of the foreground, and its precision is as bad as its recall.)

bertsky commented 3 years ago

> Results are not that good, but that's another story. (Legacy text/non-text segmentation does not detect enough non-text, so the recall is bad but precision is good. Deep ML segmentation however fails miserably as it does not even preserve most foreground at all, plus precision is equally bad as recall.)

See #80 for a discussion about that.

bertsky commented 3 years ago

Here's more, bringing some sanity to block segmentation. (Changes to the mrcnn part loosely derive from my Mask_RCNN fork.)

Again, this is not about model quality itself, but what best to make of it. For concerns about training/model quality, cf. last comments in #82.

bertsky commented 3 years ago

Regarding tiseg again:

But does any consuming processor actually make use of the alpha channel? I highly doubt it.

Since the model was obviously trained on raw images, we have to apply it to raw images. But we can still take binarized images (from a binarization step in the workflow) and apply our resulting mask to them – by filling with white.

That seems like the better OCR-D interface to me. (Of course, contour-finding and annotation via coordinates would still be better than as clipped derived image.) What do you think, @kba?

I already have a commit lined up for this, i.e. merely comparing the text score with the image score (and ignoring the background score), and then white-filling and alpha-masking the image parts in the clipped raw result. Should I add that?
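The idea could be sketched roughly like this (a minimal NumPy illustration, not the PR's actual code; all names and the score-array layout are assumptions):

```python
import numpy as np

def mask_image_parts(raw, text_score, image_score):
    """Hypothetical sketch: compare per-pixel text vs. image class scores
    (ignoring the background score) and white-fill plus alpha-mask the
    image-classified parts of a clipped raw page.

    raw: HxWx3 uint8 array; text_score, image_score: HxW float arrays.
    Returns an HxWx4 RGBA uint8 array.
    """
    # a pixel counts as "image" wherever that score beats the text score
    is_image = image_score > text_score
    # start from the raw page with a fully opaque alpha channel
    out = np.dstack([raw, np.full(raw.shape[:2], 255, np.uint8)])
    out[is_image, :3] = 255   # white-fill the non-text parts
    out[is_image, 3] = 0      # and alpha-mask them as well
    return out
```

The background score never enters the comparison, matching the "merely comparing the text score with image score" description above.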

bertsky commented 3 years ago

BTW, we still do have a problem with loggers here. In https://github.com/OCR-D/ocrd_anybaseocr/pull/79/commits/5a4d874a4caf62d960acca4622dd65ab56a58670 I had to remove the initLogging from tensorflow_importer, as it would cause a large and unfriendly error message about doing initLogging twice. But now the TF logger does not adhere to OCR-D conventions, and what's worse, spoils the -h and -J output again.

So how do we do this properly, @kba? Use our getLogger, but only import tensorflow_importer in process?

kba commented 3 years ago

> I already have a commit for this in line, i.e. merely comparing the text score with image score (and ignoring background score), and then white-filling and alpha-masking the image parts in the clipped raw result. Should I add that?

IIUC (and that's a big if) then yes, please.

> BTW, we still do have a problem with loggers here. In 5a4d874 I had to remove the initLogging from tensorflow_importer, as this would cause a large and unfriendly error message about doing initLogging twice. But now, the TF logger does not adhere to OCR-D conventions, and what's worse, spoil -h and -J output again.
>
> So how do we do this properly, @kba? Use our getLogger, but only import tensorflow_importer in process?

The longer I have to deal with the "not initialized" or "initialized twice" logging errors, the less convinced I am that it was a good idea :( But in this case, dropping the initLogging() from tensorflow_importer and importing it only in process seems reasonable. It might cause runtime errors due to unavailable dependencies, but it also saves time for non-processing calls.
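The deferred-import pattern being proposed could look roughly like this (a generic sketch of lazy importing, not the actual tensorflow_importer code; `lazy_import` is an illustrative helper):

```python
import importlib

def lazy_import(module_name):
    """Defer a heavy import until first use, so that non-processing CLI
    calls (-h, -J, --dump-json) never trigger it. Sketch of the pattern
    discussed above; not part of any OCR-D API."""
    module = None
    def get():
        nonlocal module
        if module is None:
            # only now does the import (and any ImportError) happen
            module = importlib.import_module(module_name)
        return module
    return get

# e.g. at module level of a processor:
#   get_tf = lazy_import("tensorflow")
# and inside process():
#   tf = get_tf()
```

The trade-off kba mentions is visible here: a missing dependency now surfaces as a runtime error inside process() rather than at CLI startup.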

bertsky commented 3 years ago

> The longer I have to deal with the not initialized or initialized twice logged errors, the less I am convinced that it was a good idea :( But in this case, dropping the initLogging() from and importing only at process tensorflow_importer seems reasonable. It might cause runtime errors due to unavailable dependencies but also saves time for non-processing calls.

I get the same feeling. There is also the tension between processing vs. non-processing context and module-level imports: implementing process() almost always drags in the full dependencies, yet splitting a (Processor) class across modules is impossible. We are even lucky that -J works at all for now: every Keras processor will invariably print("Using TensorFlow backend"). We can never rule such things out. – No idea how to address this.

kba commented 3 years ago

> Every keras processor will invariably print("Using TensorFlow backend")

I have been actively defending against that with hacks like tensorflow_importer or:

```python
import os
import sys
import warnings

# silence TensorFlow's C++-level logging before it is imported
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
# temporarily swallow Keras' import-time chatter on stderr
stderr = sys.stderr
sys.stderr = open(os.devnull, "w")
from keras import backend as K
from keras.models import load_model
sys.stderr = stderr
# silence TensorFlow's Python-level logging as well
import tensorflow as tf
tf.get_logger().setLevel("ERROR")
warnings.filterwarnings("ignore")
```

It's quite a mouthful to stop those import-time print statements, but I see no way around it for --dump-json etc.