acl-org / aclpubcheck

Tools for checking ACL paper submissions
MIT License

Incorrect warning during the check of the margin with invisible characters #19

Closed crux82 closed 2 years ago

crux82 commented 2 years ago

In some papers, the software incorrectly reports a margin violation even though no visible text lies in the margin.

Actually, there is some text there, but it is invisible.

I think the problem is somewhere in formatchecker.py:

                # Parse texts
                for j, word in enumerate(p.extract_words()):
                    violation = None
                    if float(word["top"]) < (57-self.top_offset):
                        violation = Margin.TOP
                    elif float(word["x0"]) < (71-self.left_offset):
                        violation = Margin.LEFT
                    elif Page.WIDTH.value-float(word["x1"]) < (71-self.right_offset):
                        violation = Margin.RIGHT
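
To illustrate why this check fires on invisible text, here is a minimal standalone sketch of the same comparisons. It is not the code from formatchecker.py: the `Margin` enum values, the `PAGE_WIDTH` constant (assumed to be the A4 width of 595 points), and the `margin_violation` helper are hypothetical names introduced for this example. The check looks only at word coordinates, so a word drawn in an invisible color is flagged just like a visible one.

```python
from enum import Enum
from typing import Optional


class Margin(Enum):
    # Hypothetical stand-in for the Margin enum used in the snippet above.
    TOP = "top"
    LEFT = "left"
    RIGHT = "right"


PAGE_WIDTH = 595.0  # Assumption: A4 page width in PDF points.


def margin_violation(word: dict,
                     top_offset: float = 0.0,
                     left_offset: float = 0.0,
                     right_offset: float = 0.0) -> Optional[Margin]:
    """Return the violated margin for a word box, or None if it fits."""
    if float(word["top"]) < 57 - top_offset:
        return Margin.TOP
    if float(word["x0"]) < 71 - left_offset:
        return Margin.LEFT
    if PAGE_WIDTH - float(word["x1"]) < 71 - right_offset:
        return Margin.RIGHT
    return None


# A word box in the left margin; nothing here says whether it is visible.
hidden_word = {"text": "x", "top": 100.0, "x0": 20.0, "x1": 30.0}
print(margin_violation(hidden_word))  # Margin.LEFT
```

Since the word dictionary carries no visibility information in this sketch, the decision cannot distinguish rendered text from invisible text.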
crux82 commented 2 years ago

@ryancotterell I may have found a solution. It is admittedly a workaround, but robustly detecting invisible characters seems nearly impossible. Instead, when a candidate region outside the margin is flagged, that region is cropped from the page, and if all of its pixels are equal to the background, the violation is skipped.

You can find this implemented in the branch
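
The crop-and-compare idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the branch: the page is modeled as a plain grid of 8-bit grayscale values rather than a rendered PDF page, and `BACKGROUND`, `is_blank_region`, and the `(x0, top, x1, bottom)` pixel bounding box are hypothetical names chosen for the example.

```python
from typing import List, Tuple

BACKGROUND = 255  # Assumption: white background in an 8-bit grayscale render.


def is_blank_region(pixels: List[List[int]],
                    bbox: Tuple[int, int, int, int],
                    background: int = BACKGROUND) -> bool:
    """Return True if every pixel inside bbox equals the background value.

    bbox is (x0, top, x1, bottom) in pixel coordinates, end-exclusive.
    An empty crop is treated as blank.
    """
    x0, top, x1, bottom = bbox
    cropped = [row[x0:x1] for row in pixels[top:bottom]]
    return all(value == background for row in cropped for value in row)


# A 10x10 white page with one dark pixel at row 5, column 7.
page = [[BACKGROUND] * 10 for _ in range(10)]
page[5][7] = 0

print(is_blank_region(page, (0, 0, 3, 10)))  # True: nothing visible, skip it
print(is_blank_region(page, (6, 4, 9, 7)))   # False: visible ink, keep the violation
```

A margin checker could call such a helper on each candidate region and drop the violation when it returns True. Note that an exact equality test assumes a uniform background; a real render with anti-aliasing or a tinted page might need a tolerance instead.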