acl-org / aclpubcheck

Tools for checking ACL paper submissions
MIT License

Crash when all pages already have error before counting the pages #62

icannos closed 1 year ago

icannos commented 1 year ago

If all the pages already contain errors (for example, the paper is not A4), then check_page_num never updates the variable marker and crashes. Every update is skipped because of the check if i+1 in self.page_errors:, so marker is still None when it is compared with a tuple, which raises a TypeError.

        marker = None
        if len(self.pdf.pages) <= page_threshold:
            return

        for i, page in enumerate(self.pdf.pages):
            if i+1 in self.page_errors:
                continue
            text = page.extract_text().split('\n')
            for j, line in enumerate(text):
                if marker is None and any(x in line for x in candidates):
                    marker = (i+1, j+1)
                #if "Acknowl" in line and all(x not in line for x in acks):
                #    self.logs[Error.SPELLING] = ["'Acknowledgments' was misspelled."]

        # if the first marker appears after the first line of page 10,
        # there is high probability the paper exceeds the page limit.

+       # If marker is still None here, all pages already have errors.
+       # Return so that the existing errors can still be printed.
+       if marker is None:
+           return

        if marker > (page_threshold + 1, 1):
            page, line = marker
            self.logs[Error.PAGELIMIT] = [f"Paper exceeds the page limit "
                                      f"because first (References, "
                                      f"Acknowledgments, Ethics Statement) was found on "
                                      f"page {page}, line {line}."]

This fix ensures that the script terminates and prints the errors it has already found without crashing.
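
For anyone who wants to reproduce the failure outside the checker, here is a minimal sketch. The function first_marker_exceeds_limit and its guard parameter are hypothetical stand-ins written for this illustration, not part of aclpubcheck. Pages are plain strings instead of pdfplumber page objects, and the function returns a boolean instead of writing to self.logs.

```python
def first_marker_exceeds_limit(pages, page_errors, candidates,
                               page_threshold, guard=True):
    """Hypothetical stand-in for the marker search in check_page_num.

    pages: list of page texts (assumption: plain strings, not PDF pages).
    page_errors: set of 1-based page numbers that already have errors.
    """
    marker = None
    for i, text in enumerate(pages):
        if i + 1 in page_errors:
            continue  # pages with earlier errors never update marker
        for j, line in enumerate(text.split("\n")):
            if marker is None and any(x in line for x in candidates):
                marker = (i + 1, j + 1)

    if guard and marker is None:
        # Nothing to compare: return and keep the errors already recorded.
        return False

    # Without the guard, a None marker makes this comparison raise TypeError.
    return marker > (page_threshold + 1, 1)


pages = ["Introduction", "References\n[1] Some entry"]
candidates = ["References", "Acknowledgments", "Ethics Statement"]

# Every page already has an error, so marker stays None.
print(first_marker_exceeds_limit(pages, {1, 2}, candidates, 0))

try:
    first_marker_exceeds_limit(pages, {1, 2}, candidates, 0, guard=False)
except TypeError as exc:
    print("crash without the guard:", exc)
```

Since the guard only tests marker, it also returns early when no page is skipped but none of the candidate headings appears in the text.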