HazyResearch / fonduer

A knowledge base construction engine for richly formatted data
https://fonduer.readthedocs.io/
MIT License

Details of a parse error #478

Closed. HiromuHota closed this issue 4 years ago.

HiromuHota commented 4 years ago

Description of the feature request

Is your feature request related to a problem? Please describe.

When an error occurs during parsing, Fonduer shows an error message (actually a warning message), which does not give enough information to debug the error.

/home/user/.venv/lib/python3.7/site-packages/fonduer/parser/parser.py:286: UserWarning: Document XXX not added to database, because of parse error: 
list index out of range
  f"Document {document.name} not added to database, "
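The three lines above follow the standard layout of Python's warnings module: the file and line number of the warnings.warn() call that issued the warning, the warning category, the message (which here carries only str(e)), and the source line of that call. Nothing in it points at the line where the IndexError was actually raised. A small sketch reproducing the layout with warnings.formatwarning(); the filename, line number, and source line are illustrative values copied from the output above:

```python
import warnings

# warnings.formatwarning() builds the text that the default warnings.showwarning()
# writes to stderr: "<file>:<line>: <Category>: <message>" plus the call's source line.
text = warnings.formatwarning(
    "Document XXX not added to database, because of parse error: \nlist index out of range",
    UserWarning,
    "parser.py",  # illustrative: file containing the warnings.warn() call
    286,          # illustrative: line number of that call
    line='f"Document {document.name} not added to database, "',
)
print(text, end="")
```

Only the exception's message survives; the frames that led to the error are lost.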

Description of the solution you'd like

I'd like Fonduer to show a stack trace when the log level is set to DEBUG, e.g. via log.setLevel(logging.DEBUG), so that I can see where the error originates.
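A minimal sketch of the requested behavior, under stated assumptions: a hypothetical parse_all() loop (not Fonduer's actual parser code) that skips a failing document, always emits a short warning, and adds the full stack trace through logger.debug(..., exc_info=True) so it only appears when DEBUG is enabled. The names parse_document, parse_all, and "parser_sketch" are illustrative.

```python
import logging

logger = logging.getLogger("parser_sketch")  # hypothetical logger name


def parse_document(doc):
    """Hypothetical per-document parser that fails on the input "bad"."""
    if doc == "bad":
        parts = []
        return parts[0]  # raises IndexError: list index out of range
    return f"parsed {doc}"


def parse_all(docs):
    """Parse each document; a failure skips that document instead of aborting."""
    results = []
    for doc in docs:
        try:
            results.append(parse_document(doc))
        except Exception as e:
            # Short summary, visible at the default WARNING level.
            logger.warning(
                "Document %s not added to database, because of parse error: %s", doc, e
            )
            # Full stack trace, emitted only if DEBUG is enabled for this logger.
            logger.debug("Stack trace for document %s:", doc, exc_info=True)
    return results
```

With a handler configured (e.g. logging.basicConfig()) and logger.setLevel(logging.DEBUG), the failing document produces the one-line warning followed by its traceback; at the default level only the warning appears. Keeping the trace in a separate DEBUG record leaves normal output compact.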

Description of the alternatives you've considered

N/A

Additional context

Fonduer: v0.8.2

HiromuHota commented 4 years ago

Commit 2ec28959af872381e109e10c7684c26384c63df2 changed how the exception is handled:

diff --git a/src/fonduer/parser/parser.py b/src/fonduer/parser/parser.py
index 3cd42066..2ca60bd6 100644
--- a/src/fonduer/parser/parser.py
+++ b/src/fonduer/parser/parser.py
@@ -239,9 +239,9 @@ class ParserUDF(UDF):

             yield from return_sentences
         except NotImplementedError as e:
-            logger.warning(
-                "Skipped parsing of document {}, because of parse error: {}."
-                " Not adding document to database".format(document.name, e)
+            warnings.warn(
+                "Document {} not added to database, "
+                "because of parse error: \n{}".format(document.name, e)
             )

I wonder what the reason for this change was.

HiromuHota commented 4 years ago

The Python Logging HOWTO, https://docs.python.org/3/howto/logging.html#when-to-use-logging, describes when to use warnings.warn() and when to use logging instead.
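One practical difference between the two options matters for a skip-and-continue handler. In this minimal sketch (report_with_warning and report_with_logging are hypothetical helpers), warnings.warn() goes through the warnings filters, so an application running with warnings turned into errors (warnings.simplefilter("error") or python -W error) makes the handler raise. logger.exception(), called from an except block, instead records the error at ERROR level together with its traceback and is not affected by those filters.

```python
import logging
import warnings

logger = logging.getLogger("howto_sketch")  # hypothetical logger name


def report_with_warning(doc, exc):
    # Subject to the warnings filters: under an "error" filter this raises UserWarning.
    warnings.warn(f"Document {doc} not added to database, because of parse error: {exc}")


def report_with_logging(doc, exc):
    # Call from inside an except block: logs at ERROR level and attaches the
    # active exception's traceback to the log record.
    logger.exception("Document %s not added to database, because of parse error: %s", doc, exc)
```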

senwu commented 4 years ago

I cannot recall the motivation behind this. The design principles there are (1) do not break parsing, and (2) give the user information about the parse error. I do think more information would be better.

HiromuHota commented 4 years ago

Thanks for your input.

#479 provides more detailed information (i.e., a stack trace) yet does not stop parsing on a parse error.