HazyResearch / fonduer

A knowledge base construction engine for richly formatted data
https://fonduer.readthedocs.io/
MIT License
409 stars 77 forks source link

Log a stack trace on parsing error for better debug experience #479

Closed HiromuHota closed 4 years ago

HiromuHota commented 4 years ago

Description of the problems or issues

Is your pull request related to a problem? Please describe.

See #478.

Does your pull request fix any issue.

Fix #478.

Description of the proposed changes

Use logging.exception() instead of warnings.warn() to show a stack trace.

Test plan

Running pytest tests/parser/test_parser.py::test_parse_error_doc_skipping with log_cli = true can show you detailed message.

Checklist

codecov-commenter commented 4 years ago

Codecov Report

Merging #479 into master will not change coverage. The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #479   +/-   ##
=======================================
  Coverage   85.81%   85.81%           
=======================================
  Files          88       88           
  Lines        4554     4554           
  Branches      847      847           
=======================================
  Hits         3908     3908           
  Misses        464      464           
  Partials      182      182           
Flag Coverage Δ
#unittests 85.81% <100.00%> (ø)
Impacted Files Coverage Δ
src/fonduer/parser/parser.py 92.94% <100.00%> (ø)
HiromuHota commented 4 years ago

With log_cli = true in tests/pytest.ini, currently,

$ pytest tests/parser/test_parser.py::test_parse_error_doc_skipping
============================================================================== test session starts ===============================================================================
platform darwin -- Python 3.7.3, pytest-5.4.3, py-1.8.0, pluggy-0.13.0 -- /Users/hiromu/miniconda3/envs/fonduer-dev/bin/python
cachedir: .pytest_cache
rootdir: /Users/hiromu/workspace/fonduer/tests, inifile: pytest.ini
plugins: cov-2.9.0, mock-3.1.0, dependency-0.5.1
collected 1 item                                                                                                                                                                 

tests/parser/test_parser.py::test_parse_error_doc_skipping PASSED                                                                                                          [100%]

================================================================================ warnings summary ================================================================================
/Users/hiromu/miniconda3/envs/fonduer-dev/lib/python3.7/site-packages/jsonschema/compat.py:6
/Users/hiromu/miniconda3/envs/fonduer-dev/lib/python3.7/site-packages/jsonschema/compat.py:6
  /Users/hiromu/miniconda3/envs/fonduer-dev/lib/python3.7/site-packages/jsonschema/compat.py:6: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import MutableMapping, Sequence  # noqa

/Users/hiromu/miniconda3/envs/fonduer-dev/lib/python3.7/site-packages/socks.py:58
  /Users/hiromu/miniconda3/envs/fonduer-dev/lib/python3.7/site-packages/socks.py:58: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Callable

/Users/hiromu/miniconda3/envs/fonduer-dev/lib/python3.7/site-packages/plac_ext.py:6
  /Users/hiromu/miniconda3/envs/fonduer-dev/lib/python3.7/site-packages/plac_ext.py:6: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

parser/test_parser.py::test_parse_error_doc_skipping
  /Users/hiromu/workspace/fonduer/src/fonduer/parser/parser.py:286: UserWarning: Document ext_diseases_missing_table_tag not added to database, because of parse error: 
  Table row parent must be a Table.
    f"Document {document.name} not added to database, "

-- Docs: https://docs.pytest.org/en/latest/warnings.html
========================================================================= 1 passed, 5 warnings in 2.53s ==========================================================================

With this PR

$ pytest tests/parser/test_parser.py::test_parse_error_doc_skipping
============================================================================== test session starts ===============================================================================
platform darwin -- Python 3.7.3, pytest-5.4.3, py-1.8.0, pluggy-0.13.0 -- /Users/hiromu/miniconda3/envs/fonduer-dev/bin/python
cachedir: .pytest_cache
rootdir: /Users/hiromu/workspace/fonduer/tests, inifile: pytest.ini
plugins: cov-2.9.0, mock-3.1.0, dependency-0.5.1
collected 1 item                                                                                                                                                                 

tests/parser/test_parser.py::test_parse_error_doc_skipping 
--------------------------------------------------------------------------------- live log call ----------------------------------------------------------------------------------
[ERROR] Document ext_diseases_missing_table_tag not added to database, because of parse error: 
Table row parent must be a Table.
Traceback (most recent call last):
  File "/Users/hiromu/workspace/fonduer/src/fonduer/parser/parser.py", line 261, in apply
    [y for y in self.parse(document, document.text)]
  File "/Users/hiromu/workspace/fonduer/src/fonduer/parser/parser.py", line 261, in <listcomp>
    [y for y in self.parse(document, document.text)]
  File "/Users/hiromu/workspace/fonduer/src/fonduer/parser/parser.py", line 842, in parse
    tokenized_sentences += [y for y in self._parse_node(node, state)]
  File "/Users/hiromu/workspace/fonduer/src/fonduer/parser/parser.py", line 842, in <listcomp>
    tokenized_sentences += [y for y in self._parse_node(node, state)]
  File "/Users/hiromu/workspace/fonduer/src/fonduer/parser/parser.py", line 781, in _parse_node
    state = self._parse_table(node, state)
  File "/Users/hiromu/workspace/fonduer/src/fonduer/parser/parser.py", line 339, in _parse_table
    raise NotImplementedError("Table row parent must be a Table.")
NotImplementedError: Table row parent must be a Table.
PASSED                                                                                                                                                                     [100%]

================================================================================ warnings summary ================================================================================
/Users/hiromu/miniconda3/envs/fonduer-dev/lib/python3.7/site-packages/jsonschema/compat.py:6
/Users/hiromu/miniconda3/envs/fonduer-dev/lib/python3.7/site-packages/jsonschema/compat.py:6
  /Users/hiromu/miniconda3/envs/fonduer-dev/lib/python3.7/site-packages/jsonschema/compat.py:6: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import MutableMapping, Sequence  # noqa

/Users/hiromu/miniconda3/envs/fonduer-dev/lib/python3.7/site-packages/socks.py:58
  /Users/hiromu/miniconda3/envs/fonduer-dev/lib/python3.7/site-packages/socks.py:58: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Callable

/Users/hiromu/miniconda3/envs/fonduer-dev/lib/python3.7/site-packages/plac_ext.py:6
  /Users/hiromu/miniconda3/envs/fonduer-dev/lib/python3.7/site-packages/plac_ext.py:6: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

-- Docs: https://docs.pytest.org/en/latest/warnings.html
========================================================================= 1 passed, 4 warnings in 2.59s ==========================================================================

One can clearly see where this error comes from, which would greatly help debug the error.