HazyResearch / fonduer

A knowledge base construction engine for richly formatted data
https://fonduer.readthedocs.io/
MIT License

"Document "{document.name}" not added to database, because of parse error" but being added #489

Closed. HiromuHota closed this issue 4 years ago

HiromuHota commented 4 years ago

Description of the bug

When an error occurs during parsing, the error message "Document "{document.name}" not added to database, because of parse error" appears, but the Document object itself has already been added to the database before the parse error occurs.

To Reproduce

Steps to reproduce the behavior:

  1. Use the latest commit (1d6771befb95f4ae94f308899633294a003dcfd6)
  2. Let a parser fail
  3. Execute session.query(Document).count() to see how many Documents are added to the database.

Expected behavior

Either of the following:

  1. The Document is not added.
  2. The error message is corrected to something like "Document is added but not parsed".
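
To make the difference concrete, here is a minimal sketch that contrasts the reported ordering with option 1. It is not Fonduer code: it uses the standard-library sqlite3 module as a stand-in for the real database session, and every name in it (ParseError, failing_parse, make_db, add_then_parse, parse_in_transaction, count_documents) is hypothetical and chosen only for illustration.

```python
import sqlite3


class ParseError(Exception):
    """Hypothetical stand-in for whatever error the parser raises."""


def failing_parse(name: str) -> None:
    # Hypothetical parser that always fails, to simulate "let a parser fail".
    raise ParseError(f"cannot parse {name}")


def make_db() -> sqlite3.Connection:
    # In-memory database with a single table standing in for Document.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE document (name TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def count_documents(conn: sqlite3.Connection) -> int:
    # Plays the role of session.query(Document).count() in the report.
    return conn.execute("SELECT COUNT(*) FROM document").fetchone()[0]


def add_then_parse(conn: sqlite3.Connection, name: str) -> None:
    # Reported ordering (assumed): the row is committed before parsing,
    # so a later parse error cannot undo it and the message is misleading.
    conn.execute("INSERT INTO document (name) VALUES (?)", (name,))
    conn.commit()
    try:
        failing_parse(name)
    except ParseError:
        print(f'Document "{name}" not added to database, because of parse error')


def parse_in_transaction(conn: sqlite3.Connection, name: str) -> None:
    # Option 1: insert and parse share one transaction; a parse error
    # rolls the insert back, so the message matches the database state.
    try:
        conn.execute("INSERT INTO document (name) VALUES (?)", (name,))
        failing_parse(name)
        conn.commit()
    except ParseError:
        conn.rollback()
        print(f'Document "{name}" not added to database, because of parse error')
```

With a fresh database from make_db() for each call, count_documents returns 1 after add_then_parse and 0 after parse_in_transaction, even though both print the same message. Option 2 would keep the first ordering and only change the message text.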

Error Logs/Screenshots

"Document "{document.name}" not added to database, because of parse error"

Additional context

I think this is a regression caused by commit b1b8d24e1133e2229c5c02c6bdcf9eab36301a55, in which doc: Document is saved to the database by session.merge(doc, load=True), so this could also affect v0.8.0, v0.8.1, and v0.8.2.