Filimoa / open-parse

Improved file parsing for LLM’s
https://filimoa.github.io/open-parse/
MIT License
2.34k stars, 89 forks

UniTable Cookbook notebook has errors #13

Closed: zacharysmithdatatonic closed this issue 5 months ago

zacharysmithdatatonic commented 5 months ago

Issue

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[1], line 11
      8 pdfs_with_tables_dir = Path("./tables")
     10 for pdf_path in pdfs_with_tables_dir.glob("*"):
---> 11     parser = openparse.DocumentParser(
     12         table_args={
     13             "parsing_algorithm": "unitable",
     14             "min_table_confidence": 0.8,
     15         },
     16         processing_pipeline=[],  # don't want any processing
     17     )
     18     parsed_nodes = parser.parse(pdf_path)
     19     table_nodes = [node for node in parsed_nodes.nodes if "table" in node.variant]

File ~/Developer/bossard-sandbox/open-parse/.venv/lib/python3.11/site-packages/openparse/doc_parser.py:78, in DocumentParser.__init__(self, processing_pipeline, table_args)
     75 else:
     76     self.processing_pipeline = processing_pipeline  # type: ignore
---> 78 self.processing_pipeline.verbose = self._verbose
     80 self.table_args = table_args

AttributeError: 'list' object has no attribute 'verbose'
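
The traceback suggests the constructor sets a `verbose` attribute on whatever is passed as `processing_pipeline`, so a plain list fails while a pipeline-like object would not. Below is a minimal sketch of that failure mode. It does not import openparse: `MiniParser` and `NoOpPipeline` are hypothetical stand-ins written for illustration, and the real class may behave differently outside the one assignment shown in the traceback.

```python
class NoOpPipeline:
    """Hypothetical stand-in for a pipeline object that applies no processing."""

    def __init__(self):
        self.verbose = False

    def run(self, nodes):
        # No processing: hand the nodes back unchanged.
        return nodes


class MiniParser:
    """Hypothetical stand-in that mirrors only the assignment seen in the traceback."""

    def __init__(self, processing_pipeline, verbose=False):
        self._verbose = verbose
        self.processing_pipeline = processing_pipeline
        # Same shape as the failing line in doc_parser.py: this needs an object
        # that accepts new attributes, which a built-in list does not.
        self.processing_pipeline.verbose = self._verbose


def try_construct(pipeline):
    """Return "ok" if construction succeeds, else the AttributeError message."""
    try:
        MiniParser(processing_pipeline=pipeline)
    except AttributeError as exc:
        return str(exc)
    return "ok"


list_result = try_construct([])                # fails: lists reject new attributes
object_result = try_construct(NoOpPipeline())  # succeeds

print(list_result)
print(object_result)
```

If this reading is right, passing a pipeline object in place of `[]` would avoid the error. Which pipeline classes open-parse actually provides for this is something to confirm in its documentation.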

Reproduction

  1. Create a Poetry environment with the following config:
[tool.poetry]
name = "open-parse"
version = "0.1.0"
description = ""
authors = ["Zachary"]

[tool.poetry.dependencies]
python = "^3.8"
openparse = {extras = ["ml"], version = "^0.5.1"}
jupyter = "^1.0.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
  2. Continue setup following the https://filimoa.github.io/open-parse/parsing-tables/unitable/ guide.
  3. Run the notebook with poetry run jupyter notebook unitable.ipynb, then execute the code cell to reproduce the error.

tzenmatt commented 5 months ago

I had the same issue

Filimoa commented 5 months ago

I forgot to update this notebook when we made some pre-launch changes. It should now be fixed.

zacharysmithdatatonic commented 5 months ago

Thanks for the swift fix 🙏🏻