VikParuchuri / marker

Convert PDF to markdown quickly with high accuracy
https://www.datalab.to
GNU General Public License v3.0
13.97k stars 707 forks

Python error after "Finding reading order" in convert_single.py, default options #174

Open Gnurrf opened 1 month ago

Gnurrf commented 1 month ago

This is on a Mac, M3 processor.

Detecting bboxes: 100%|███████████████████████████| 5/5 [02:22<00:00, 28.48s/it]
Finding reading order: 100%|██████████████████████| 5/5 [00:15<00:00, 3.18s/it]
OMP: Warning #96: Cannot form a team with 16 threads, using 3 instead.
OMP: Hint Consider unsetting KMP_DEVICE_THREAD_LIMIT (KMP_ALL_THREADS), KMP_TEAMS_THREAD_LIMIT, and OMP_THREAD_LIMIT (if any are set).
Traceback (most recent call last):
  File "/Users/xxx/Downloads/marker/convert_single.py", line 37, in <module>
    main()
  File "/Users/xxx/Downloads/marker/convert_single.py", line 28, in main
    full_text, images, out_meta = convert_single_pdf(fname, model_lst, max_pages=args.max_pages, langs=langs, batch_multiplier=args.batch_multiplier, start_page=args.start_page)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/Downloads/marker/marker/convert.py", line 123, in convert_single_pdf
    table_count = format_tables(pages)
                  ^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/Downloads/marker/marker/tables/table.py", line 138, in format_tables
    table_rows = get_table_pdftext(page, table_box)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/Downloads/marker/marker/tables/table.py", line 103, in get_table_pdftext
    table_rows = assign_cells_to_columns(page, table_box, table_rows)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/Downloads/marker/marker/tables/cells.py", line 56, in assign_cells_to_columns
    separators = find_column_separators(page, table_box, round_factor=round_factor)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/Downloads/marker/marker/tables/cells.py", line 31, in find_column_separators
    line_boxes = [p.bbox for p in page.text_lines.bboxes]
                                  ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'bboxes'
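
The last frame of the traceback suggests that `page.text_lines` was `None` when the table code tried to read `.bboxes` from it. As a rough illustration only, here is a minimal sketch of the kind of guard that would avoid the `AttributeError`. The `Line`, `TextLines`, and `Page` classes and the `line_boxes_or_empty` helper are hypothetical stand-ins written for this example; they are not marker's actual classes or its fix.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Line:
    # Hypothetical stand-in for one detected text line.
    bbox: List[float]


@dataclass
class TextLines:
    # Hypothetical stand-in for the detection result stored on a page.
    bboxes: List[Line] = field(default_factory=list)


@dataclass
class Page:
    # text_lines stays None when no detection result was attached to the page.
    text_lines: Optional[TextLines] = None


def line_boxes_or_empty(page: Page) -> List[List[float]]:
    """Return the line bounding boxes, or an empty list if text_lines is None."""
    if page.text_lines is None:
        return []
    return [p.bbox for p in page.text_lines.bboxes]


# A page with no detection result no longer raises AttributeError.
print(line_boxes_or_empty(Page()))
# A page with one detected line returns that line's bounding box.
print(line_boxes_or_empty(Page(TextLines([Line([0, 0, 10, 5])]))))
```

Whether returning an empty list is the right behavior depends on how the caller uses the boxes; skipping table formatting for such a page would be another option.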