API users are hitting the following error. The cause is that a bbox can be None; see the example input below.
TypeError: unsupported operand type(s) for -: 'int' and 'NoneType'
  File "prepline_general/api/general.py", line 396, in pipeline_api
    elements = partition(
  File "unstructured/partition/auto.py", line 316, in partition
    elements = _partition_pdf(
  File "unstructured/documents/elements.py", line 276, in wrapper
    elements = func(*args, **kwargs)
  File "unstructured/file_utils/filetype.py", line 551, in wrapper
    elements = func(*args, **kwargs)
  File "unstructured/chunking/title.py", line 211, in wrapper
    elements = func(*args, **kwargs)
  File "unstructured/partition/pdf.py", line 148, in partition_pdf
    return partition_pdf_or_image(
  File "unstructured/partition/pdf.py", line 245, in partition_pdf_or_image
    extracted_elements = extractable_elements(
  File "unstructured/partition/pdf.py", line 171, in extractable_elements
    return _partition_pdf_with_pdfminer(
  File "unstructured/utils.py", line 159, in wrapper
    return func(*args, **kwargs)
  File "unstructured/partition/pdf.py", line 433, in _partition_pdf_with_pdfminer
    elements = _process_pdfminer_pages(
  File "unstructured/partition/pdf.py", line 509, in _process_pdfminer_pages
    urls_metadata.append(map_bbox_and_index(words, annot))
  File "unstructured/partition/pdf.py", line 1033, in map_bbox_and_index
    (annot["bbox"][0] - np.array([word["bbox"][0] for word in words])) ** 2
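
For illustration, here is a minimal sketch of the kind of guard that avoids this error. It assumes each word and annotation is a dict with a "bbox" entry that may be None or may contain None coordinates. The names has_valid_bbox and closest_word_index are hypothetical, and the distance calculation is a simplified stand-in for the line in the traceback, not the actual map_bbox_and_index implementation.

```python
import numpy as np


def has_valid_bbox(item):
    """Return True when item["bbox"] exists and holds no None coordinates."""
    bbox = item.get("bbox")
    return bbox is not None and all(coord is not None for coord in bbox)


def closest_word_index(words, annot):
    """Index into `words` of the word whose top-left corner is nearest the annotation.

    Hypothetical helper: returns None when the annotation or every word
    lacks a usable bbox, instead of raising a TypeError.
    """
    if not has_valid_bbox(annot):
        return None
    # Keep the original indices so the result still refers to `words`.
    candidates = [(i, w) for i, w in enumerate(words) if has_valid_bbox(w)]
    if not candidates:
        return None
    xs = np.array([w["bbox"][0] for _, w in candidates], dtype=float)
    ys = np.array([w["bbox"][1] for _, w in candidates], dtype=float)
    # Squared distance between top-left corners (simplified assumption).
    dist_sq = (annot["bbox"][0] - xs) ** 2 + (annot["bbox"][1] - ys) ** 2
    return candidates[int(np.argmin(dist_sq))][0]
```

Skipping words without a usable bbox keeps the NumPy array numeric, so the subtraction never sees a None operand. Whether such words should be skipped or handled some other way is a design choice for the real fix.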