aryn-ai / sycamore

🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
https://sycamore.readthedocs.io
Apache License 2.0
352 stars, 39 forks

Fix long line issue in partitioner client #589

Closed: eric-anderson closed this 2 months ago

eric-anderson commented 2 months ago

The default requests line-reading code can take time quadratic in the data size when the response contains a single long line with no newline, because it uses the standard trick of appending each new chunk to a pending string. Switch to chunking based on the raw data, and explicitly split by lines only for the early status output.
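As a rough illustration of the approach described above (not the code in this PR), the sketch below splits a stream of raw byte chunks into lines while keeping unfinished pieces in a list and joining them only once a newline arrives. That way a very long line is copied once rather than on every chunk. The function name `split_lines_from_chunks` is hypothetical, and it only treats `\n` as a line terminator.

```python
from typing import Iterable, Iterator, List


def split_lines_from_chunks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield newline-terminated lines from raw byte chunks.

    Hypothetical helper, not part of sycamore or requests.
    Pieces of an unfinished line are kept in a list and joined once,
    so a long line without a newline costs linear, not quadratic, time.
    """
    pending: List[bytes] = []
    for chunk in chunks:
        if not chunk:
            continue
        if b"\n" not in chunk:
            # No line boundary yet: remember the piece, do not concatenate.
            pending.append(chunk)
            continue
        head, _, rest = chunk.partition(b"\n")
        pending.append(head)
        yield b"".join(pending)  # single join per completed line
        pending = []
        # Everything before the last newline in `rest` is a complete line.
        *complete, tail = rest.split(b"\n")
        for line in complete:
            yield line
        if tail:
            pending.append(tail)
    if pending:
        # Final line that was not newline-terminated.
        yield b"".join(pending)


if __name__ == "__main__":
    for line in split_lines_from_chunks([b"ab", b"c\nde", b"f\n\ng", b"h"]):
        print(line)
```

With a real streaming response, the chunks would presumably come from something like `resp.iter_content(chunk_size=...)` on a `requests` response opened with `stream=True`; the chunk size to use is an assumption left to the caller.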

MarkLindblad commented 2 months ago

Update the MockResponse classes in test_aryn_partitioner.py to include an iter_content method. The tests currently fail with:

FAILED sycamore/tests/unit/transforms/test_aryn_partitioner.py::TestArynPDFPartitioner::test_partition - AttributeError: 'MockResponseNoTables' object has no attribute 'iter_content'
FAILED sycamore/tests/unit/transforms/test_aryn_partitioner.py::TestArynPDFPartitioner::test_partition_extract_table_structure - AttributeError: 'MockResponseTables' object has no attribute 'iter_content'
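One way the mocks could be extended is sketched below. This is a minimal stand-in, not the actual classes from test_aryn_partitioner.py: the class name `MockResponseSketch` and the payload shape are hypothetical, and only the `iter_content(chunk_size, decode_unicode)` signature is modeled on `requests.Response`.

```python
import json
from typing import Iterator, Optional


class MockResponseSketch:
    """Hypothetical mock; the real MockResponse* classes may differ."""

    def __init__(self, payload: dict, status_code: int = 200) -> None:
        self.status_code = status_code
        self._body = json.dumps(payload).encode("utf-8")

    def json(self) -> dict:
        return json.loads(self._body)

    def iter_content(
        self, chunk_size: Optional[int] = 1, decode_unicode: bool = False
    ) -> Iterator[bytes]:
        # decode_unicode is accepted for signature compatibility but ignored:
        # this sketch always yields bytes.
        # chunk_size=None yields the whole body as a single chunk.
        size = chunk_size or len(self._body) or 1
        for start in range(0, len(self._body), size):
            yield self._body[start:start + size]


if __name__ == "__main__":
    resp = MockResponseSketch({"elements": []})  # hypothetical payload
    body = b"".join(resp.iter_content(chunk_size=4))
    print(json.loads(body))
```

Yielding the body in small chunks also exercises the client's handling of lines that span chunk boundaries, which a single-chunk mock would not.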