Closed ds-filipknefel closed 3 months ago
I thought adding table extraction would increase the API response time. Mind if i ask why we are setting this default to true?
I thought adding table extraction would increase the API response time. Mind if i ask why we are setting this default to true?
On the request of @pawel-kmiecik, can you chime in for the reasons for it?
I thought adding table extraction would increase the API response time. Mind if i ask why we are setting this default to true?
On the request of @pawel-kmiecik, can you chime in for the reasons for it?
That's what I set with @cragwolfe in private conversation about parameters steering the table extraction:
e.g. with curl, one should just now just use the parameter: -F skip_infer_table_types="[]" to always get tables extracted if they are available for the API
AFAIK both pdf_infer_table_structure="true"
and skip_infer_table_types="[]"
must be explicitly set by a user to get the table extraction. We'd like to leave only skip_infer_table_types="[]"
gotcha, thanks!
please remove the readme instruction here: https://github.com/Unstructured-IO/unstructured-api?tab=readme-ov-file#pdf-table-extraction (and maybe some nit on https://github.com/Unstructured-IO/unstructured-api?tab=readme-ov-file#skip-table-extraction) otherwise LGTM!
I thought adding table extraction would increase the API response time. Mind if i ask why we are setting this default to true?
On the request of @pawel-kmiecik, can you chime in for the reasons for it?
That's what I set with @cragwolfe in private conversation about parameters steering the table extraction:
e.g. with curl, one should just now just use the parameter: -F skip_infer_table_types="[]" to always get tables extracted if they are available for the API
AFAIK both
pdf_infer_table_structure="true"
andskip_infer_table_types="[]"
must be explicitly set by a user to get the table extraction. We'd like to leave onlyskip_infer_table_types="[]"
Actually right now it's enough to set pdf_infer_table_structure=true
because of how it's handled internally. Logging will throw a warning and tell you to correct yourself but actually ignore skip_infer_table_types
contents.
So after this change even if user leaves pdf_infer_table_structure
and skip_infer_table_types
with default values, extraction will happen.
Change default values for table extraction - relies on this
unstructured
PR being merged and synced.We want to move away from
pdf_infer_table_structure
parameter, in this PR we:pdf_infer_table_structure=True
andskip_infer_table_types=[]
by defaultskip_infer_table_types
in documentationunstructured
dependecy to at least 0.13.0 (REQUIRED)More detailed description of how we want parameters to interact
pdf_infer_table_structure
is False tables will never extracted from pdfpdf_infer_table_structure
is True tables will be extracted from pdf unless it's skipped viaskip_infer_table_types
pdf_infer_table_structure=True
andskip_infer_table_types=[]