Closed macahi closed 10 months ago
Describe the bug
Hello,
I am currently testing the unstructured-api with chunking_strategy=by_title. I noticed that the max_characters parameter of the chunk_by_title method cannot be passed via pipeline_api (see https://github.com/Unstructured-IO/unstructured-api/blob/c91d1b9966f5281344fe5d2e662b94ea3aa2e46d/prepline_general/api/general.py#L302 ). As a result, it is not possible to specify a value for new_after_n_chars that exceeds the default value of max_characters (500).
To Reproduce
curl -X 'POST' 'https://api.unstructured.io/general/v0/general' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@sample-docs/layout-parser-paper-fast.pdf' \
  -F 'chunking_strategy=by_title' \
  -F 'new_after_n_chars=1500'
new_after_n_chars has no effect; the maximum chunk size is 500.
If I'm correct, it should be easy to fix.
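To illustrate why a larger new_after_n_chars has no effect while max_characters stays at 500, here is a minimal toy sketch in Python. It is not the library's implementation: chunk_texts is a hypothetical helper, and the assumption is only that new_after_n_chars acts as a soft limit (close the chunk once it is reached) while max_characters is a hard cap that is never exceeded.

```python
def chunk_texts(texts, new_after_n_chars=500, max_characters=500):
    """Toy chunker (hypothetical, for illustration only).

    new_after_n_chars: soft limit, a chunk is closed once it reaches this size.
    max_characters:    hard limit, no chunk may ever exceed this size.
    """
    # Assumption: a soft limit above the hard cap can never be reached,
    # so it is effectively clamped to the hard cap.
    soft_limit = min(new_after_n_chars, max_characters)
    chunks = []
    current = ""
    for text in texts:
        # Split oversized inputs so every piece fits under the hard cap.
        pieces = [text[i:i + max_characters]
                  for i in range(0, len(text), max_characters)]
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) > max_characters:
                # Adding the piece would break the hard cap: start a new chunk.
                chunks.append(current)
                current = piece
            else:
                current = candidate
            if len(current) >= soft_limit:
                # Soft limit reached: close the chunk early.
                chunks.append(current)
                current = ""
    if current:
        chunks.append(current)
    return chunks


paragraphs = ["a" * 400] * 6

# Only the soft limit is raised, as in the curl request above.
capped = chunk_texts(paragraphs, new_after_n_chars=1500)
print([len(c) for c in capped])  # [400, 400, 400, 400, 400, 400]

# Both limits raised, which requires max_characters to be settable.
raised = chunk_texts(paragraphs, new_after_n_chars=1500, max_characters=1500)
print([len(c) for c in raised])  # [1202, 1202]
```

In this sketch, raising new_after_n_chars alone changes nothing because the default hard cap of 500 still applies; larger chunks only appear once max_characters can be raised as well.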
Hi there, this will certainly be a quick fix. We'll keep you posted!
This is now deployed in the hosted API.