Closed henjrchen closed 3 months ago
Hi @henjrchen, I used your code with a small modification (see below) to write the response to a JSON file (attached below). I see two elements returned: one with element type 'title' and one with element type 'table'. When you say 2 contents, are you referring to these same two elements? output.json
curl -X 'POST' 'https://api.unstructured.io/general/v0/general' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -H 'unstructured-api-key: xxx' -F 'files=@table6.xlsx' -o output.json
Hi @tbs17, thank you for your quick reply. Yes, that’s what I meant. With the PDF format the API returns a table, so I did not expect the Excel file to come back as two elements. However, I found the ‘parent_id’ information, which allows me to link these two elements together. Thanks
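For anyone who lands here with the same question, linking by parent_id can be sketched roughly as follows. This is only an illustration, not the library's own method: it assumes each element in the JSON response carries an `element_id` field and that the parent reference sits under `metadata.parent_id`. The function names and the sample data are hypothetical, so check the field names against your own output.json.

```python
from collections import defaultdict


def group_by_parent(elements):
    """Map each parent id to the list of elements that reference it.

    Assumes the parent reference lives at element["metadata"]["parent_id"].
    """
    children = defaultdict(list)
    for el in elements:
        parent_id = (el.get("metadata") or {}).get("parent_id")
        if parent_id is not None:
            children[parent_id].append(el)
    return dict(children)


def merge_with_children(elements):
    """Return one entry per top-level element, with its children attached."""
    # Assumes each element exposes its own id as "element_id".
    by_id = {el["element_id"]: el for el in elements if "element_id" in el}
    children = group_by_parent(elements)
    merged = []
    for el in elements:
        parent_id = (el.get("metadata") or {}).get("parent_id")
        if parent_id in by_id:
            continue  # already attached to its parent's entry
        merged.append(
            {"parent": el, "children": children.get(el.get("element_id"), [])}
        )
    return merged


# Hypothetical sample shaped like the two elements discussed above.
sample = [
    {"type": "Title", "element_id": "t1", "text": "Sheet1", "metadata": {}},
    {"type": "Table", "element_id": "tb1", "text": "a b c",
     "metadata": {"parent_id": "t1"}},
]
merged = merge_with_children(sample)
```

To run it on the real response, load the file with `json.load(open("output.json"))` and pass the resulting list to `merge_with_children`. The title and its table then come back as a single entry, which is close to the "one content" that was originally expected.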
Describe the bug I originally expected the following output to be one content, but it turned out to be 2 contents. Is this an issue? Or is there another way to solve it? Thanks very much
To Reproduce curl -X 'POST' 'https://api.unstructured.io/general/v0/general' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -H 'unstructured-api-key: xxxx' -F 'files=@table.xlsx' | jq -C . | less -R
table6.pdf table6.xlsx
Environment:
Additional context