Unstructured-IO / unstructured-js-client

A TypeScript client for the Unstructured hosted API
MIT License

feat: Parameter to send custom page range when splitting pdf #101

Closed awalker4 closed 1 month ago

awalker4 commented 1 month ago

To match the python feature: https://github.com/Unstructured-IO/unstructured-python-client/pull/125

New parameter

Add a client-side parameter called splitPdfPageRange, which takes a list of two integers, [start, end]. If splitPdfPage is true and a range is set, the client slices the document from start up to and including end, and only this page range is sent to the API. The selected pages are still split up as needed. If [start, end] is out of bounds, an error is thrown to the user.
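The range check and slice described above can be sketched roughly as follows. This is not the client's actual implementation: `resolvePageRange` is a hypothetical helper, and the sketch assumes pages are numbered from 1 and that the range is inclusive on both ends.

```typescript
// Hypothetical helper, not part of the client. Assumes 1-indexed pages
// and a range that is inclusive on both ends.
function resolvePageRange(
  range: [number, number],
  pageCount: number,
): { startIndex: number; endIndex: number; count: number } {
  const [start, end] = range;
  if (!Number.isInteger(start) || !Number.isInteger(end)) {
    throw new Error(`Page range must contain integers, got [${start}, ${end}]`);
  }
  if (end < start) {
    throw new Error(`Page range end (${end}) is before start (${start})`);
  }
  if (start < 1 || end > pageCount) {
    throw new Error(
      `Page range [${start}, ${end}] is out of bounds for a ${pageCount}-page document`,
    );
  }
  // Convert to a zero-based, end-exclusive pair suitable for Array.prototype.slice.
  return { startIndex: start - 1, endIndex: end, count: end - start + 1 };
}

// Stand-in for a 16-page document; real code would work on PDF pages.
const pages = Array.from({ length: 16 }, (_, i) => `page ${i + 1}`);
const { startIndex, endIndex } = resolvePageRange([4, 8], pages.length);
const selected = pages.slice(startIndex, endIndex);
console.log(selected);
```

Validating before slicing keeps the error messages specific: a reversed range and an out-of-bounds range fail for different reasons, so the user can tell which one they hit.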

Testing

Check out this branch and set up a request to your local API:

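import { UnstructuredClient } from "unstructured-client";
import { PartitionResponse } from "unstructured-client/sdk/models/operations";
import { Strategy } from "unstructured-client/sdk/models/shared";
import * as fs from "fs";

// Placeholder: replace with your own API key.
const key = "YOUR_API_KEY";

// Point the client at the locally running API.
const client = new UnstructuredClient({
    serverURL: "http://localhost:8000",
    security: {
        apiKeyAuth: key,
    },
});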

const filename = "layout-parser-paper.pdf";
const data = fs.readFileSync(filename);

client.general.partition({
    partitionParameters: {
        files: {
            content: data,
            fileName: filename,
        },
        strategy: Strategy.Fast,
        splitPdfPage: true,
        splitPdfPageRange: [4, 8],
    }
}).then((res: PartitionResponse) => {
    if (res.statusCode === 200) {
        console.log(res.elements);
    }
}).catch((e) => {
    // API errors carry a status code and body; anything else is logged as-is.
    if (e.statusCode) {
        console.log(e.statusCode);
        console.log(e.body);
    } else {
        console.log(e);
    }
});

Try various page ranges and confirm that the returned elements fall within the range. Invalid ranges should throw a useful Error (pages are out of bounds, or end < start).
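One way to confirm that the returned elements stay within the range is to check each element's page number. The sketch below assumes each element carries its page number at `metadata.page_number` and that this number refers to the original document rather than the sliced subset. Both are assumptions to verify against your own response. `elementsOutsideRange` and `ElementLike` are hypothetical names used only for illustration.

```typescript
// Minimal element shape for this sketch; the real response objects are richer.
interface ElementLike {
  text?: string;
  metadata?: { page_number?: number };
}

// Hypothetical helper: returns the elements whose page number falls outside
// [start, end] (inclusive). Elements without a page number are ignored.
function elementsOutsideRange(
  elements: ElementLike[],
  range: [number, number],
): ElementLike[] {
  const [start, end] = range;
  return elements.filter((el) => {
    const page = el.metadata?.page_number;
    return page !== undefined && (page < start || page > end);
  });
}

// Made-up sample data standing in for res.elements.
const sampleElements: ElementLike[] = [
  { text: "a", metadata: { page_number: 4 } },
  { text: "b", metadata: { page_number: 8 } },
  { text: "c", metadata: { page_number: 9 } },
  { text: "d", metadata: {} },
];

const strays = elementsOutsideRange(sampleElements, [4, 8]);
console.log(strays.map((el) => el.text));
```

In the request above, the same check could be run on `res.elements` inside the `then` callback, where an empty result means every element with a page number fell inside the requested range.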