fix: fix `IndexError` when partitioning a pdf with `starting_page_number`

The Issue:

When extracting images from PDFs, we use the page number in each element's metadata to index into a list of the images. However, the metadata page number can now be shifted via `starting_page_number`, so it no longer matches the position in that list. To get the true page index, we need to subtract this value from the metadata page number.
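The arithmetic can be illustrated with a minimal sketch. This is not the code from the patch: `page_index` and `page_images` are hypothetical names, and the sketch assumes the image list is zero-based and that the first page of the document is numbered `starting_page_number`.

```python
def page_index(metadata_page_number: int, starting_page_number: int = 1) -> int:
    """Map a (possibly offset) metadata page number to a zero-based list index.

    Hypothetical helper for illustration only; assumes the first page of the
    document is numbered `starting_page_number`.
    """
    return metadata_page_number - starting_page_number


# Stand-in for the per-page images of a three-page PDF.
page_images = ["page-0.png", "page-1.png", "page-2.png"]

starting_page_number = 20
metadata_page_number = 21  # the second page when numbering starts at 20

# Using the metadata page number without removing the offset would point far
# past the end of `page_images` and raise an IndexError.
idx = page_index(metadata_page_number, starting_page_number)
image = page_images[idx]
```

With the offset removed, page 21 maps to index 1, and a document that keeps the default numbering from 1 maps page 1 to index 0.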

Testing:

Run this snippet in a Python shell. Before the fix, it raises an `IndexError`. On this branch, it returns the elements.

```python
from unstructured.partition.auto import partition

filename = "example-docs/layout-parser-paper-with-table.pdf"
partition(
    filename,
    strategy="hi_res",
    extract_image_block_types=["Image", "Table"],
    starting_page_number=20,
)
```

Unstructured-IO/unstructured#3246