When extracting images from PDFs, we use the metadata page number to index into a list of the page images. However, the metadata page number can now be shifted via starting_page_number, so it no longer maps directly to a position in that list. To get the true page index, we need to subtract starting_page_number from the metadata page number.
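The indexing problem can be sketched in isolation. This is an illustration only, not the library's code and not the test snippet for this PR: `image_for_element`, `page_images`, and the sample values are hypothetical, and it assumes pages are numbered consecutively beginning at starting_page_number.

```python
def image_for_element(page_images, page_number, starting_page_number=1):
    """Return the page image for an element (hypothetical helper)."""
    # The reported page number is shifted by starting_page_number, so
    # subtracting it recovers the zero-based position in page_images.
    page_index = page_number - starting_page_number
    if not 0 <= page_index < len(page_images):
        raise IndexError(f"page {page_number} is outside the document")
    return page_images[page_index]


# Three pages, numbered 10, 11, 12 because starting_page_number is 10.
page_images = ["image-of-page-0", "image-of-page-1", "image-of-page-2"]
starting_page_number = 10
reported_page_numbers = [starting_page_number + i for i in range(len(page_images))]

# Indexing with the reported number directly runs past the end of the list.
try:
    page_images[reported_page_numbers[0]]
    naive_lookup_failed = False
except IndexError:
    naive_lookup_failed = True

# Subtracting the offset first finds the right image for every page.
resolved = [
    image_for_element(page_images, n, starting_page_number)
    for n in reported_page_numbers
]
```

With any offset larger than the number of pages, the direct lookup fails on the first page, which matches the IndexError described here.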
Testing:
Run this snippet in a Python shell. Before the fix, it raises an IndexError. On this branch, it returns the elements.