NationalLibraryOfNorway / meteor

A Python module and REST API for automatic extraction of metadata from PDF files
Apache License 2.0

perf: Use flag to skip images in page blocks (TT-1336) #20

Closed: pierrebeauguitte closed this pull request 10 months ago

pierrebeauguitte commented 11 months ago

I tested this change on a set of 150 large PDF files. The total execution time dropped from about 10 minutes to 2 minutes, roughly a fivefold speedup.