Open cutright opened 4 years ago
main.process_files() in branch issue_17 can now ignore previously processed files. Collecting the list of processed files is fast, but the overall run is still slow. The bottleneck appears to be iterating through the OS directory rather than parsing the data, or possibly checking whether each file name exists in the list of previously processed files.
Needs investigation.
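One way to separate the two suspects is to time the directory walk and the membership check independently. The sketch below is only an illustration, not the code in main.process_files(): the function name find_unprocessed and its arguments are hypothetical, and it assumes previously processed files are tracked by full path.

```python
import os
import time


def find_unprocessed(root, processed_paths):
    """Return (new_files, timings) for files under root not yet processed.

    Hypothetical helper for profiling; not part of the project's API.
    """
    timings = {}

    # Phase 1: store processed paths in a set so each lookup is O(1) on
    # average. With a list, every "in" check scans the whole list.
    start = time.perf_counter()
    processed = set(processed_paths)
    timings["build_set"] = time.perf_counter() - start

    # Phase 2: walk the directory tree only, with no membership checks,
    # so the file-system cost is measured on its own.
    start = time.perf_counter()
    all_files = [
        os.path.join(dirpath, name)
        for dirpath, _, filenames in os.walk(root)
        for name in filenames
    ]
    timings["walk"] = time.perf_counter() - start

    # Phase 3: filter out anything already processed.
    start = time.perf_counter()
    new_files = [path for path in all_files if path not in processed]
    timings["filter"] = time.perf_counter() - start

    return new_files, timings
```

If "filter" dominates when the processed files are kept in a list, switching to a set as above should remove most of that cost. If "walk" dominates, the time is going to the file system and the membership check is probably not the problem.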
Feature request to ignore previously processed PDFs