JStateson / PDF-PhraseFinder

No longer maintained due to unresolved problem with Adobe dotnet sdk

Highlighted phrase on a page is not always the phrase found #2

Closed: JStateson closed this issue 1 year ago

JStateson commented 1 year ago

When phrases in the list share words, Acrobat's "Find" highlights every match of the selected phrase, not just the occurrences the tool actually counted for it.

For example, consider the following list for page "1":

Phrase          Occurrences
Food Provider   2
Food Service    1
Food            1

The search algorithm starts with the longest phrase and works down to the shortest, removing each phrase from the page text once it has been found. The word "Food" occurs 5 times on page "1", but it is counted only once under "Food" because the occurrences belonging to the longer phrases were already removed.
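A minimal Python sketch of the longest-first counting just described, assuming whole-word, case-insensitive matching on the extracted page text. The tool's actual matching rules and its use of the Adobe SDK are not shown in this issue; the function name `count_phrases` and the sample text are hypothetical.

```python
import re

def count_phrases(page_text: str, phrases: list[str]) -> dict[str, int]:
    """Count each phrase, longest first, blanking out every match so that
    a shorter phrase (e.g. "Food") cannot re-count words already claimed
    by a longer one (e.g. "Food Provider")."""
    counts: dict[str, int] = {}
    remaining = page_text
    # Longest phrase (by character length) first.
    for phrase in sorted(phrases, key=len, reverse=True):
        # Assumption: whole-word, case-insensitive matching.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        # Replace matches with a space so word boundaries are preserved.
        remaining, n = pattern.subn(" ", remaining)
        counts[phrase] = n
    return counts

# Hypothetical page text containing "Food" four times.
text = "Food Provider list. Contact the Food Provider or Food Service desk. Food is served daily."
print(count_phrases(text, ["Food Provider", "Food Service", "Food"]))
# {'Food Provider': 2, 'Food Service': 1, 'Food': 1}
```

Because matches are removed, each word on the page is attributed to at most one phrase, which is why the standalone "Food" count stays at 1.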

Acrobat's built-in "Find" function, by contrast, finds all 5 occurrences of "Food" on page "1" and highlights each of them, not just the single occurrence that is not part of "Food Provider" or "Food Service".
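To illustrate the mismatch, this hypothetical sketch contrasts a plain find (every whole-word hit, roughly what a text search reports) with only those hits of "Food" that lie outside the longer phrases. It does not call Acrobat; `plain_find`, `standalone_spans`, and the sample text are illustrative assumptions.

```python
import re

def plain_find(page_text: str, phrase: str) -> list[tuple[int, int]]:
    """Return the (start, end) span of every whole-word, case-insensitive
    match of `phrase`, with no knowledge of any other phrase."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [m.span() for m in pattern.finditer(page_text)]

def standalone_spans(page_text: str, word: str,
                     longer_phrases: list[str]) -> list[tuple[int, int]]:
    """Return the spans of `word` that do not lie inside a match of any
    of `longer_phrases`."""
    claimed = [span for phrase in longer_phrases
               for span in plain_find(page_text, phrase)]
    return [(s, e) for s, e in plain_find(page_text, word)
            if not any(cs <= s and e <= ce for cs, ce in claimed)]

text = "Food Provider list. Contact the Food Provider or Food Service desk. Food is served daily."
print(len(plain_find(text, "Food")))  # 4: every hit a plain find would highlight
print(len(standalone_spans(text, "Food", ["Food Provider", "Food Service"])))  # 1: the hit that was counted
```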

JStateson commented 1 year ago

This is by design. To avoid the problem, the user should create phrases that do not share words with other phrases in the list.
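Checking a phrase list for this situation can be partly automated. The helper below, `overlapping_phrases`, is a hypothetical sketch (not part of PDF-PhraseFinder) that assumes whole-word, case-insensitive matching and flags every pair in which a shorter phrase occurs inside a longer one.

```python
import re

def overlapping_phrases(phrases: list[str]) -> list[tuple[str, str]]:
    """Return (shorter, longer) pairs where `shorter` appears as whole
    words inside `longer`; such pairs make a plain find highlight more
    hits than the longest-first count attributes to the shorter phrase."""
    pairs = []
    for shorter in phrases:
        pattern = re.compile(r"\b" + re.escape(shorter) + r"\b", re.IGNORECASE)
        for longer in phrases:
            if len(longer) > len(shorter) and pattern.search(longer):
                pairs.append((shorter, longer))
    return pairs

print(overlapping_phrases(["Food Provider", "Food Service", "Food"]))
# [('Food', 'Food Provider'), ('Food', 'Food Service')]
```

An empty result means no phrase in the list is contained in another, so the highlighted matches should line up with the counts.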