BookStackApp / BookStack

A platform to create documentation/wiki content built with PHP & Laravel
https://www.bookstackapp.com/
MIT License

Highlight text containing diacritics on search results #4188

Open athoik opened 1 year ago

athoik commented 1 year ago

Describe the Bug

Searching text with or without diacritics works great! 👍

However, the matching text is highlighted only if an exact match is found in getMatchPositions:

https://github.com/BookStackApp/BookStack/blob/a46b438a4c5dc52c8592aec681473c858cfdbd27/app/Search/SearchResultsFormatter.php#L92

So a term like δοκιμή is highlighted only if it is entered exactly as written on the page. Searching for δοκιμη finds the page, but no text is highlighted in the search results.
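A minimal PHP sketch of why the highlight is missing. It assumes only the two calls visible in the patch (mb_strtolower() on both strings, then mb_strpos()); the sample strings are illustrative. Lower-casing does not remove accents, so the unaccented term is never found in the accented text:

```php
<?php
// Reproduces the exact-match lookup used for highlighting:
// both strings are lower-cased, then searched with mb_strpos().
$text = 'Μια δοκιμή';   // page content, contains ή (GREEK SMALL LETTER ETA WITH TONOS)
$term = 'δοκιμη';       // search term typed without the accent

$pos = mb_strpos(mb_strtolower($text), mb_strtolower($term));

var_dump($pos); // bool(false): no position found, so nothing is highlighted
```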

The following patch fixes the issue by using transliterator_transliterate to convert text to lower case without diacritics. It requires the php-intl package to be installed (e.g. apt-get install php8.2-intl).

diff --git a/app/Search/SearchResultsFormatter.php b/app/Search/SearchResultsFormatter.php
index 9cbc5ee6..6bbab29a 100644
--- a/app/Search/SearchResultsFormatter.php
+++ b/app/Search/SearchResultsFormatter.php
@@ -84,11 +84,11 @@ class SearchResultsFormatter
     protected function getMatchPositions(string $text, array $terms): array
     {
         $matchRefs = [];
-        $text = mb_strtolower($text);
+        $text = transliterator_transliterate('NFD; [:Nonspacing Mark:] Remove; Lower; NFC;', $text);

         foreach ($terms as $term) {
             $offset = 0;
-            $term = mb_strtolower($term);
+            $term = transliterator_transliterate('NFD; [:Nonspacing Mark:] Remove; Lower; NFC;', $term);
             $pos = mb_strpos($text, $term, $offset);
             while ($pos !== false) {
                 $end = $pos + mb_strlen($term);

I believe the above change will work universally for all languages with diacritics.
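A self-contained PHP sketch of the patched idea, for trying it outside BookStack. The helpers normalizeForMatch() and findMatchPositions() are hypothetical, and the start => end array they return is an assumed shape, not the real $matchRefs format. It uses the same ICU rule as the patch and therefore needs the intl extension:

```php
<?php
// Hypothetical helper: applies the ICU rule from the patch.
// Decompose (NFD), drop combining accents, lower-case, recompose (NFC).
function normalizeForMatch(string $value): string
{
    return transliterator_transliterate('NFD; [:Nonspacing Mark:] Remove; Lower; NFC;', $value);
}

/**
 * Hypothetical matcher: finds every term in the text after normalizing both.
 * @return array<int, int> map of match start offset => end offset (in characters)
 */
function findMatchPositions(string $text, array $terms): array
{
    $matches = [];
    $text = normalizeForMatch($text);

    foreach ($terms as $term) {
        $term = normalizeForMatch($term);
        $length = mb_strlen($term);
        if ($length === 0) {
            continue; // an empty term would otherwise loop forever
        }
        $pos = mb_strpos($text, $term);
        while ($pos !== false) {
            $matches[$pos] = $pos + $length;
            $pos = mb_strpos($text, $term, $pos + $length);
        }
    }

    ksort($matches); // order matches by start offset
    return $matches;
}
```

For example, findMatchPositions('Μια δοκιμή, άλλη ΔΟΚΙΜΗ', ['δοκιμη']) returns [4 => 10, 17 => 23], so both δοκιμή and ΔΟΚΙΜΗ would be highlighted. Reusing these offsets on the original text assumes normalization keeps the character count; that holds for precomposed characters like ή, but text stored in decomposed form (a base letter followed by a separate combining accent) would shift the offsets.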

Please consider accepting that change, if you believe it will improve BookStack.

Thanks!

Steps to Reproduce

  1. Create a page that contains text δοκιμή (Greek word for test, with ή -> GREEK SMALL LETTER ETA WITH TONOS)
  2. Go to 'search'
  3. Type word δοκιμη (small letters without diacritics)
  4. Search results appear but text δοκιμή is not highlighted

Expected Behaviour

The text δοκιμή should be highlighted, since it was possible to search that text.

Screenshots or Additional Context

No response

Browser Details

No response

Exact BookStack Version

v23.02.3

PHP Version

8.2.5

Hosting Environment

Debian 11 with PHP 8.2 by @armando-femat

esakkiraja100116 commented 1 year ago

@athoik The text δοκιμή is already highlighted, as you expect:

[Screenshot]

athoik commented 1 year ago

@esakkiraja100116 That is correct, but you typed the word δοκιμή including diacritics.

Now try searching for the word δοκιμη without diacritics and let me know whether it gets highlighted.

esakkiraja100116 commented 1 year ago

Yes, it's highlighted. Can you provide a screenshot like this one, @athoik?

[Screenshot]

athoik commented 1 year ago

When searching for δοκιμη, the word δοκιμή should also be highlighted (that's what the patch does):

[Screenshot]

esakkiraja100116 commented 1 year ago

[Screenshot from 2023-04-26 14-09-28]

ssddanbrown commented 1 year ago

@esakkiraja100116 I'm pretty sure your screenshots are showing the scenario that @athoik is trying to address here. I believe they'd desire both instances of the term in your screenshot to become bold, not just the last.


Thanks for investigating and providing a patch @athoik. I'm going to reclassify this as a feature request, since it's not a break/bug in existing supported behaviour (I didn't really know this was a thing) but a request to specifically support diacritics here.

In regards to the patch, I'm not too keen on adding a new system requirement just to meet what is mostly a minor presentational feature (with a little functional purpose). We could conditionally do this based upon extension existence, but not sure if that's a route I'd want to take. I'll have to ponder upon options for this.
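One possible shape for the "conditionally based upon extension existence" option, as a PHP sketch only (normalizeSearchText() is a hypothetical name, not BookStack code). It uses the ICU rule from the patch when php-intl is available and otherwise keeps the current mb_strtolower() behaviour, so no new system requirement is introduced:

```php
<?php
// Hypothetical normalizer: accent-insensitive when php-intl is installed,
// plain lower-casing (current behaviour) when it is not.
function normalizeSearchText(string $value): string
{
    if (function_exists('transliterator_transliterate')) {
        $result = transliterator_transliterate('NFD; [:Nonspacing Mark:] Remove; Lower; NFC;', $value);
        if ($result !== false) {
            return $result; // accents removed and lower-cased
        }
    }

    // Fallback: case-insensitive only; accents must still match exactly.
    return mb_strtolower($value);
}
```

The trade-off is that highlighting then behaves differently depending on the server's installed extensions, which may be part of why this route is not an obvious choice.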

athoik commented 1 year ago

It's a really trivial feature, but really nice for Greek users (or other communities using diacritics), since it's common to search for words with or without diacritics.

Please feel free to include this feature in the hacks section! It might be useful for other people too.

If the php-intl package ever becomes a requirement, we can reconsider the addition.

Thanks a lot for your support! 👍

esakkiraja100116 commented 1 year ago

Thanks for the clarification, @ssddanbrown. I was confused by the label called bug. As you said, it's a feature request.