glpi-project / glpi

GLPI is a Free Asset and IT Management Software package, Data center management, ITIL Service Desk, licenses tracking and software auditing.
https://glpi-project.org
GNU General Public License v3.0

GLPI 10.0 RC3 Knowledge base advanced (Boolean Full-Text Search) operators ignored #11300

Closed derry43 closed 1 year ago

derry43 commented 2 years ago

Version

10.0 RC3

Bug description

The Boolean Full-Text Search operators (+, -, ~, *, <, >, () and "") are ignored in the Knowledge base search and therefore do not work as described in the documentation (excerpt below). A possible fix is provided in the additional comments made after raising this bug report.

In my testing, searching for articles with lines that must contain both "tribbles" and "with", using the search expression +tribbles +with, returns all articles containing either "with" or "tribbles". This is the same result as the plain search term tribbles with, so the + operator is effectively ignored.

Searching for articles that must contain the literal sequence "with tribbles", using the search expression "with tribbles", also returns all articles containing either "with" or "tribbles", the same as the plain search term tribbles with. The double quotes are effectively ignored.

Worse still, searching for articles that must contain "tribbles" but must not contain "kirk", using the search expression +tribbles -kirk, returns all articles containing either "tribbles" or "kirk", the same as the plain search term tribbles kirk. The - operator is effectively ignored.

Documentation on Knowledge Base Search Operators

+: word must be there; -: word must not be there; *: truncate suffix; " ": contained sequence must be searched literally; < >: define order on search elements; (): group when using < and >.

Examples :

  • printer failure -> Search lines containing at least one of these words

  • +printer +failure -> Search lines containing both words

  • +mail thunderbird -> Search lines containing word mail but rank higher lines containing also word thunderbird

  • +mail -outlook -> Search lines containing word mail but not word outlook

  • +mail +(>thunderbird <outlook) -> Search lines containing word mail and thunderbird, or mail and outlook, in any order, but rank mail thunderbird higher than mail outlook

  • open* -> Search lines containing words such as openoffice, openwriter, openbar, openphp...

  • "openoffice suite" -> Search lines containing exactly sentence openoffice suite
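As a rough illustration of the documented operators, the sketch below splits a search expression into terms and labels how boolean mode is expected to treat each one. The function name `classifyTerms` and the role labels are hypothetical and are not part of GLPI or MySQL. Only +, - and quoted phrases are handled; grouping with () and the ~, < and > operators are deliberately left out.

```php
<?php
// Illustrative only: label each term of a boolean-mode search expression.
// Hypothetical helper, not GLPI code. Handles +, - and quoted phrases only.
function classifyTerms(string $query): array
{
    $roles = ['+' => 'required', '-' => 'excluded', '' => 'optional'];
    $terms = [];
    // An optional leading + or -, then either a quoted phrase or a bare word.
    preg_match_all('/([+-]?)("[^"]*"|[^\s"]+)/u', $query, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $terms[] = [$match[2], $roles[$match[1]]];
    }
    return $terms;
}

// mail is required, outlook is excluded, thunderbird only affects ranking.
print_r(classifyTerms('+mail -outlook thunderbird'));
```

With the bug described in this report, all three terms behave as if they were labelled optional.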

Relevant log output

No response

Page URL

front/knowbaseitem.php

Steps To reproduce

  1. Create the following 5 Knowledge base articles:
    • Subject: Not in the subject; Content: Tribbles in the content
    • Subject: Spock; Content: Thine own self - Star Trek
    • Subject: Star Trek; Content: Eye of the beholder
    • Subject: Tribbles; Content: The trouble with tribbles - Star Trek Captain Kirk
    • Subject: Tribbles in the Subject; Content: Not in the content
  2. Publish these articles
  3. Search for +tribbles +with
  4. Search for "with tribbles"
  5. Search for +tribbles -kirk

Your GLPI setup information

Information about system installation and configuration
GLPI 10.0.0-rc3 ( => C:\inetpub\wwwroot\glpi10x)
Installation mode: TARBALL
Current language:en_GB

Server
 
Operating system: Windows NT EMPCRUUKWPBU012 6.3 build 9600 (Windows Server 2012 R2 Standard Edition) AMD64
PHP 7.4.15 cgi-fcgi (Core, PDO, Phar, Reflection, SPL, SimpleXML, Zend OPcache, apcu, bcmath, bz2, calendar, cgi-fcgi, ctype,
    curl, date, dom, exif, fileinfo, filter, gd, gettext, hash, iconv, intl, json, ldap, libxml, mbstring, mysqli, mysqlnd, openssl,
    pcre, readline, session, soap, sodium, standard, tokenizer, xml, xmlreader, xmlrpc, xmlwriter, zip, zlib)
Setup: max_execution_time="300" memory_limit="128M" post_max_size="8M" safe_mode="" session.save_handler="files"
    upload_max_filesize="10M" 
Software: Microsoft-IIS/8.5
    Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36 Edg/100.0.1185.36
Server Software: MySQL Community Server - GPL
    Server Version: 8.0.23
    Server SQL Mode: STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION
    Parameters: glpi@172.16.2.70/glpi10x
    Host info: 172.16.2.70 via TCP/IP

PHP version (7.4.15) is supported.
Sessions configuration is OK.
Allocated memory is sufficient.
mysqli extension is installed.
Following extensions are installed: dom, fileinfo, json, simplexml.
curl extension is installed.
gd extension is installed.
intl extension is installed.
libxml extension is installed.
zlib extension is installed.
The constant SODIUM_CRYPTO_AEAD_XCHACHA20POLY1305_IETF_NPUBBYTES is present.
Database engine version (8.0.23) is supported.
The log file has been created successfully.
Write access to C:\inetpub\wwwroot\glpi10x/files/_cache has been validated.
Write access to C:\inetpub\wwwroot\glpi10x/config has been validated.
Write access to C:\inetpub\wwwroot\glpi10x/files/_cron has been validated.
Write access to C:\inetpub\wwwroot\glpi10x/files has been validated.
Write access to C:\inetpub\wwwroot\glpi10x/files/_dumps has been validated.
Write access to C:\inetpub\wwwroot\glpi10x/files/_graphs has been validated.
Write access to C:\inetpub\wwwroot\glpi10x/files/_lock has been validated.
Write access to C:\inetpub\wwwroot\glpi10x/files/_pictures has been validated.
Write access to C:\inetpub\wwwroot\glpi10x/files/_plugins has been validated.
Write access to C:\inetpub\wwwroot\glpi10x/files/_rss has been validated.
Write access to C:\inetpub\wwwroot\glpi10x/files/_sessions has been validated.
Write access to C:\inetpub\wwwroot\glpi10x/files/_tmp has been validated.
Write access to C:\inetpub\wwwroot\glpi10x/files/_uploads has been validated.
Web access to the files directory should not be allowed but this cannot be checked automatically on this instance.
Make sure access to error log file (/files/_log/php-errors.log) is forbidden; otherwise review .htaccess file and web server configuration.
exif extension is installed.
ldap extension is installed.
openssl extension is installed.
zip extension is installed.
bz2 extension is installed.
Zend OPcache extension is installed.
Following extensions are installed: ctype, iconv, mbstring, sodium.
Write access to C:\inetpub\wwwroot\glpi10x/marketplace has been validated.
Timezones seems loaded in database.

GLPI constants
 
GLPI_ROOT: C:\inetpub\wwwroot\glpi10x
GLPI_CONFIG_DIR: C:\inetpub\wwwroot\glpi10x/config
GLPI_VAR_DIR: C:\inetpub\wwwroot\glpi10x/files
GLPI_MARKETPLACE_DIR: C:\inetpub\wwwroot\glpi10x/marketplace
GLPI_USE_CSRF_CHECK: 1
GLPI_CSRF_EXPIRES: 7200
GLPI_CSRF_MAX_TOKENS: 100
GLPI_USE_IDOR_CHECK: 1
GLPI_IDOR_EXPIRES: 7200
GLPI_ALLOW_IFRAME_IN_RICH_TEXT: 
GLPI_TELEMETRY_URI: https://telemetry.glpi-project.org
GLPI_INSTALL_MODE: TARBALL
GLPI_NETWORK_MAIL: glpi@teclib.com
GLPI_NETWORK_SERVICES: https://services.glpi-network.com
GLPI_MARKETPLACE_ALLOW_OVERRIDE: 1
GLPI_MARKETPLACE_MANUAL_DOWNLOADS: 1
GLPI_USER_AGENT_EXTRA_COMMENTS: 
GLPI_DISABLE_ONLY_FULL_GROUP_BY_SQL_MODE: 1
GLPI_AJAX_DASHBOARD: 1
GLPI_CALDAV_IMPORT_STATE: 0
GLPI_DEMO_MODE: 0
GLPI_CENTRAL_WARNINGS: 1
GLPI_DOC_DIR: C:\inetpub\wwwroot\glpi10x/files
GLPI_CACHE_DIR: C:\inetpub\wwwroot\glpi10x/files/_cache
GLPI_CRON_DIR: C:\inetpub\wwwroot\glpi10x/files/_cron
GLPI_DUMP_DIR: C:\inetpub\wwwroot\glpi10x/files/_dumps
GLPI_GRAPH_DIR: C:\inetpub\wwwroot\glpi10x/files/_graphs
GLPI_LOCAL_I18N_DIR: C:\inetpub\wwwroot\glpi10x/files/_locales
GLPI_LOCK_DIR: C:\inetpub\wwwroot\glpi10x/files/_lock
GLPI_LOG_DIR: C:\inetpub\wwwroot\glpi10x/files/_log
GLPI_PICTURE_DIR: C:\inetpub\wwwroot\glpi10x/files/_pictures
GLPI_PLUGIN_DOC_DIR: C:\inetpub\wwwroot\glpi10x/files/_plugins
GLPI_RSS_DIR: C:\inetpub\wwwroot\glpi10x/files/_rss
GLPI_SESSION_DIR: C:\inetpub\wwwroot\glpi10x/files/_sessions
GLPI_TMP_DIR: C:\inetpub\wwwroot\glpi10x/files/_tmp
GLPI_UPLOAD_DIR: C:\inetpub\wwwroot\glpi10x/files/_uploads
GLPI_INVENTORY_DIR: C:\inetpub\wwwroot\glpi10x/files/_inventories
GLPI_NETWORK_REGISTRATION_API_URL: https://services.glpi-network.com/api/registration/
GLPI_MARKETPLACE_PLUGINS_API_URI: https://services.glpi-network.com/api/glpi-plugins/
GLPI_I18N_DIR: C:\inetpub\wwwroot\glpi10x/locales
GLPI_VERSION: 10.0.0-rc3
GLPI_SCHEMA_VERSION: 10.0.0-rc3
GLPI_MARKETPLACE_PRERELEASES: 1
GLPI_MIN_PHP: 7.4.0
GLPI_MAX_PHP: 8.2.0
GLPI_YEAR: 2022

Libraries
 
htmlawed/htmlawed version 1.2.6 in (C:\inetpub\wwwroot\glpi10x\vendor\htmlawed\htmlawed)
phpmailer/phpmailer version 6.6.0 in (C:\inetpub\wwwroot\glpi10x\vendor\phpmailer\phpmailer\src)
simplepie/simplepie version 1.5.8 in (C:\inetpub\wwwroot\glpi10x\vendor\simplepie\simplepie\library)
mpdf/mpdf in (C:\inetpub\wwwroot\glpi10x\vendor\mpdf\mpdf\src)
michelf/php-markdown in (C:\inetpub\wwwroot\glpi10x\vendor\michelf\php-markdown\Michelf)
true/punycode in (C:\inetpub\wwwroot\glpi10x\vendor\true\punycode\src)
iamcal/lib_autolink in (C:\inetpub\wwwroot\glpi10x\vendor\iamcal\lib_autolink)
sabre/dav in (C:\inetpub\wwwroot\glpi10x\vendor\sabre\dav\lib\DAV)
sabre/http in (C:\inetpub\wwwroot\glpi10x\vendor\sabre\http\lib)
sabre/uri in (C:\inetpub\wwwroot\glpi10x\vendor\sabre\uri\lib)
sabre/vobject in (C:\inetpub\wwwroot\glpi10x\vendor\sabre\vobject\lib)
laminas/laminas-i18n in (C:\inetpub\wwwroot\glpi10x\vendor\laminas\laminas-i18n\src)
laminas/laminas-servicemanager in (C:\inetpub\wwwroot\glpi10x\vendor\laminas\laminas-servicemanager\src)
monolog/monolog in (C:\inetpub\wwwroot\glpi10x\vendor\monolog\monolog\src\Monolog)
sebastian/diff in (C:\inetpub\wwwroot\glpi10x\vendor\sebastian\diff\src)
donatj/phpuseragentparser in (C:\inetpub\wwwroot\glpi10x\vendor\donatj\phpuseragentparser\src\UserAgent)
elvanto/litemoji in (C:\inetpub\wwwroot\glpi10x\vendor\elvanto\litemoji\src)
symfony/console in (C:\inetpub\wwwroot\glpi10x\vendor\symfony\console)
scssphp/scssphp in (C:\inetpub\wwwroot\glpi10x\vendor\scssphp\scssphp\src)
laminas/laminas-mail in (C:\inetpub\wwwroot\glpi10x\vendor\laminas\laminas-mail\src\Protocol)
laminas/laminas-mime in (C:\inetpub\wwwroot\glpi10x\vendor\laminas\laminas-mime\src)
rlanvin/php-rrule in (C:\inetpub\wwwroot\glpi10x\vendor\rlanvin\php-rrule\src)
blueimp/jquery-file-upload in (C:\inetpub\wwwroot\glpi10x\vendor\blueimp\jquery-file-upload\server\php)
ramsey/uuid in (C:\inetpub\wwwroot\glpi10x\vendor\ramsey\uuid\src)
psr/log in (C:\inetpub\wwwroot\glpi10x\vendor\psr\log\Psr\Log)
psr/simple-cache in (C:\inetpub\wwwroot\glpi10x\vendor\psr\simple-cache\src)
psr/cache in (C:\inetpub\wwwroot\glpi10x\vendor\psr\cache\src)
league/csv in (C:\inetpub\wwwroot\glpi10x\vendor\league\csv\src)
mexitek/phpcolors in (C:\inetpub\wwwroot\glpi10x\vendor\mexitek\phpcolors\src\Mexitek\PHPColors)
guzzlehttp/guzzle in (C:\inetpub\wwwroot\glpi10x\vendor\guzzlehttp\guzzle\src)
guzzlehttp/psr7 in (C:\inetpub\wwwroot\glpi10x\vendor\guzzlehttp\psr7\src)
glpi-project/inventory_format in (C:\inetpub\wwwroot\glpi10x\vendor\glpi-project\inventory_format\lib\php)
wapmorgan/unified-archive in (C:\inetpub\wwwroot\glpi10x\vendor\wapmorgan\unified-archive\src)
paragonie/sodium_compat in (C:\inetpub\wwwroot\glpi10x\vendor\paragonie\sodium_compat\src)
symfony/cache in (C:\inetpub\wwwroot\glpi10x\vendor\symfony\cache)
html2text/html2text in (C:\inetpub\wwwroot\glpi10x\vendor\html2text\html2text\src)
symfony/dom-crawler in (C:\inetpub\wwwroot\glpi10x\vendor\symfony\dom-crawler)
twig/twig in (C:\inetpub\wwwroot\glpi10x\vendor\twig\twig\src)
twig/string-extra in (C:\inetpub\wwwroot\glpi10x\vendor\twig\string-extra)
symfony/polyfill-ctype not found
symfony/polyfill-iconv not found
symfony/polyfill-mbstring not found
symfony/polyfill-php80 in (C:\inetpub\wwwroot\glpi10x\vendor\symfony\polyfill-php80)

LDAP directories
 
Server: 'ldap://empcruukwpdc001.navsys.ad', Port: '636', BaseDN: 'DC=NAVSYS,DC=ad', Connection filter:
        '(&(objectClass=user)(objectCategory=person)(memberof=CN=_ActiveUsers,OU=Groups,OU=Naval,DC=NAVSYS,DC=ad)(!(userAccountControl:1.2.840.113556.1.4.803:=2)))',
        RootDN: 'ldapnex@navsys.ad', Use TLS: none

SQL replicas
 
Not active

Notifications
 
Way of sending emails: SMTP (anonymous@empcruukwpap020.navsys.ad)

Plugins list
 
    activity             Name: Activities                     Version: 3.0.0      State: Error / to clean
    actualtime           Name: ActualTime                     Version: 1.5.0      State: Error / to clean
    fields               Name: Additionnal fields             Version: 1.12.4     State: Error / to clean
    advancedplanning     Name: advancedplanning               Version: 1.0.0      State: Error / to clean
    news                 Name: Alerts                         Version: 1.9.0      State: Error / to clean
    appliances           Name: Appliances                     Version: 3.1.1      State: Error / to clean
    archisw              Name: Apps structures                Version: 2.2.9      State: Enabled
    impacts              Name: Asset impacts                  Version: 2.0.5      State: Error / to clean
    behaviors            Name: Behaviours                     Version: 2.5.0      State: Error / to clean
    archibp              Name: Business Processes             Version: 1.0.3      State: Error / to clean
    positions            Name: Cartography                    Version: 5.0.0      State: Error / to clean
    cmdb                 Name: CMDB                           Version: 2.2.1      State: Error / to clean
    cve                  Name: CVE                            Version: 1.0.0      State: Error / to clean
    dashboard            Name: Dashboard                      Version: 1.0.2      State: Error / to clean
    datainjection        Name: Data injection                 Version: 2.10.1     State: Error / to clean
    databases            Name: Databases                      Version: 2.3.2      State: Error / to clean
    archimap             Name: Diagrams                       Version: 3.2.2      State: Installed / not activated
    father               Name: Father&Sons                    Version: 1.2.0      State: Error / to clean
    formcreator          Name: Form Creator                   Version: 2.11.2     State: Error / to clean
    archifun             Name: Functional Areas               Version: 2.2.2      State: Error / to clean
    fusioninventory      Name: FusionInventory                Version: 9.5+3.0    State: Error / to clean
    resources            Name: Human Resources                Version: 2.7.4      State: Error / to clean
    geninventorynumber   Name: Inventory number generation    Version: 2.5.0      State: Error / to clean
    addressing           Name: IP Adressing                   Version: 2.9.0      State: Error / to clean
    uninstall            Name: Item's Lifecycle (uninstall)   Version: 2.7.0      State: Error / to clean
    itilcategorygroups   Name: ItilCategory Groups            Version: 2.4.1      State: Error / to clean
    ldapcomputers        Name: LDAP computers                 Version: 0.4.1      State: Error / to clean
    metademands          Name: Meta-Demands                   Version: 2.7.5      State: Error / to clean
    metabase             Name: metabase                       Version: 1.2.4      State: Error / to clean
    mreporting           Name: More Reporting                 Version: 1.7.3      State: Error / to clean
    genericobject        Name: Objects management             Version: 2.11.0     State: Error / to clean
    order                Name: Orders management              Version: 2.7.2      State: Error / to clean
    example              Name: Plugin Example                 Version: 0.0.1      State: Error / to clean
    pdf                  Name: Print to pdf                   Version: 2.0.0      State: Error / to clean
    projectbridge        Name: Projectbridge                  Version: 2.4        State: Error / to clean
    releases             Name: Releases                       Version: 1.0.1      State: Error / to clean
    reports              Name: Reports                        Version: 1.14.1     State: Error / to clean
    shellcommands        Name: Shell Commands                 Version: 3.0.0      State: Error / to clean
    tag                  Name: Tag Management                 Version: 2.8.1      State: Error / to clean
    taskdrop             Name: TaskDrop                       Version: 1.3.0      State: Error / to clean
    tasklists            Name: Tasks list                     Version: 1.6.2      State: Error / to clean
    useditemsexport      Name: Used items export              Version: 2.3.0      State: Error / to clean
    vip                  Name: VIP                            Version: 1.4RC1     State: Error / to clean
    webapplications      Name: Web applications               Version: 4.0.0-rc1  State: Enabled
    webresources         Name: Web Resources                  Version: 1.3.2      State: Error / to clean

Anything else?

No response

derry43 commented 2 years ago

After a bit of digging, this issue appears to be caused by lines 1438 to 1448 in src/KnowbaseItem.php, which replace all of the Boolean Full-Text Search (BOOLEAN MODE) operators with spaces, resulting in them being ignored.

                   // Replace all non word characters with spaces (see: https://stackoverflow.com/a/26537463)
                    $search_wilcard = preg_replace('/[^\p{L}\p{N}_]+/u', ' ', $search);

                   // Remove last space to avoid illegal syntax with " *"
                    $search_wilcard = trim($search_wilcard);

                   // Merge spaces since we are using them to split the string later
                    $search_wilcard = preg_replace('!\s+!', ' ', $search_wilcard);

                    $search_wilcard = explode(' ', $search_wilcard);
                    $search_wilcard = implode('* ', $search_wilcard) . '*';

I believe that this should be:

                   // Replace all non word/operator characters with spaces (see: https://stackoverflow.com/a/26537463)
                    $search_wilcard = preg_replace('/[^\p{L}\p{N}_)(~+\<>*"-]+/u', ' ', $search);

                   // Remove last space to avoid illegal syntax with " *"
                    $search_wilcard = trim($search_wilcard);

                   // Merge spaces since we are using them to split the string later
                    $search_wilcard = preg_replace('!\s+!', ' ', $search_wilcard);

                    $search_wilcard = explode(' ', $search_wilcard);
                    $search_wilcard = implode('* ', $search_wilcard);

Similar lines in the 9.5.7 file inc/knowbaseitem.class.php can be updated to fix the current release.
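To compare the two snippets above, here is a minimal standalone sketch that wraps each one in a function and runs it on a search term from this report. The function names `originalWildcard` and `proposedWildcard` are hypothetical; the bodies are copied from the snippets, with the `$search_wilcard` variable shortened to `$s`.

```php
<?php
// Original 10.0 RC3 logic: every non-word character becomes a space,
// so the boolean operators are lost before the query reaches MySQL.
function originalWildcard(string $search): string
{
    $s = preg_replace('/[^\p{L}\p{N}_]+/u', ' ', $search);
    $s = trim($s);
    $s = preg_replace('!\s+!', ' ', $s);
    return implode('* ', explode(' ', $s)) . '*';
}

// Proposed logic from the comment above: operator characters are kept.
function proposedWildcard(string $search): string
{
    $s = preg_replace('/[^\p{L}\p{N}_)(~+\<>*"-]+/u', ' ', $search);
    $s = trim($s);
    $s = preg_replace('!\s+!', ' ', $s);
    return implode('* ', explode(' ', $s));
}

echo originalWildcard('+tribbles -kirk'), "\n"; // tribbles* kirk*
echo proposedWildcard('+tribbles -kirk'), "\n"; // +tribbles* -kirk
echo proposedWildcard('"with tribbles"'), "\n"; // "with* tribbles"
```

The last line shows that this first proposal still inserts a * inside a quoted phrase, which is one reason the later comments in this thread refine the approach.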

derry43 commented 2 years ago

We also need to URL-encode the operators in the pager code between lines 1685 and 1687 in src/KnowbaseItem.php, where ...

            $parameters = "start=" . $params["start"] . "&amp;knowbaseitemcategories_id=" .
                        $params['knowbaseitemcategories_id'] . "&amp;contains=" .
                        $params["contains"] . "&amp;is_faq=" . $params['faq'];

becomes ...

            $parameters = "start=" . $params["start"] . "&amp;knowbaseitemcategories_id=" .
                        $params['knowbaseitemcategories_id'] . "&amp;contains=" .
                        rawurlencode($params["contains"]) . "&amp;is_faq=" . $params['faq'];

Again, similar lines in the 9.5.7 file inc/knowbaseitem.class.php can be updated to fix the current release.
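The reason the encoding matters can be shown with a small standalone sketch, independent of GLPI. In a query string an unencoded + is decoded as a space, so the operator is lost when the pager link is followed. The variable names are illustrative only.

```php
<?php
$contains = '+tribbles -kirk';

// Unencoded: "+" is read back as a space when the query string is parsed.
parse_str('contains=' . $contains, $unencoded);

// Encoded with rawurlencode(): the operator survives the round trip.
$encoded = rawurlencode($contains);
parse_str('contains=' . $encoded, $roundTrip);

echo $encoded, "\n";                 // %2Btribbles%20-kirk
echo $unencoded['contains'], "\n";   // " tribbles -kirk" (leading space, + lost)
echo $roundTrip['contains'], "\n";   // +tribbles -kirk
```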

cconard96 commented 2 years ago

The proposed change seems to reintroduce the issue that #9784 was trying to fix. I also cannot get a query like +Convert -Windows to work correctly, as it still shows an article named Convert Windows Server from Desktop to Core.

There seems to be more wrong here. The resulting query seems to correctly select a "score" for the text search but then in the WHERE statement it looks for any records with the title or content containing Convert Windows (to use my previous example). If using boolean mode operators, it shouldn't have those clauses and should enforce that only results with a score more than 0 are returned. If not using boolean operators, then the existing behavior is OK as the fulltext search score should only be relevant for the ORDER to show the most relevant articles first.
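The behaviour described in the comment above could be sketched roughly as follows. This is not GLPI code: the helper names `usesBooleanOperators` and `buildKbWhere` are hypothetical, the column names `name` and `answer` are assumptions about the knowledge base table, and the operator detection is a simple heuristic.

```php
<?php
// Hypothetical helper: guess whether the term uses boolean-mode operators.
// Looks for an operator at the start of a word, or a trailing "*".
function usesBooleanOperators(string $search): bool
{
    return preg_match('/(?:^|\s)[+\-~<>("]|\*(?:\s|$)/', $search) === 1;
}

// Hypothetical helper: return a WHERE fragment and its bound parameters.
function buildKbWhere(string $search): array
{
    if (usesBooleanOperators($search)) {
        // Operators present: rely on the full-text score alone.
        $match = 'MATCH(`name`, `answer`) AGAINST(? IN BOOLEAN MODE)';
        return [$match . ' > 0', [$search]];
    }
    // Plain words: keep the LIKE filter; the score only drives ORDER BY.
    $like = '%' . $search . '%';
    return ['(`name` LIKE ? OR `answer` LIKE ?)', [$like, $like]];
}

print_r(buildKbWhere('+Convert -Windows'));
print_r(buildKbWhere('Convert Windows'));
```

Returning placeholders and parameters separately keeps the search term out of the SQL text, whichever branch is taken.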

derry43 commented 2 years ago

Took another look at this, and you are correct there is still an issue with how the boolean full-text search operators are handled. I have done a little more work on this and, although what I have produced does not handle correcting all syntax issues, hopefully it cleans up the majority of offending syntax and allows the search operators (including grouping and quoting) to be used successfully. As a bonus, this allows searching for other "special" characters, such as !, as long as they are included in double quotes e.g. "!error".

Here is what I propose:

Original code:

                   // Replace all non word characters with spaces (see: https://stackoverflow.com/a/26537463)
                    $search_wilcard = preg_replace('/[^\p{L}\p{N}_]+/u', ' ', $search);

                   // Remove last space to avoid illegal syntax with " *"
                    $search_wilcard = trim($search_wilcard);

                   // Merge spaces since we are using them to split the string later
                    $search_wilcard = preg_replace('!\s+!', ' ', $search_wilcard);

                    $search_wilcard = explode(' ', $search_wilcard);
                    $search_wilcard = implode('* ', $search_wilcard) . '*';

New code:

           // Replace all non word/operator characters with spaces (see: https://stackoverflow.com/a/26537463), ignore those inside quotes
            $search_wilcard = preg_replace('/[^\p{L}\p{N}_)(~+\<>*\\\\"-]+(?=([^"]*"[^"]*")*[^"]*$)/u', ' ', $search);
           // Remove escape character from quotes
            $search_wilcard = preg_replace('![\\\\]["]!u', '"', $search_wilcard);
           // Remove all isolated groups of operators, ignore those inside quotes
           // i.e. those operators which are neither a prefix nor a suffix for a word, group or phrase
            $search_wilcard = preg_replace('!(?<=\s)[~\<>*+-]+(?=\s)(?=([^"]*"[^"]*")*[^"]*$)!u', '', $search_wilcard);
           // Remove all * operators immediately following parentheses or quoted terms, ignore those inside quotes
            $search_wilcard = preg_replace('!(?<=[)("])[*](?=([^"]*"[^"]*")*[^"]*$)!u', '', $search_wilcard);
           // Insert space between closing parentheses and immediately following operator, ignore those inside quotes
            $search_wilcard = preg_replace('!(?<=[)])([~\<>+-])(?=([^"]*"[^"]*")*[^"]*$)!u', ' \1', $search_wilcard);
           // Replace any * operators immediately following a space, ignore those inside quotes
            $search_wilcard = preg_replace('!(?<=\s)[*](?=([^"]*"[^"]*")*[^"]*$)!u', ' ', $search_wilcard);
           // Replace groups of ~, >, <, + or - operators with rightmost operator, ignore those inside quotes
            $search_wilcard = preg_replace('![~\<>+-]+(?=[~\<>+-])(?=([^"]*"[^"]*")*[^"]*$)!u', '', $search_wilcard);
           // Merge spaces, ignore those inside quotes
            $search_wilcard = preg_replace('!\s+(?=([^"]*"[^"]*")*[^"]*$)!u', ' ', $search_wilcard);

The above handles the search term test* /* test* (* test3* )* (initial fix attempt in #9784) and will effectively tidy this up to be test* test* ( test3* ). It also handles quite a few other syntactically incorrect searches, such as ~+test which becomes +test, *test which becomes test, and *+ etc.
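Every pattern in the new code ends with the same lookahead, (?=([^"]*"[^"]*")*[^"]*$), which is what limits a replacement to text outside double quotes. The minimal sketch below isolates that technique on a single character. The function name `blankOutsideQuotes` is hypothetical.

```php
<?php
// Replace "!" with a space only when it sits outside double quotes.
// The lookahead succeeds only if an even number of quotes remains to the
// right of the match, i.e. the match is not inside an open quoted phrase.
function blankOutsideQuotes(string $text): string
{
    return preg_replace('/!(?=([^"]*"[^"]*")*[^"]*$)/u', ' ', $text);
}

echo blankOutsideQuotes('a!b "c!d" e!f'), "\n"; // a b "c!d" e f
```

One consequence of counting quotes this way is that a single unbalanced quote makes everything to its left count as quoted, so such input is left untouched by the cleanup.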

One problem remains: handling of the > and < characters in the search. Sanitizing > produces &#38;#62;, which un-sanitizes to &#62; rather than >. Likewise, sanitizing < produces &#38;#60;, which un-sanitizes to &#60; rather than <. It looks like the code in src/Toolbox/Sanitizer.php that sanitizes the original search term converts > to &#62; and then converts the & of that &#62; to &#38;, producing something that a single un-sanitize pass cannot reverse.
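The double encoding described above can be reproduced outside GLPI with a small sketch. The two encoder functions are hypothetical stand-ins and do not reflect the actual code in src/Toolbox/Sanitizer.php; they only show how the order of replacements produces the observed result.

```php
<?php
// Hypothetical encoder that handles "<" and ">" before "&".
// str_replace() applies the pairs in order, so the "&" of "&#62;" is
// encoded again in the last step.
function encodeWrongOrder(string $value): string
{
    return str_replace(['<', '>', '&'], ['&#60;', '&#62;', '&#38;'], $value);
}

// Encoding "&" first leaves the entities produced afterwards untouched.
function encodeRightOrder(string $value): string
{
    return str_replace(['&', '<', '>'], ['&#38;', '&#60;', '&#62;'], $value);
}

$double = encodeWrongOrder('>');
echo $double, "\n";                                       // &#38;#62;
echo html_entity_decode($double), "\n";                   // &#62;
echo html_entity_decode(encodeRightOrder('>')), "\n";     // >
```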

derry43 commented 2 years ago

To detect and alert on syntax errors for boolean full-text search, the following function can be added to src/KnowbaseItem.php and the showList function updated at line 1688 to include a call to the checkBoolFtSyntax function.

    /**
     * Check Boolean Full-Text Search Syntax
     *
     * @param string $searchterm raw text version of search term e.g. "+word -search"
     * @param int    $pos        set to the position of the first syntax issue, default 0
     *
     * @return bool true if a syntax issue was found, false otherwise
     **/
    public static function checkBoolFtSyntax($searchterm, &$pos = 0)
    {
         $matches = null;
         $returnValue = preg_match_all('/\s*([~+-]?\(\s*((?>(\s*[><~+-]?([A-Za-z_]+|"[^\\"]*(\\"[^\\"]*)*")[*]?(\s|$|(?=[)])))+)\s*|\s*(?R)\s*)*\s*\)|(?>(\s*[><~+-]?([A-Za-z_]+|"[^\\"]*(\\"[^\\"]*)*")[*]?(\s|$))+))\s*/', $searchterm, $matches);
         foreach ($matches as $key => $value)
         {
            if (is_array($value)) {
                foreach ($value as $ikey => $ivalue) {
                    $searchterm=str_replace($ivalue,str_repeat(" ",strlen($ivalue)),$searchterm);
                }
            } else {
                $searchterm=str_replace($value,str_repeat(" ",strlen($value)),$searchterm);
            }            
         } 
         if (preg_match('/\S/',$searchterm,$matches)) {
             $pos=strpos($searchterm,ltrim($searchterm));
             return true;
         } else {
            return false;
         }
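    }

Call added to showList:

            $raw_contains = Sanitizer::unsanitize($params["contains"]);
            if (self::checkBoolFtSyntax($raw_contains, $errorpos)) {
                // Take the excerpt from the same unsanitized string the position refers to, and escape it for output
                $errorchar = htmlspecialchars(substr($raw_contains, $errorpos, 10));
                echo '<div style="color:red;" class="d-flex justify-content-center">' . __('Search syntax error at ') . $errorpos . ", '" . $errorchar . "...'</div><br/><br/>";
                return false;
            }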

Again, similar lines in the 9.5.7 file inc/knowbaseitem.class.php can be updated to fix the current release.
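For readers who want to experiment with position reporting without the full regular expression, here is a much simpler sketch. It is a different and far weaker technique than checkBoolFtSyntax: it only detects unbalanced double quotes and parentheses. The function name `findUnbalancedPosition` is hypothetical.

```php
<?php
// Simplified stand-in for a syntax check: reports the byte position of the
// first unbalanced quote or parenthesis, or null if none is found.
// Parentheses inside a quoted phrase are ignored.
function findUnbalancedPosition(string $term): ?int
{
    $openParens = [];   // positions of "(" still waiting for ")"
    $quoteStart = null; // position of the currently open quote, if any
    $length = strlen($term);
    for ($i = 0; $i < $length; $i++) {
        $char = $term[$i];
        if ($char === '"') {
            $quoteStart = ($quoteStart === null) ? $i : null;
        } elseif ($quoteStart !== null) {
            continue; // inside a quoted phrase
        } elseif ($char === '(') {
            $openParens[] = $i;
        } elseif ($char === ')') {
            if ($openParens === []) {
                return $i; // closing parenthesis without an opener
            }
            array_pop($openParens);
        }
    }
    if ($quoteStart !== null) {
        return $quoteStart; // quote never closed
    }
    return $openParens === [] ? null : $openParens[0];
}

var_dump(findUnbalancedPosition('+mail +(>thunderbird <outlook)')); // NULL
var_dump(findUnbalancedPosition('+mail +(thunderbird'));            // int(7)
```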

cconard96 commented 2 years ago

Can you open a pull request with your changes?

derry43 commented 2 years ago

Yes, will try. Can you advise on the correct way to generate the error message for the search term syntax check above? I am not sure the method I used is ideal, and I can't seem to use Session::addMessageAfterRedirect: there is no redirect after the check, so the message does not show until after a further search.

cconard96 commented 2 years ago

addMessageAfterRedirect should work since searching the knowledgebase refreshes the page. The name of the function is a bit wrong as it adds a message that will get displayed the next time any page is loaded.

If you open a PR with all the changes, I can test and suggest changes.

cconard96 commented 2 years ago

Hello @derry43, were you able to combine your patches for a pull request? If not, could you let me know which changes from this thread are still relevant so we have a starting point?

derry43 commented 2 years ago

I eventually managed to create a PR, but I was not able to run the tests as I do not have a full grasp of that part of the development process here.

cedric-anne commented 1 year ago

Fixed by #14301.