cubecart / v6

CubeCart Version 6
https://cubecart.com
72 stars 59 forks source link

Code Check: Search String Split on Space, and Comma? #3446

Open bhsmither opened 9 months ago

bhsmither commented 9 months ago

In CubeCart->_category(), there is:

840: $keys = explode(' ', $_REQUEST['search']['keywords']);

However, in Catalogue->searchCatalogue(), there is:

1995: $words = explode(' ', $search_data['keywords']);
2050: $searchwords = preg_split('/[\s,]+/', $GLOBALS['db']->sqlSafe($search_data['keywords']));

Please check developer notes to determine if search terms are to be split also on commas.

Does Elasticsearch parse through the keywords phrase and automatically break search terms into words on spaces (I would hope so) - and on commas?

For line 1995, would the phrase "no,on" present a false-positive on having a long-enough term? Would the phrase "one, two" (note the comma makes the first term four characters) also present a false-positive on having a long-enough term?

For line 840, is the logging of these examples of terms desirable?

bhsmither commented 9 months ago

After some research, other than what CubeCart wants to do with the search strings, MySQL (thus, presumably Elasticsearch) looks at word 'characters' only (A-Za-z0-9_ and sometimes an apostrophe), and then treats certain characters as word 'delimiters' (space, period, comma).

As such, it must be the case that MySQL will drop no,on as being two words but each is too short or a stop word. MySQL would also handle the other examples appropriately. See:

https://dev.mysql.com/doc/refman/8.0/en/fulltext-natural-language.html

But back to what CubeCart wants to do with the search strings, and whether to treat a comma (and perhaps a period) as a word delimiter is still something to look at.