TYPO3-Solr / ext-solr

A TYPO3 extension that integrates the Apache Solr search server with TYPO3 CMS. dkd Internet Service GmbH is developing the extension. Community contributions are welcome. See CONTRIBUTING.md for details.
GNU General Public License v3.0

[FEATURE] autosuggest by words containing hyphens and other special chars #4170

Open dkoether opened 4 weeks ago

dkoether commented 4 weeks ago

Describe the bug: When the search term contains special characters such as hyphens, autosuggest no longer offers any suggestions.

To Reproduce: I have several pages in a Solr index that carry tags like "Digital-Abo" or "Digital-Ausgabe".

Steps to reproduce the behavior:

  1. I start typing my search term with the first two letters, "Di".
  2. I see several autosuggest options such as "Digital", "Digital-Ausgabe", and "Digital-Abo".
  3. I type more characters including a hyphen, "Digital-", and no autosuggest options are offered anymore.

Terms containing whitespace work as expected.

Expected behavior: Special characters should also work in the autosuggest feature by default.

Used versions (please complete the following information):

By default, the autosuggest feature uses the field spell. Because there are multiple values, I tried different field types such as stringM, textM, and textSpellM, but $results->facet_counts->facet_fields->{$suggestConfig['suggestField']} is always an empty array.

Thank you in advance! Best regards!

dkd-kaehm commented 4 weeks ago

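EXT:solr's suggest service uses a facet-based approach. The spell field and other text fields that use <tokenizer class="solr.StandardTokenizerFactory"/> do not fit that case, because the standard tokenizer splits terms at hyphens, so no indexed term starts with a prefix like "Digital-". See: https://solr.apache.org/guide/solr/latest/indexing-guide/tokenizers.html#standard-tokenizer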

What is needed is a field whose index analyzer config combines the following:

  1. <tokenizer class="solr.ClassicTokenizerFactory"/> See: https://solr.apache.org/guide/solr/latest/indexing-guide/tokenizers.html#classic-tokenizer
     1.1. (Optional) Synonym and stop filters. If applied, the field is language dependent.
  2. <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="32" preserveOriginal="true"/> See: https://solr.apache.org/guide/solr/latest/indexing-guide/tokenizers.html#edge-n-gram-tokenizer
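
Put together, the two steps above could look roughly like the following schema fragment. This is only a sketch: the field type name textSuggestEdgeNgram is made up for illustration and is not part of EXT:solr's shipped schema, and only the index analyzer is shown because that is all that is described here. It is worth running terms such as "Digital-Abo" through the Analysis screen of the Solr admin UI to see what this chain actually emits before relying on it.

```xml
<!-- Hypothetical field type; the name is an assumption, not an EXT:solr default. -->
<fieldType name="textSuggestEdgeNgram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Step 1: classic tokenizer instead of the standard tokenizer. -->
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <!-- Step 1.1 (optional): synonym and stop filters would go here;
         adding them makes the field type language dependent. -->
    <!-- Step 2: edge n-grams of 2 to 32 characters, keeping the original token as well. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="32" preserveOriginal="true"/>
  </analyzer>
</fieldType>
```

A field of this type would then be the one named as the suggest field in place of spell, so that the facet.prefix request runs against its indexed terms.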

http://solr-site:8983/solr/core_en/select?omitHeader=true&facet=on&facet.prefix=pre-&facet.field=spell&facet.limit=10&facet.mincount=1&facet.method=enum&wt=json&json.nl=flat&q=&start=0&rows=10&fl=spell&fq=siteHash:"2ddf3ad239669e6c3e3110228186b1a92f9648a8"&fq={!typo3access}-1,0&defType=edismax&q.alt=*:*


Please add that field via a pull request. I'll change the tracker from bug to feature.