TYPO3-Documentation / sphinx_typo3_theme

Sphinx theme for docs.typo3.org
https://typo3-documentation.github.io/sphinx_typo3_theme
MIT License

Search term suggestions sometimes do not include valid results #118

Closed jonaseberle closed 3 years ago

jonaseberle commented 3 years ago

I am not sure what can be done about it. Maybe we can tweak the indexing/autosuggestion logic, or maybe we should change texts to facilitate indexing, but for that we need some data.

I'd like to gather examples where search terms are not indexed/suggested as expected.

1)

marble commented 3 years ago

I'm afraid I don't understand what you want to say. Searching for sys_language_content (note the blank at the end) tells me that there are exactly 4 pages that contain the search word one or more times. That looks like very useful information to me.

jonaseberle commented 3 years ago

But searching for sys_language_content is not straight-forward. How did you do it? Or rather: Why is sys_language_content not autosuggested?

marble commented 3 years ago

As I tried to point out: the indexer is a "stemmer"; it indexes only the word stem. So it stores 'complet' instead of 'completion' and 'compli' instead of 'compliance'. So 'sys_language_content' is not in the database, only 'sys_language_cont'.
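To illustrate what stemming does to the index, here is a minimal sketch in Python. The suffix list and the names `toy_stem` and `build_index` are hypothetical, and the rule is far cruder than whatever stemmer Sphinx actually applies; the sketch only shows how several word forms collapse onto one stored stem.

```python
# Toy suffix stripper for illustration only. This is NOT the stemmer Sphinx uses;
# the suffix list is an assumption chosen to fit the examples in this thread.
TOY_SUFFIXES = ("ion", "ent", "ely", "er", "e")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(words):
    """Map each stored stem to the set of original words that produced it."""
    index = {}
    for word in words:
        index.setdefault(toy_stem(word), set()).add(word)
    return index


index = build_index(
    ["completion", "complete", "completely", "sys_language_content"]
)
for stem in sorted(index):
    print(stem, "<-", sorted(index[stem]))
```

In this sketch the index ends up with only two keys, 'complet' and 'sys_language_cont'; the full words are no longer stored as keys.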

jonaseberle commented 3 years ago

This is unwanted in that case, though. The result is unhelpful. Maybe there are other examples, so we can get a picture of whether we want to stem technical terms.

sypets commented 3 years ago

> As I tried to point out: the indexer is a "stemmer"; it indexes only the word stem. So it stores 'complet' instead of 'completion' and 'compli' instead of 'compliance'. So 'sys_language_content' is not in the database, only 'sys_language_cont'.

OK, just to be clear about why this is done: is it so that you have one stem for several words of one "family" of words with the same stem? For example, when you search for student, you also get results for student, students, etc. That makes sense, is common in search engines, and gets you more results for several similar words. In addition to this, this is also done with synonyms (so you get results for other words with the same meaning).

(On a side note: stemming is a little crude, as it is done automatically; lemmatization would be better but is not easily available.)

e.g. search for comple:

Results

I guess complet is the stem for completion and complete and completely. So you get results with all these. This is nice. (But not always wanted).

I would assume that in a lot of cases it works great. But in some it is confusing and does not get you useful results, as pointed out in #116 (where it actually seems to give you a wrong result, see my last comment).

Personally, I can live with this if this is an edge case and the current autosuggest is a benefit in most cases.

However, having rarely used the search, I can't really say.

I wouldn't put too much effort into the search anyway if it will be replaced with the global search engine.

jonaseberle commented 3 years ago

2.

marble commented 3 years ago

@jonaseberle I'm not sure what you are looking for. Note that this manual has not been re-rendered since the latest Docker Rendering Container (DRC) v2.8.3 was released. But even with the newest one there won't be any change. The search database, which contains the word stems, only contains uribuild. As long as you don't type more than that, the autosuggestion is there: (screenshot)

If you type more than what's in the database of word stems, there is no suggested match: (screenshot)

Note that I changed the message for that case. Once the manual is re-rendered, it will read "Not found in word stems" to make things clearer. Here's an example of the new text: (screenshot)

But, most important: whether you search for 'uribuild' or 'uribuilder', you WILL get at least all 'uribuilder' hits. In contrast to Sybille's example, there is no alternative entry in the list that is falsely selected and leads to irritating results.

And about @sypets's question of why the search works the way it does with word stems: ask the Sphinx people. If you want to contribute to that project, send them a pull request.
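The difference between the suggestion list and the actual search can be sketched roughly as follows. This is a hypothetical model in Python, not the theme's real code: the page names, the prefix-matching rule in `suggest`, and the stand-in `toy_stem` are all assumptions made to mirror the behaviour described here.

```python
# Hypothetical model of the described behaviour; not the theme's actual code.
# Stem -> pages containing it. The page names are made up for illustration.
STEM_INDEX = {"uribuild": ["page-a", "page-b"]}


def toy_stem(word):
    """Crude stand-in for the real stemmer: drop a trailing 'er'."""
    return word[:-2] if word.endswith("er") else word


def suggest(typed):
    """Autosuggest (assumed rule): offer stored stems starting with the typed text."""
    return sorted(stem for stem in STEM_INDEX if stem.startswith(typed))


def search(query):
    """Full search (assumed rule): stem the query first, then look up the stem."""
    return STEM_INDEX.get(toy_stem(query), [])


print(suggest("uribuild"))    # the stored stem is offered
print(suggest("uribuilder"))  # typed past the stem, so nothing is offered
print(search("uribuild") == search("uribuilder"))  # both queries find the same pages
```

Under these assumptions the empty suggestion list for 'uribuilder' does not mean the search fails; only the suggestion step compares against raw stems.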

jonaseberle commented 3 years ago

I guess we can close here since the search will be replaced soon.