codelibs / fess

Fess is very powerful and easily deployable Enterprise Search Server.
https://fess.codelibs.org
Apache License 2.0

Synonym Filter "expand" option. #1379

Closed anatomo closed 6 years ago

anatomo commented 6 years ago

Hello. I have a question about Synonym List.

I uploaded synonym.txt.

A=>B
C=>B

When I search for "A", documents containing "C" are also hit, even though "A" is not similar to "C". Maybe setting the "expand=false" option on the Synonym Filter would fix it: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html

Could you please advise how to do it?

marevol commented 6 years ago

1) Modify app/WEB-INF/classes/fess_indices/fess.json
2) In the Upgrade page, start Reindex with updating aliases

anatomo commented 6 years ago

Thanks for your advice.

I modified app/WEB-INF/classes/fess_indices/fess.json. (Added "expand": false to the synonym tokenizers.)

    "tokenizer": {
      "japanese_tokenizer": {
        "type": "fess_japanese_reloadable_tokenizer",
        "mode": "normal",
        "user_dictionary": "${fess.dictionary.path}ja/kuromoji.txt",
        "discard_punctuation": false,
        "reload_interval": "1m"
      },
      "korean_tokenizer": {
        "type": "fess_korean_tokenizer",
        "index_eojeol": false,
        "pos_tagging": false,
        "user_dict_path": "${fess.dictionary.path}ko/seunjeon.txt"
      },
      "simplified_chinese_tokenizer": {
        "type": "fess_simplified_chinese_tokenizer"
      },
      "vietnamese_tokenizer": {
        "type": "fess_vietnamese_tokenizer",
        "sentence_detector": false,
        "ambiguities_resolved": false
      },
      "unigram_synonym_tokenizer": {
        "type": "ngram_synonym",
        "n": "1",
        "synonyms_path": "${fess.dictionary.path}synonym.txt",
        "expand": false,
        "dynamic_reload": true,
        "reload_interval": "1m"
      },
      "bigram_synonym_tokenizer": {
        "type": "ngram_synonym",
        "n": "2",
        "synonyms_path": "${fess.dictionary.path}synonym.txt",
        "expand": false,
        "dynamic_reload": true,
        "reload_interval": "1m"
      }
    },
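Before reindexing, it can help to confirm that the edit actually landed on every synonym tokenizer. The following is a minimal sketch, not part of Fess: the helper name `find_synonym_tokenizers` is hypothetical, and it assumes only that fess.json is valid JSON containing a "tokenizer" object like the fragment above, wherever that object is nested.

```python
import json


def find_synonym_tokenizers(node, found=None):
    """Walk a parsed JSON document and collect the "expand" value of every
    tokenizer whose type is "ngram_synonym".

    Returns a dict mapping tokenizer name to its "expand" setting
    (None when the option is not set at all).
    """
    found = {} if found is None else found
    if isinstance(node, dict):
        for key, value in node.items():
            # Analyzer definitions also use a "tokenizer" key, but with a
            # string value, so only dict values are treated as definitions.
            if key == "tokenizer" and isinstance(value, dict):
                for name, cfg in value.items():
                    if isinstance(cfg, dict) and cfg.get("type") == "ngram_synonym":
                        found[name] = cfg.get("expand")
            find_synonym_tokenizers(value, found)
    elif isinstance(node, list):
        for item in node:
            find_synonym_tokenizers(item, found)
    return found


def check_file(path):
    """Load a settings file (path is supplied by the caller) and report."""
    with open(path, encoding="utf-8") as fh:
        return find_synonym_tokenizers(json.load(fh))
```

Calling `check_file("app/WEB-INF/classes/fess_indices/fess.json")` from the Fess install directory should list both `unigram_synonym_tokenizer` and `bigram_synonym_tokenizer` with the value that was written to the file.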

Then I started Reindex with updating aliases. After the reindex, I searched for "A", but the keywords "B" and "C" are not hit. (Only the keyword "A" is hit.)

Is this setting wrong?

marevol commented 6 years ago

I think what you want to do is:

B=>A,C

anatomo commented 6 years ago

Thank you, it works. But I have already created synonym.txt in the wrong format. (It has about 250,000 words.)

Umm... I will try to fix the synonym.txt format.
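
For a file of that size, the conversion from the `A=>B`, `C=>B` style to the `B=>A,C` style suggested above could be scripted rather than done by hand. This is a rough sketch under assumptions not confirmed in this thread: every rule is on its own line in the form `left=>right`, terms on either side may be comma-separated, and lines starting with `#` are comments. The function names and the file names in the usage note are hypothetical.

```python
from collections import defaultdict


def invert_synonym_rules(lines):
    """Regroup `X=>Y` rules by their right-hand side.

    ["A=>B", "C=>B"] becomes ["B=>A,C"]. Lines without "=>" are kept
    unchanged and emitted after the regrouped rules.
    """
    grouped = defaultdict(dict)  # target -> ordered set of source terms
    passthrough = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed syntax)
        if "=>" not in line:
            passthrough.append(line)
            continue
        left, right = line.split("=>", 1)
        sources = [t.strip() for t in left.split(",") if t.strip()]
        targets = [t.strip() for t in right.split(",") if t.strip()]
        for target in targets:
            for source in sources:
                grouped[target][source] = None  # dict keys keep first-seen order
    rules = [f"{target}=>{','.join(srcs)}" for target, srcs in grouped.items()]
    return rules + passthrough


def convert_file(src_path, dst_path):
    """Read rules from src_path and write the regrouped rules to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        converted = invert_synonym_rules(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(converted) + "\n")
```

For example, `convert_file("synonym.txt", "synonym_fixed.txt")` would write the regrouped rules to a new file, leaving the original untouched so the result can be reviewed before it is uploaded.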