go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

Elastic Search Issue Indexer fuzzy search #11977

Open WegnerDan opened 4 years ago

WegnerDan commented 4 years ago

Description

I have set up elasticsearch with the following settings:

[indexer]
REPO_INDEXER_ENABLED = true
ISSUE_INDEXER_TYPE: elasticsearch
ISSUE_INDEXER_CONN_STR: http://localhost:9200
ISSUE_INDEXER_NAME: gitea_issues

I have created a test issue with the text "bla bla bla mr. freeman" and I am trying to find it using the issue search. I've done the same thing on the try.gitea.io test instance:

Issue: https://try.gitea.io/thedoginthewok/test_issue_search/issues/1

Search: https://try.gitea.io/issues?type=your_repositories&repos=%5B%5D&sort=&state=open&q=freema

On the test instance, the issue is found successfully. On my instance, I can only find the issue if I search for the complete word freeman.

Is there any way to configure a fuzzy search for elastic?

Screenshots

My instance with search term freeman: (screenshot)

My instance with search term freema: (screenshot)

lunny commented 4 years ago

Maybe you mean

[indexer]
REPO_INDEXER_ENABLED = true
ISSUE_INDEXER_TYPE = elasticsearch
ISSUE_INDEXER_CONN_STR = http://localhost:9200
ISSUE_INDEXER_NAME = gitea_issues

WegnerDan commented 4 years ago

I've changed it to "=", but it behaves the same.

New log gist: https://gist.github.com/thedoginthewok/eaa51d81d8a82f13145ff7be1c56888b

This part is interesting to me:

2020/06/19 15:25:41 ...elastic/v7/client.go:848:dumpRequest() [T] POST /gitea_issues/_search HTTP/1.1
    Host: localhost:9200
    User-Agent: elastic/7.0.9 (linux-amd64)
    Transfer-Encoding: chunked
    Accept: application/json
    Content-Type: application/json
    Accept-Encoding: gzip

    b5
    {"from":0,"query":{"bool":{"must":[{"multi_match":{"fields":["title","content","comments"],"query":"freema"}},{"terms":{"repo_id":[1]}}]}},"size":50,"sort":[{"id":{"order":"asc"}}]}
    0

2020/06/19 15:25:41 ...elastic/v7/client.go:858:dumpResponse() [T] HTTP/1.1 200 OK
    Content-Type: application/json; charset=UTF-8

    {"took":4,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":0,"relation":"eq"},"max_score":null,"hits":[]}}
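
The logged request body uses a plain `multi_match`, which matches whole analyzed terms, so `freema` would not match an indexed token `freeman`. As a rough sketch (not Gitea's actual code), the Python below rebuilds that body and a variant that adds `"type": "phrase_prefix"`, an Elasticsearch `multi_match` option that treats the last query term as a prefix. The helper name `build_issue_query` and its parameters are hypothetical, chosen only for illustration.

```python
import json


def build_issue_query(keyword, repo_ids, prefix=False, start=0, limit=50):
    """Build a search body shaped like the one in the log above.

    Hypothetical helper: with prefix=True the multi_match clause gets
    "type": "phrase_prefix", so the last term of the keyword is matched
    as a prefix instead of as a whole term.
    """
    multi_match = {
        "fields": ["title", "content", "comments"],
        "query": keyword,
    }
    if prefix:
        multi_match["type"] = "phrase_prefix"
    return {
        "from": start,
        "query": {
            "bool": {
                "must": [
                    {"multi_match": multi_match},
                    {"terms": {"repo_id": repo_ids}},
                ]
            }
        },
        "size": limit,
        "sort": [{"id": {"order": "asc"}}],
    }


# Same shape as the logged request, and the prefix-matching candidate.
logged = build_issue_query("freema", [1])
candidate = build_issue_query("freema", [1], prefix=True)
print(json.dumps(candidate, separators=(",", ":")))
```

Posting the printed body to `/gitea_issues/_search` by hand would be one way to check whether prefix matching returns the test issue before changing anything in Gitea.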

lafriks commented 4 years ago

Try searching freema*

WegnerDan commented 4 years ago

(screenshot)

Nope. What is the try.gitea.io instance running on?

lafriks commented 4 years ago

It just uses database search

WegnerDan commented 4 years ago

So, probably with LIKE '%SEARCHTERM%'.
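
To illustrate the difference in behavior, here is a small simulation. It assumes the database search really is a case-insensitive substring test (the `LIKE '%SEARCHTERM%'` guess above) and approximates Elasticsearch's analysis with a simple lowercase word split. Both functions are illustrative stand-ins, not what either backend actually runs.

```python
import re


def tokenize(text):
    """Rough stand-in for a text analyzer: lowercase and split into word tokens."""
    return re.findall(r"\w+", text.lower())


def like_match(text, term):
    """Mimic SQL LIKE '%term%' as a case-insensitive substring test (assumption)."""
    return term.lower() in text.lower()


def term_match(text, term):
    """Mimic a whole-token match against the analyzed text."""
    return term.lower() in tokenize(text)


issue_text = "bla bla bla mr. freeman"
for term in ("freeman", "freema"):
    print(term, like_match(issue_text, term), term_match(issue_text, term))
```

Under these assumptions the substring test finds both `freeman` and `freema`, while the token match only finds `freeman`, which is the difference seen between the two instances.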

This is a bug in the elasticsearch indexer, right? Or is it supposed to work this way?

lafriks commented 4 years ago

Elastic search query should be improved

lunny commented 4 years ago

That's because of how we use Elasticsearch. Below is the mapping configuration from the source:

"mappings": {
            "properties": {
                "id": {
                    "type": "integer",
                    "index": true
                },
                "repo_id": {
                    "type": "integer",
                    "index": true
                },
                "title": {
                    "type": "text",
                    "index": true
                },
                "content": {
                    "type": "text",
                    "index": true
                },
                "comments": {
                    "type" : "text",
                    "index": true
                }
            }
        }

Should we change the configuration to resolve the problem?
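
One possible direction, sketched here only as an assumption and not as a tested change to Gitea, is to index the text fields with an `edge_ngram` token filter so that prefixes of each word are stored as terms. The analyzer and filter names (`prefix_index`, `prefix_filter`) and the gram sizes are hypothetical; `edge_ngram`, `analyzer`, and `search_analyzer` are standard Elasticsearch options. The small `edge_ngrams` function only mimics what such a filter would emit for a single token.

```python
import json


def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the leading substrings an edge_ngram filter would emit for one token."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def prefix_text_field():
    """A text field indexed with prefixes but searched with whole terms."""
    return {
        "type": "text",
        "analyzer": "prefix_index",      # hypothetical analyzer name
        "search_analyzer": "standard",
    }


index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # hypothetical filter name and gram sizes
                "prefix_filter": {"type": "edge_ngram", "min_gram": 2, "max_gram": 20}
            },
            "analyzer": {
                "prefix_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "prefix_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "id": {"type": "integer", "index": True},
            "repo_id": {"type": "integer", "index": True},
            "title": prefix_text_field(),
            "content": prefix_text_field(),
            "comments": prefix_text_field(),
        }
    },
}

print(edge_ngrams("freeman"))
print(json.dumps(index_body, indent=2))
```

With such a mapping, `freema` would be one of the indexed terms for `freeman`. The trade-offs are a larger index and the need to recreate and repopulate the index, since the analyzer of an existing field cannot be changed in place.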

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. I am here to help clear issues left open even if solved or waiting for more insight. This issue will be closed if no further activity occurs during the next 2 weeks. If the issue is still valid just add a comment to keep it alive. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue has been automatically closed because of inactivity. You can re-open it if needed.

WegnerDan commented 4 years ago

unstale