go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License
43.56k stars, 5.35k forks

Search Functionality Issues with Bleve Engine #31565

Open: amix307 opened this issue 1 month ago

amix307 commented 1 month ago

Description

Hi,

I'm following up on my previous ticket: https://github.com/go-gitea/gitea/issues/30064. I always keep the service updated to the latest version, and the search problem persists even on version 1.22.1. I'm using the built-in Bleve engine.

Here’s another example of the issue: When searching for the string "services.gradle.org", there is one file in the repository where this string should be found. However, the Exact search method does not find the result, and the Fuzzy search hangs for about 2-3 minutes without finding anything.

To give more context about my instance: I have about 300 organizations and 3,000 repositories, but the total size is small because I don't have large files. I know for sure that the search string occurs many times across default branches, likely several hundred times. Despite this, the Exact search method returns only about 10 results, and does so instantly. The Fuzzy search, however, hangs indefinitely. On version 1.21.11, I even encountered a 500 error and the Gitea service restarted.

Manual re-indexing does not help. I would like the search issues resolved, more transparency into how code is indexed, and more flexible control over Bleve settings.

Thanks in advance for your help. I've attached some screenshots for reference.

Gitea Version

1.22.1

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

[Four screenshots attached: 2024-07-05_11-35-26, 2024-07-05_11-37-40, 2024-07-05_11-38-37, 2024-07-05_11-41-55]

Git Version

2.31.1

Operating System

CentOS Stream 9

How are you running Gitea?

Self-Hosted from dl.gitea.org

Database

PostgreSQL

silverwind commented 1 month ago

> the Fuzzy search hangs for about 2-3 minutes without finding anything.

I also noticed this search hanging starting with v1.22. If left running, gitea continuously consumes more and more memory until it exhausts the system resources. It only happens for certain search terms, not all of them.

amix307 commented 1 month ago

Yes, yesterday my instance hung and the gitea service was restarted after running out of memory (OOM) on a VM with 64 GB of RAM.

MICCustomsSolutions commented 1 month ago

Same issue here. Eats all the RAM. Disabling the index fixes the issue.

carobme commented 4 weeks ago

I can confirm this issue. Using the fuzzy code search with v1.22.1 (self-hosted) results in Gitea memory usage growing until the process runs out of memory.

Rebuilding the index (by stopping Gitea, rm -rf indexers/* and starting again) doesn't make a difference.

techknowlogick commented 4 weeks ago

Pinging @6543

makar112233 commented 4 weeks ago

We also had the same issue.

smartEBL commented 3 weeks ago

We are affected by this issue as well. A temporary fix for us is changing

[indexer]
REPO_INDEXER_ENABLED = true

to

[indexer]
REPO_INDEXER_ENABLED = false

With that setting, code search within a single repository still works, but global search (of course) does not. So it would be really nice to see this fixed. Is there anything we could provide for debugging?

amix307 commented 2 weeks ago

@smartEBL I have now switched to a single-node Elasticsearch instance running in a container on the same VM, and it works well, but I still have the same problem with the opaque indexing mechanism (not all occurrences are found).

nekdan commented 2 weeks ago

We also encountered this issue and disabled the search functionality to ensure normal operation. However, this is a poor solution because we need the search functionality for our work.

lunny commented 2 weeks ago

Can anyone reproduce it in the development version? I think it's related to bleve versions.

kemzeb commented 2 weeks ago

I am not yet familiar with how code search indexing works, nor with bleve, but after looking into the recent bleve releases, this could be the case.

In Gitea v1.21 we use bleve v2.3.10. Gitea v1.22 and onwards use bleve v2.4.0.

bleve v2.4.1 fixes a memory leak associated with its "vector query path" (more specifically, the problem was in an indirect dependency, blevesearch/go-faiss).

I am not sure whether this "path" is something our code will eventually execute (or whether it is only used by the vector indexing feature introduced in bleve v2.4.0), but I wanted to raise it for those who may be more familiar with it.

silverwind commented 2 weeks ago

According to go.mod, gitea v1.22 still uses bleve v2.3.10, the exact same version as v1.21:

v1.22: https://github.com/go-gitea/gitea/blob/release/v1.22/go.mod#L22
v1.21: https://github.com/go-gitea/gitea/blob/release/v1.21/go.mod#L21

I think the issue must lie in first-party code.
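The version check above amounts to reading the `require` entry for `github.com/blevesearch/bleve/v2` in each branch's go.mod. A minimal standard-library sketch of that lookup follows; `moduleVersion` is a hypothetical helper (not part of Gitea or of `golang.org/x/mod`), it ignores `replace` directives, and the go.mod excerpt in `main` is illustrative rather than Gitea's real file.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// moduleVersion scans go.mod text and returns the version required for
// modulePath. It understands single-line `require` directives and entries
// inside a `require ( ... )` block. Simplified sketch: it ignores `replace`
// directives and does not validate the file.
func moduleVersion(goMod, modulePath string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(goMod))
	inBlock := false
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Strip trailing comments such as "// indirect".
		if i := strings.Index(line, "//"); i >= 0 {
			line = strings.TrimSpace(line[:i])
		}
		switch {
		case line == "require (":
			inBlock = true
			continue
		case inBlock && line == ")":
			inBlock = false
			continue
		case strings.HasPrefix(line, "require "):
			// Single-line form: require <path> <version>
			line = strings.TrimPrefix(line, "require ")
		case !inBlock:
			// Not a requirement line (module, go, replace, ...).
			continue
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == modulePath {
			return fields[1], true
		}
	}
	return "", false
}

func main() {
	// Illustrative go.mod excerpt; example.com/other is a made-up module.
	sample := `module example.com/demo

go 1.22

require (
	github.com/blevesearch/bleve/v2 v2.3.10
	example.com/other v1.0.0 // indirect
)
`
	if v, ok := moduleVersion(sample, "github.com/blevesearch/bleve/v2"); ok {
		fmt.Println("bleve:", v) // prints "bleve: v2.3.10"
	}
}
```

Running it against the go.mod of two release branches and comparing the results is one way to confirm (or rule out) a dependency bump between versions.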

kemzeb commented 1 week ago

After doing the following when firing up my dev Gitea instance:

I noticed the huge memory cost. Here is my generated pprof graph for reference.

I also checked out snapshot 1262ff6734543b37d834e63a6a623648c77ee4f4 (from before the major changes to fuzzy code search) and did not notice any memory impact when observing heap usage in the admin dashboard.

I don't have time to dig into this further yet, but thought this might be helpful.
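kemzeb compared heap usage between two snapshots by watching the admin dashboard. The same before/after comparison can be done in-process with Go's standard `runtime.ReadMemStats`. This is a minimal sketch, assuming the suspect code path can be wrapped in a function; `heapGrowth`, `retained`, and the 64 MiB workload are hypothetical stand-ins, not Gitea code.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapGrowth runs work and returns the change in live heap (HeapAlloc)
// across it. A GC is forced before each reading so that short-lived
// garbage is mostly excluded; a large positive result suggests memory
// that is still reachable after work returns.
func heapGrowth(work func()) int64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	work()
	runtime.GC()
	runtime.ReadMemStats(&after)
	return int64(after.HeapAlloc) - int64(before.HeapAlloc)
}

// retained is a package-level sink that keeps the demo allocations
// reachable, standing in for a leak.
var retained [][]byte

func main() {
	// Hypothetical workload standing in for one search request:
	// it leaves 64 MiB reachable after it returns.
	growth := heapGrowth(func() {
		retained = append(retained, make([]byte, 64<<20))
	})
	fmt.Printf("live heap grew by about %d MiB\n", growth>>20)
}
```

This only yields a single number per run; to see which call sites hold the memory, a pprof heap profile like the one kemzeb generated is the better tool.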