go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

Support search qualifiers #8386

Open davidsvantesson opened 4 years ago

davidsvantesson commented 4 years ago

Description

Gitea should support search qualifiers when searching for repositories, issues, PRs or users. This would allow more flexible options when searching.

Example when searching for repos: topic: is:private is:public owner:name

Search qualifiers should always be combined with AND, in contrast to free-text terms, which are combined with OR.

Note: Gitea currently splits search terms on commas, not spaces, so the example "sentence of words,topic:mytopic,separate sentence" would search for repositories where "sentence of words" OR "separate sentence" is in the name or description AND that have the topic "mytopic".

Maybe this can be solved by indexing with bleve and using required and exclusion of fields: https://blevesearch.com/docs/Query-String-Query/

guillep2k commented 4 years ago
* Gitea version (or commit ref): master

Gitea version (or commit ref): 1.10.0+dev-375-g8a828500e

You can use the footer from try.gitea.io. I know this is nit-picky of me, but master is a relative term. :grin:

davidsvantesson commented 4 years ago

Sure, I am usually more careful when posting bugs.

davidsvantesson commented 4 years ago

Another benefit of this way of searching is that it is more clear for the user when selecting search options in the UI. For example if selecting "Assigned to you" it would end up as "assignee:davidsvantesson" in the search field.
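
As a rough sketch of how a UI option could end up in the search field, the helper below rewrites a query so that it contains exactly one occurrence of a given qualifier. `setQualifier` is a hypothetical name, and the sketch assumes whitespace-separated terms without quoted phrases.

```go
package main

import (
	"fmt"
	"strings"
)

// setQualifier returns the query with "name:value" as its only occurrence of
// the qualifier name: existing "name:..." terms are dropped and the new one
// is appended at the end.
func setQualifier(query, name, value string) string {
	prefix := name + ":"
	var kept []string
	for _, word := range strings.Fields(query) {
		if !strings.HasPrefix(word, prefix) {
			kept = append(kept, word)
		}
	}
	kept = append(kept, prefix+value)
	return strings.Join(kept, " ")
}

func main() {
	// Selecting "Assigned to you" while "crash on login" is in the search field.
	fmt.Println(setQualifier("crash on login", "assignee", "davidsvantesson"))
	// Selecting it again replaces a previously chosen assignee.
	fmt.Println(setQualifier("assignee:someone crash", "assignee", "davidsvantesson"))
}
```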

guillep2k commented 4 years ago

Perhaps we should consider using a localizable resource for this (e.g. "assignee", "asignado", "asignato", etc.). Preferably with several alternatives per qualifier. For example, in app.ini:

[search.qualifiers]
ASSIGNEE = assignee, asignado

To avoid complicating the code too much, the system could just use the first one in the list when building the search string for the search field in the returned form, like some sort of "normalization".
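
A minimal sketch of that normalization, assuming the `[search.qualifiers]` section has already been read into a map (the section, the map and the function names are all hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// qualifierAliases mirrors the hypothetical [search.qualifiers] section:
// each key lists the accepted spellings, and the first one is the form
// used when the search string is written back to the form.
var qualifierAliases = map[string][]string{
	"ASSIGNEE": {"assignee", "asignado"},
}

// buildLookup maps every accepted spelling to the first spelling in its list.
func buildLookup(aliases map[string][]string) map[string]string {
	lookup := map[string]string{}
	for _, names := range aliases {
		if len(names) == 0 {
			continue
		}
		for _, n := range names {
			lookup[strings.ToLower(n)] = names[0]
		}
	}
	return lookup
}

// normalizeTerm rewrites "alias:value" to "first:value" and leaves
// every other term unchanged.
func normalizeTerm(term string, lookup map[string]string) string {
	name, value, found := strings.Cut(term, ":")
	if !found {
		return term
	}
	if first, ok := lookup[strings.ToLower(name)]; ok {
		return first + ":" + value
	}
	return term
}

func main() {
	lookup := buildLookup(qualifierAliases)
	fmt.Println(normalizeTerm("asignado:guillep2k", lookup))
	fmt.Println(normalizeTerm("topic:go", lookup))
}
```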

As with other kinds of keywords (like "fixes", "closes", etc.), this is not something you want to change whenever the translations are updated from Crowdin.

davidsvantesson commented 4 years ago

I am so used to working in English that I didn't consider localization, but it sounds like a good idea.

I also think the current behavior of splitting on "," should be reconsidered, although it would be a breaking change. It doesn't feel very standardized and is not documented (I didn't know about it before looking into the code). I think it is better to search for each whitespace-separated word separately and allow quotation marks for an exact match. The text/scanner package could then be used with its default settings to split the string.

guillep2k commented 4 years ago

I agree with you about the spaces. As a breaking change, there are three options the way I see this:

  1. Change the behavior, make the users aware and let them get used to it (GitHub and search engines work with spaces, so I think this won't meet too much resistance).
  2. Keep the current behavior.
  3. Add an app.ini flag to switch between old and new behaviors.

On this matter I'm all for (1), but it's just my opinion. If implementing (3) is trivial, it could be added, with a sensible default decided by consensus.

bagasme commented 4 years ago

For 1.10, I would choose option 3 (as a transition to the proposed behavior), and for 1.11, option 1 (removing the transitional flag).

davidsvantesson commented 4 years ago

@bagasme This is only a proposal issue so far.

I think the only change in behavior would be that terms are separated by spaces instead of commas, and exact searches need quotes. The previous behavior was undocumented anyhow, so I think we can simply add documentation for the new behavior. The special queries can be kept for backward compatibility (like &topic=1).

Preferably there should be a "best match" sorting, based on something like how many of the search words occur in the repo. I don't know how hard it is to make an effective algorithm for that.

bagasme commented 4 years ago

@davidsvantesson Besides documenting the (proposed) new behavior, the old one should be documented too. This comes in handy when we switch to the new behavior and users complain that their old, undocumented syntax doesn't work anymore and want an explanation...

davidsvantesson commented 4 years ago

I tried to do some research on "best match" text searches in SQL. Most solutions are tied to specific SQL databases (e.g. Oracle: REGEXP_COUNT, MsSQL: Rank). I found one solution in standard SQL that counts how many of the words occur in each row. Ordering by that count could maybe work as a best match.

guillep2k commented 4 years ago

I think bleve can do that; it's the default text search engine for issues. I don't mind if simpler SQL indexing lacks this kind of feature.

For SQL search it's hard to decide what should count as "best match". Number of times a word appears in the title? (that should be low) Number of times it appears in the body? Number of comments, counting the body, that contain the word? Most of these will be very heavy on the database. Hence, bleve should be preferred.

A "best match" is more useful when you do some semantic analysis on the text, like counting any of "do", "did", "done" as synonyms for each other. And that's language dependent; we could make Gitea support x number of languages, but that's another whole can of worms.

davidsvantesson commented 4 years ago

Sounds reasonable. So a best match search for repositories would need a bleve index only for name, description and other repository metadata. My concern is that if adding more search words only adds more matching repositories without any reasonable sorting, it might not be very useful. Usually you want to narrow down your search by adding more search terms.

davidsvantesson commented 4 years ago

I have not understood bleve fully, but it seems it can support this directly if just indexing different fields of the repo metadata: https://blevesearch.com/docs/Query-String-Query/
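
For illustration, parsed search terms could be rendered into the query string syntax from the linked page, using `+` for required clauses and `field:value` for field scoping. `toQueryString` is a hypothetical helper, the `topic` field is assumed to exist in the index mapping, special characters are not escaped, and how optional terms interact with required ones should be checked against the bleve documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// toQueryString renders free-text terms as plain (optional) clauses and
// qualifiers as required "+field:value" clauses. Qualifier values are
// assumed to contain no whitespace.
func toQueryString(text []string, qualifiers [][2]string) string {
	var parts []string
	for _, t := range text {
		if strings.ContainsAny(t, " \t") {
			t = `"` + t + `"` // multi-word terms become phrases
		}
		parts = append(parts, t)
	}
	for _, q := range qualifiers {
		parts = append(parts, "+"+q[0]+":"+q[1])
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(toQueryString(
		[]string{"sentence of words", "gitea"},
		[][2]string{{"topic", "mytopic"}},
	))
}
```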

guillep2k commented 4 years ago

We're not using that interface. We're one level below it, constructing the query objects manually:

https://github.com/go-gitea/gitea/blob/6551a9d6ca8ab79fe1460eb9d60a5a0e76110eb3/modules/indexer/issues/bleve.go#L222-L231

But of course that means we can do it either way.

guillep2k commented 4 years ago

Here's the code where the analyzers are decided:

https://github.com/go-gitea/gitea/blob/6551a9d6ca8ab79fe1460eb9d60a5a0e76110eb3/modules/indexer/issues/bleve.go#L129-L133

Those decide what kind of analysis you want on the strings (they must be decided at the moment the index is built!).

davidsvantesson commented 4 years ago

Maybe it is better to build a custom search query to have more control. If we are going to support localization, we would need to do that anyway.

lunny commented 4 years ago

We should have our own rules rather than follow bleve's, because we will support many indexer backends, e.g. Elasticsearch.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.

hoffmannlin commented 1 year ago

Searching by owner:name is a good idea. Could this feature be added officially?

iFrozenPhoenix commented 1 year ago

@lunny any progress on this issue? It's still a problem. Currently I'm directly manipulating the URL query parameters to search... Not the best thing. So the search itself is already possible; it seems that only the UI is lacking the keyword search functionality, at least in its current basic form.

lunny commented 1 year ago

Nobody is working on this currently.

bendem commented 5 months ago

Would love to see this for code search as well, to be able to restrict a query to a subtree. For example, "path:ansible/ gitea" would look for gitea only in the ansible/ folder.