beetbox / aura

music library REST API

http://auraspec.rtfd.org/

Search #18

Open gryphonmyers opened 4 years ago

gryphonmyers commented 4 years ago

On the topic of Complex Queries, as described in the wiki, I think this is a fairly important feature to include in any library API, as people tend to need to search their data. With no form of partial matching available, it may be difficult to provide a good user experience, as a client would be limited to predefined options or to rigid search behavior that can only return exact matches. I do also agree with the decision to keep the query syntax simple and rooted in basic HTTP syntax, though.

How would you feel about supporting basic globbing syntax (or even just the Kleene star, as that would be the most useful) in field values?

Pros:

- Easily understood
- Easy to implement

Cons:

- Rather limited functionality
- Character encoding complexity
  - Support for globbing characters would need to be accompanied by a percent encoding recommendation / requirement.
  - How to distinguish between wildcards and literal characters?
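To make the wildcard-versus-literal question concrete, here is a minimal sketch of star-only globbing. The backslash escape convention, the case-insensitive matching, and the function names are all assumptions made for illustration; none of them comes from the AURA spec.

```python
import re


def compile_star_glob(pattern: str) -> re.Pattern:
    """Translate a star-only glob into a compiled regex.

    Assumed convention (not from the AURA spec): `*` matches any run of
    characters, and a backslash makes the next character literal.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: treat it literally, including "\*".
            parts.append(re.escape(pattern[i + 1]))
            i += 2
        elif ch == "*":
            # Unescaped star: match any run of characters.
            parts.append(".*")
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def glob_matches(pattern: str, value: str) -> bool:
    """True if the whole value matches the glob pattern."""
    return compile_star_glob(pattern).fullmatch(value) is not None
```

Under this convention a client that wants a literal star has to send `\*`, which is exactly the escaping burden raised in the cons list.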

sampsyo commented 4 years ago

Yes, this is a great point. It does seem like we need some way to do basic search—it will be hard to build useful interfaces on top of the API otherwise.

To think about some other options here:

- Something like the beets query syntax. It's flexible and extensible: you can add new "kinds" of queries to the same syntax easily, and it naturally extends to full Boolean logic. However, it was designed to be convenient for humans to write and is probably not a good choice for an API.
- Just substring queries. That is, you specify a list of fields and strings that the fields must contain. Basically the same as globbing where every search looks like *this*. The advantage would be that there's no "in-band signaling": clients would not, for example, need to put the wildcards in themselves or figure out how to escape wildcards in the string that a user typed. The disadvantage, of course, is that it's less flexible; for example, you can't match at the beginning or end of a string.
- Something a bit more extensible. For example, we could provide arbitrary query "types" that could be extended in the future, and only specify basic case-insensitive substring queries for now. Then, if globs seem useful on top of this, we could standardize new types.

I'm starting to think that starting simple (just substrings) and building in a path for extensibility in the future might be the wise way to go. Does that make sense?
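As a rough illustration of the "just substrings" option, the sketch below applies case-insensitive containment to in-memory records. The function name and the sample tracks are made up for the example, and a real server would more likely push this down into its database.

```python
def matches_substrings(item: dict, criteria: dict) -> bool:
    """True if every listed field contains its string, ignoring case."""
    for field, needle in criteria.items():
        haystack = str(item.get(field, ""))
        if needle.casefold() not in haystack.casefold():
            return False
    return True


# Sample data for illustration only.
tracks = [
    {"title": "Blue in Green", "artist": "Miles Davis"},
    {"title": "So What", "artist": "Miles Davis"},
    {"title": "Blue Train", "artist": "John Coltrane"},
]

hits = [t["title"] for t in tracks if matches_substrings(t, {"title": "blue"})]
```

Because no character in the search string is special, a user who types `*` simply searches for a literal star, which is the "no in-band signaling" advantage described above.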

gryphonmyers commented 4 years ago

This makes a lot of sense. Do you have specific ideas about how to go about supporting new query types? I suppose it could just be a matter of using different param keys, e.g. ?filter[artist]=Blue performs an exact match, ?search[artist]=Blue does a substring search (*Blue*), then perhaps something like ?beetsquery[artist]=Blue exposes the full beets query syntax. To me, this sounds like a clean solution, as the scope of each param is clearly defined and separated, and each is allowed to establish its own input restrictions. There is also less potential for breaking changes down the road, because the feature set for each param should remain fixed even if new query behaviors are added to the spec.
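One way a server might separate these parameter namespaces is sketched below. It assumes only the `filter[...]` and `search[...]` keys proposed in this comment; the function name and the returned structure are hypothetical.

```python
import re
from urllib.parse import parse_qsl

# Namespaces taken from the comment above; "beetsquery" is left out here.
KEY_RE = re.compile(r"^(filter|search)\[([^\[\]]+)\]$")


def split_query(query_string: str) -> dict:
    """Group `filter[...]` and `search[...]` parameters by namespace."""
    grouped = {"filter": {}, "search": {}}
    for key, value in parse_qsl(query_string, keep_blank_values=True):
        m = KEY_RE.match(key)
        if m:
            namespace, field = m.groups()
            grouped[namespace][field] = value
        # Any other key (sorting, pagination, ...) is ignored by this sketch.
    return grouped
```

For example, `split_query("filter[artist]=Blue&search[title]=train")` puts `artist` under the exact-match group and `title` under the substring group, so each namespace can validate its own input independently.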

sampsyo commented 4 years ago

Sure; that seems cool! We could even consider keeping the filter namespace constant and just adding "qualifiers" to the field names to get different behavior, i.e., ?filter[artist]=Blue for exact matches and ?filter[search:artist]=Blue or similar for substring queries. This would perhaps simplify the client and server logic a bit—if you want to know all the criteria to use for filtering, just gather up all the filter keys (as distinct, for example, from the sort keys).
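A minimal sketch of the qualifier idea follows, assuming the `filter[qualifier:field]` shape suggested above. The default qualifier name `exact` and both function names are invented for illustration.

```python
import re

# Optional "qualifier:" prefix inside the brackets, then the field name.
FILTER_KEY_RE = re.compile(r"^filter\[(?:(\w+):)?(\w+)\]$")


def parse_filter_key(key: str):
    """Return (qualifier, field) for a filter key, or None for other keys."""
    m = FILTER_KEY_RE.match(key)
    if m is None:
        return None
    qualifier, field = m.groups()
    # "exact" is a hypothetical name for the unqualified behavior.
    return (qualifier or "exact", field)


def field_matches(qualifier: str, expected: str, actual: str) -> bool:
    """Apply one criterion; unknown qualifiers are rejected explicitly."""
    if qualifier == "exact":
        return actual == expected
    if qualifier == "search":
        return expected.casefold() in actual.casefold()
    raise ValueError(f"unsupported qualifier: {qualifier}")
```

Raising on unknown qualifiers leaves room for new query types later without silently changing what existing keys mean.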

govynnus commented 4 years ago

I've been thinking a bit about AURA client UIs recently and realised that most clients will probably want a single main search box. This makes me think that we could have something like ?search=foo that looks at all fields and matches substrings, like @gryphonmyers' suggestion but without specifying a field. To get results for all of tracks, albums and artists would still involve the client making 3 separate requests, but that probably isn't too bad.

I quite like the idea of filtering and searching being a bit separate, rather than search being a qualifier of filter. You could have a client that works just on filtering a whole collection (tracks, albums or artists) until the user gets what they want, or a client that gets a list of search results and then allows the user to filter (kind of like shopping websites).

I'm also wondering about the possibility of allowing optional regular expressions for filters/searches for people who want more control over queries, or to allow clients to decide where wildcards should go. However, the latter would raise the problem of escaping user input, as @sampsyo mentioned earlier. I also feel that regex might be quite complicated in terms of having a standard syntax.

gryphonmyers commented 4 years ago

Regex would certainly offer the most flexibility to developers of client applications, but the last time I was thinking through this it seemed inelegant to handle regex in a URL param. Also there's the issue of which regex implementation are you supporting, specifically? If we aren't careful it could lead to server implementations that vary in their handling of this feature.

Regarding a dedicated search param, the distinction would be that it matches on all fields at once, and does a substring match on all of them? The biggest problem I see there is that it could be a needlessly expensive operation: with a more refined query API, the client could optimize that request by only performing a substring match on the fields they care about.

I have had some ideas since my previous posts that I will share soon.

govynnus commented 4 years ago

I think in a URL parameter you could pass it through encodeURIComponent(), but I haven't actually tried it. Like you say the big issue would be consistent implementation, which makes me think it's not such a good idea.
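For comparison, here is a small Python sketch of the same percent-encoding round trip. The host, path, and `regex:title` qualifier are placeholders invented for the example, not part of any spec.

```python
from urllib.parse import parse_qs, quote, urlsplit

pattern = r"^Blue.*(Train|Green)$"

# safe="" asks quote() to escape every reserved character, including "/".
encoded = quote(pattern, safe="")

# Placeholder URL; "regex:title" is a made-up qualifier for illustration.
url = "https://example.org/aura/tracks?filter[regex:title]=" + encoded

# parse_qs percent-decodes values, so the server recovers the pattern intact.
decoded = parse_qs(urlsplit(url).query)["filter[regex:title]"][0]
```

Note that JavaScript's encodeURIComponent() leaves `*`, `(`, and `)` unescaped, whereas `quote(..., safe="")` escapes them; both forms decode to the same string. Transport is therefore not the hard part, and the consistency concern about regex dialects remains.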

I agree that at least basic substring and case-insensitive matching is needed on filters. If the client only cared about certain fields then they could do something like ?filter[substring:title]=sub&filter[substring:artist]=sub&filter[....., but doing that for all fields seems a bit unwieldy. It is a good point that some fields don't need to be searched (like integer fields, MusicBrainz IDs and MIME types), which would leave 7 'searchable' fields for tracks, 3 for albums and 1 for artists.

Of course matching 7 fields rather than, say, 3 is going to be more expensive but it's very easy for the server to extract the required information from the URL. For filters the server needs to look through each parameter, see if it matches the filter[...] pattern, and possibly figure out if it's substring, case-insensitive, or something else. Also probably a lot of back-ends will have some kind of in-built ability to match multiple fields at once, but I don't know how much of a difference that makes.

Looking forward to your ideas.

sampsyo commented 4 years ago

One option might be to have two separate interfaces: a standard query interface using filter[title]=..., etc., that essentially encodes SQL "WHERE" clauses, and a separate search that is much fuzzier: it could match all fields and use case-insensitive substrings, but it could also attempt an implementation-defined "smart" search that guesses what the user was really after. The former would have a clearly defined meaning; the semantics of the latter would be undefined and left up to the server to allow variability in how fuzzy searching works. Would that make sense?
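The split between strict filters and a fuzzier search could look roughly like the following SQLite sketch. The `tracks` table, the column whitelists, and `build_query` are all hypothetical, and the "fuzzy" part here is only a case-insensitive substring match across columns, which is one of many behaviors a server might choose.

```python
import sqlite3

# Hypothetical schema and column whitelists, for illustration only.
FILTERABLE = {"title", "artist", "album"}
SEARCHABLE = ("title", "artist", "album")


def build_query(filters: dict, search: str = ""):
    """Return (sql, params) for exact filters plus an optional search term."""
    clauses, params = [], []
    for field, value in filters.items():
        if field not in FILTERABLE:
            # Column names cannot be bound as parameters, so whitelist them.
            raise ValueError(f"unknown field: {field}")
        clauses.append(f"{field} = ?")
        params.append(value)
    if search:
        # Escape LIKE wildcards so user input is matched literally.
        escaped = (search.replace("\\", "\\\\")
                         .replace("%", "\\%")
                         .replace("_", "\\_"))
        ors = " OR ".join(f"{f} LIKE ? ESCAPE '\\'" for f in SEARCHABLE)
        clauses.append(f"({ors})")
        params.extend([f"%{escaped}%"] * len(SEARCHABLE))
    where = " AND ".join(clauses) or "1"
    return f"SELECT title FROM tracks WHERE {where} ORDER BY title", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (title TEXT, artist TEXT, album TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?)",
    [
        ("Blue in Green", "Miles Davis", "Kind of Blue"),
        ("So What", "Miles Davis", "Kind of Blue"),
        ("Blue Train", "John Coltrane", "Blue Train"),
    ],
)
sql, params = build_query({"artist": "Miles Davis"}, search="blue")
rows = [row[0] for row in conn.execute(sql, params)]
```

The filter part maps one-to-one onto `WHERE` equality tests, so its meaning is fixed, while the search part is isolated in a single clause that a server could swap for full-text search or any other ranking it prefers.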