MushroomObserver / mushroom-observer

A website for sharing observations of mushrooms.
https://mushroomobserver.org
MIT License
81 stars 26 forks source link

Simplify `Query` flavors #2425

Open nimmolo opened 2 months ago

nimmolo commented 2 months ago

From a discussion with @pellaea. Simplifying Query is a longtime goal, that will also make search queries easier to maintain and more readable.

  1. Change internal uses of flavors, converting each usage one at a time: from, e.g., Query::ObservationByUser to Query::ObservationAll
    
    # Turn this:
    lookup_query(:Observation, :by_user, user)

Into this:

lookup_query(:Observation, :all, user: user)

That can happen piecemeal, one query at a time, without breaking anything.  There will be a few cases that require additional work:
```ruby
advanced_search
needs_naming?
pattern_search

I think we might be able to get away with simply moving the additional parameters into base. For example, pattern_search has the parameter "pattern". It just does a subtle variation on "names" and "locations". (Weird, I could swear pattern search used to also search the observation notes and comments!) Anyway, the point is that in these cases where base doesn't have an exact equivalent, figure out how to implement something sufficiently similar with existing base parameters and/or add new base parameters to handle them.

  1. Fill out the PatternSearch translator tables so that every base query parameter can be translated to and from search pattern parameter(s). I think it makes most sense to maintain one-for-one correspondence. There can be overlap. For example, there is a "notes_has" query param, but the search pattern may have a single all-purpose "pattern" parameter that searches notes, comments, names, etc. That's fine. Just fill both out: add the all-purpose "pattern" parameter to Query::ObservationBase, and add the more specific "notes_has" to PatternSearch.

  2. Provide a reverse translator to PatternSearch to convert Query instance parameters into a PatternSearch hash and/or string of parameters. Once this is done, we should be able to convert any Query instance into a PatternSearch and vice versa.

  3. Do whatever needs to be done to go back and forth between PatternSearch and your search form. Sorry, I just don't have the capacity right now to dive into new code to see what that would involve. But I would hope it would just involve some simple generic "escapers" and "unescapers". After all, the values accepted by PatternSearch are already human-readable.

Notes

re: how to cover the bare search word in existing pattern search -- Yes, I would make a new Query::ObservationBase parameter to cover that so that it is explicit and handled just like all the other search terms.

I do think it might be worth having whatever parses the search bar field allow a plain old bare search word. In the search form, each field will be explicit, ~no "bare words" allowed~ EDIT: they are allowed by the form and parser now. I think it is too much to expect people to enter «pattern:"deadly mushroom"» into the search bar instead of just «deadly mushroom». Just have the parser recognize bare words (without the colon), and shove them in the "pattern" parameter.

Now what that parameter does is up to you. Ask yourself: What would an avg. non-power user expect it to do if they entered "Amanita" or "red gills" or "under douglas fir" or "fishy" or whatever into the search bar?

re: complex flavors -- It might make sense to work from the outside in. That is, instead of trying to figure out how to save "with descriptions by editor" flavor, instead fix the callers. It is entirely probable that you'll find that no one even uses some or most of the complex flavors, and they can simply be deleted.

In the end, you can even get rid of all of the subclasses, including ObservationAll, and rename ObservationBase to just plain old Observation. And then rip out all of the flavor stuff. A kind of unnecessary but highly aesthetic simplification, maybe enjoyable.

nimmolo commented 3 weeks ago

Follow-up question to do with "query coercion":

I am now trying a PR to handle the interlocking “obs_in_set”, “location_with_obs_in_set”, “image_with_obs_in_set” flavors.

If “obs_in_set” becomes “obs_all” with an “ids:" param, the flavors like “loc_with_obs_in_set” then need to send their coercions to “obs_all”… but also, “obs_all" needs to be able to send ITS coercions back to “loc_with_obs”. That means that “loc_with_obs” needs to be able to handle an “observation_ids” param, which is currently only handled by the “child” flavor “loc_with_obs_in_set”. Okay, so i could move the “ids” param declaration up a level, to “loc_with_obs" and “loc_with_obs_in_set” will inherit that param.

But notice the asymmetry between "obs:all" and “loc_with_obs" at this point - eventually i will want to move “loc_with_obs” into "loc:all”, and the param name for the set of obs can’t be “ids:” because in a Location query, “ids:” is for a set of Locations.

So I’m wondering if i’m going to maybe need to move all these params to the all flavors at once. I was trying it this way (just trying to change “obs:all” and its reciprocal coercions “loc_with_obs” etc), because it intuitively feels nuts to try to put all these params into the top level “all” query of their respective query classes all at once, eg “Location:all”, “Image:all”, etc. But now i’m wondering if that’s the only way.

Related question: if i move “loc_with_obs” into “Location:all” it seems like i need two params to cover what it and its child flavors do. The param “:Location, with_observations:” would be a boolean, and then “:Location, with_observations: true, observation_ids: Array" would be the array that covers the set. (A query could of course send “with_observations:” without sending “observation_ids:”.)

Response from Jason (so i don't have to keep looking for it in my emails):

I still think that we should do away with coercion, [or else] I think you're in for a world of hurt. Imagine, for example, starting with an obs query with various params, coerce to loc, all or most of the original params need to be mapped to "obs_xxx" params to work with loc query. Now add more loc params and coerce back to obs, some or all of the new loc params need to be mapped to work with obs queries. Now coerce to name... Maybe you see where I'm going? Every parameter in every model needs to be available in every other model that you can coerce to. If nothing else, you will end up needing to join from names to locations through observations and take a huge performance hit. But if you do away with coercion, we can limit joins dramatically and ensure safe performance capabilities.

What if instead we deal with coercion differently. Leave the query alone. Keep it as an obs query, only allow modification of obs params on obs queries. Loc and name don't need to know anything about obs parameters or vice versa. It's when executing the query that the "coercion" happens. Just have it select location_id instead of id, and change query#model from Observation to Location (but leave query#table alone, it's still "observations" because it is, after all an obs query not a loc query).

Going from obs to loc or name is easy. Just change "select id" to "select location_id" or "select name_id", and change the model. Going from obs to img or from loc/name to obs is slightly different. But I think we can get away with doing "select id from observations where location_id in (...original-loc-query-as-subquery...)" or "select image_id from observation_images where observation_id in (...subquery...)".

Note also, that I am going to strongly recommend losing the inner-outer query functionality!! That feature was more complicated and harder to debug than all of the rest of Query! And no one uses it or needs it (except possibly Darvin).

Of course, it's your call ultimately, but I really can't answer your questions. I don't see any easy way to solve your coercion problems. Every time I think about it, all sorts of red flags go up that indicate that we're doing this all wrong. Well, strictly speaking: I did this all wrong from the beginning. And I knew it, but it was an afterthought. Query ended up doing so much more than I originally designed and intended for it to do.

Oh, right, and the title crap gets radically simplified, too, if you do away with coercion and inner/outer queries, I think. Title is so unimportant. Please, please simplify at the expense of making the titles less informative. I have ideas on that subject I can share if you're interested later.