Open synrg opened 1 year ago
I think "validation" is the wrong word. Really, this is just stages of evaluation in a sort of assembly-line fashion. The Query itself is mostly just an expression of a set of filters to apply to the result set, and it is not fully realizable until used in a command. The command provides the "verb" and determines which presentation is appropriate for the results, e.g. "search" (implicitly observations) produces a result set containing all matching observations in a paginated display.
While fully developed classes for these concepts aren't yet firm, this is a rough outline of the parts of a command as they stand today:
That might not be such a bad name after all, QueryResults, until it is made concrete via using it in a Command, so:
query = Query.parse("fungi by me from ns in prj lichens atlantic on today")
taxon_command = Command("taxon")
counts_command = Command("taxon counts")
query_results = query.prepare() # => QueryResults
taxon = taxon_command.realize(query_results) # => Taxon
taxon_counts = counts_command.realize(query_results) # => TaxonCounts
menu = TaxonMenuWithCounts(taxon_results=taxon, taxon_counts_list=[taxon_counts])
menu.start()
This example captures the fact that the resulting display has a primary result output (the taxon) followed by the table below it. Therefore, it is actually two commands in one, each taking the prepared results as input.
Alternatively, I might prefer if we called it a PreparedQuery which shifts emphasis off of the "results" aspect of it (i.e. these aren't yet the final result, but an intermediate stage to fetching them), and thus:
query = Query.parse("fungi by me from ns in prj lichens atlantic on today")
taxon_command = Command("taxon")
counts_command = Command("taxon counts")
prepared_query = query.prepare() # => PreparedQuery
taxon = taxon_command.realize(prepared_query) # => Taxon
user_or_place_taxon_counts = counts_command.realize(prepared_query) # => Union[UserTaxonCounts, PlaceTaxonCounts]
menu = TaxonMenuWithCounts(taxon=taxon, counts=[user_or_place_taxon_counts])
menu.start()
Oops, misclick. Still not happy with this. Will return to it later. "Realizing" a command seems wrong. Perhaps the two inputs to the menu are just results of functions applied to the prepared_query instead.
First, I think I shouldn't buck convention and should keep Query at the front, so QueryMappedEntities is the best I have so far as a replacement for QueryResponse. The prefix "Query" helps it collate, making it easier to find and strengthening its relatedness with the original Query. "MappedEntities" focuses on the remapping of bits of text in the query to at least partial Models and more precise qualifiers, like expanding "today" to today's date, expanding macros, etc.
This refinement of my earlier ideas above relies on far fewer new Dronefly objects and instead directly makes use of existing pyinat TaxonCounts and TaxonCount models. What follows also serves as a bit of an overview of how Command, Context, Query, QueryMappedEntities, Source, and Menu should fit together to form the overall structure of most Dronefly commands. Much of this is already written (at least partially), but perhaps not everything here is fully articulated elsewhere.
OK. I'm back to treating the taxon command as a single command. We don't need a separate taxon counts command, just some preparatory steps to add the counts to the data source passed to the front-end. The taxon command body proceeds, bucket-brigade style, through three stages. It starts with parsing the query & preparing it*, then packages up the specific arguments for the front-end, then starts the front-end (the menu) so the user can see the results and interact with them. By the time it gets to the menu start, everything necessary to produce the display should either already be looked up, or else be wrapped in a generator that will fetch it as needed a page at a time, often with a pyinat Paginator at the bottom layer. However, in this example, the list starts empty or with just one entity, and can grow or shrink as users add or remove their own stats via menu button-presses.
* Certain real-world aspects are left out of the following to keep it simple for illustrative purposes, e.g. details for the creation of the Context and Command objects, and parsing and preparation of the query would be handled within a context manager that provides ctx to the command block.
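To illustrate the "fetch as needed, a page at a time" idea, here is a minimal sketch using a plain generator. It is only an illustration: `iter_pages`, `fake_fetch_page`, and the fake data are hypothetical stand-ins, not pyinat's Paginator or anything that exists in Dronefly.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int], List[str]]) -> Iterator[List[str]]:
    """Yield pages lazily; nothing is fetched until a page is asked for."""
    page = 1
    while True:
        items = fetch_page(page)
        if not items:
            return  # an empty page means there are no more results
        yield items
        page += 1


# Hypothetical stand-in for an API call: 5 results, 2 per page.
FAKE_RESULTS = ["a", "b", "c", "d", "e"]
calls: List[int] = []  # records which pages were actually requested


def fake_fetch_page(page: int, per_page: int = 2) -> List[str]:
    calls.append(page)
    start = (page - 1) * per_page
    return FAKE_RESULTS[start:start + per_page]


pages = iter_pages(fake_fetch_page)
first_page = next(pages)  # only now is page 1 fetched
```

The menu would hold on to something like `pages` and call `next()` on it only when the user presses a "next page" button, so later pages cost nothing until they are viewed.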
Here's what this simplified restructuring of our current taxon command might look like:
from dronefly.core.commands import Command, Context
from dronefly.core.query import Query
from dronefly.core.menus import TaxonCountsSource
from dronefly.discord.menus import TaxonWithCountsMenu

ctx = Context()
ctx.command = Command("taxon")
try:
    # parsing and preparation:
    query = Query.parse("fungi by me from ns in prj lichens atlantic on today")
    ctx.query_entities = ctx.command.prepare(query)  # => QueryMappedEntities
    # get menu arguments:
    taxon = ctx.query_entities.taxon()  # => single Taxon that best matches the query
    taxon_count_type = ctx.command.preferred_count_type(ctx.query_entities)  # => Union[Type[Place], Type[User], None]
    if taxon_count_type is None:
        counts = None
    else:
        # get_taxon_counts() is the general-purpose helper described below
        counts = get_taxon_counts(ctx.query_entities, taxon_count_type)  # => collection of TaxonCounts for counted type
    counts_source = TaxonCountsSource(ctx, taxon=taxon, taxon_counts=counts, taxon_count_type=taxon_count_type)
    # start the menu:
    menu = TaxonWithCountsMenu(ctx.cog, taxon=taxon, counts_source=counts_source)
    menu.start(ctx)
except Exception:
    # error handling for malformed query, no matching taxon, no matching place, user, etc.
    raise  # placeholder: re-raise until real handling is written
A bit of logic that wasn't designed to my satisfaction in the current ,taxon command is split up here into command.preferred_count_type() to tell us whether we're counting users or places and get_taxon_counts(), a general-purpose helper method that will count either entity, taking just the query_entities and the taxon_count_type as arguments. For instance, as per our current ,taxon command behaviour, if the query_entities contains a place or user, then taxon_count_type = Place or = User respectively, and that determines whether counts are fetched initially from the API and if so, which kind. If both are specified, then which one is prioritized depends on the command: for ,taxon it will be User, but other commands may differ, justifying making preferred_count_type() a method (or attribute) of the command, not the query_entities.
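A rough sketch of how that per-command priority could work. Everything here is a hypothetical stand-in: the `Place`, `User`, and `QueryMappedEntities` classes are minimal dataclasses rather than the pyinat or Dronefly ones, and the `count_priority` argument is just one possible way to let each command state its own preference.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Type


@dataclass
class Place:  # stand-in for the pyinat Place model
    id: int


@dataclass
class User:  # stand-in for the pyinat User model
    id: int


@dataclass
class QueryMappedEntities:
    """Hypothetical minimal shape: only the fields this sketch needs."""
    place: Optional[Place] = None
    user: Optional[User] = None


class Command:
    def __init__(self, name: str, count_priority: Tuple[Type, ...] = (User, Place)):
        self.name = name
        # Order in which count types are preferred when several are present.
        self.count_priority = count_priority

    def preferred_count_type(self, entities: QueryMappedEntities) -> Optional[Type]:
        present = {User: entities.user, Place: entities.place}
        for count_type in self.count_priority:
            if present.get(count_type) is not None:
                return count_type
        return None  # neither given: no counts are fetched initially


taxon_command = Command("taxon")  # prefers User, as described above
both = QueryMappedEntities(place=Place(id=6853), user=User(id=545640))
place_only = QueryMappedEntities(place=Place(id=6853))
```

With this shape, `taxon_command.preferred_count_type(both)` gives `User`, while a command constructed with `count_priority=(Place, User)` would pick `Place` for the same entities.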
Now we have the few different data items that the menu will operate on: taxon, taxon_counts, and taxon_count_type. These are bundled together in a TaxonCountsSource that is passed to the menu upon creation. That allows the menu to be written fairly generically, improving reuse in different commands. It shouldn't even need to consult the query_entities now, since the source already has everything it wanted from it. The source presents to the menu an interface for retrieving the main content of the taxon display (name, conservation status, etc.), paging through a list of user or place taxon counts associated with it, providing methods to update that list that can be attached to menu buttons, etc. The menu itself doesn't need to know how the source provides all of this info, or know even if any additional filters came into play (date/time, place, etc.). It just asks the source for everything it needs to pour into the view, and apart from that can be written to be fairly general, thus cutting down on the amount of custom code written per command.
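As a sketch of that division of labour, assuming a much-simplified source: the class below borrows the TaxonCountsSource name but its fields and methods (`header`, `page`, `add_count`, `remove_count`) are invented for illustration, and `render` stands in for the menu. The point is only that the "menu" never touches anything but the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaxonCountsSource:
    """Hypothetical sketch: bundles everything the menu needs to render."""
    taxon_name: str
    counts: List[str] = field(default_factory=list)  # one line per user or place
    per_page: int = 2

    def header(self) -> str:
        # Main content of the taxon display (just the name in this sketch).
        return self.taxon_name

    def page(self, number: int) -> List[str]:
        # Zero-based page of count lines.
        start = number * self.per_page
        return self.counts[start:start + self.per_page]

    def add_count(self, line: str) -> None:
        # Could be attached to a menu button handler.
        if line not in self.counts:
            self.counts.append(line)

    def remove_count(self, line: str) -> None:
        if line in self.counts:
            self.counts.remove(line)


def render(source: TaxonCountsSource, page_number: int = 0) -> str:
    """A stand-in 'menu' that only talks to the source."""
    return "\n".join([source.header(), *source.page(page_number)])


source = TaxonCountsSource("Fungi")
source.add_count("3 (2) benarmstrong")
text = render(source)
```

Because `render` knows nothing about queries, filters, or the API, the same function would work unchanged with a source built by a different command.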
With this arrangement, front-end UI elements like buttons can even be provided that are attached to handlers in the source that update the query, as in our ,life command where a new root taxon is chosen, or an entirely different tree generated based on different rank filters, etc. All of this is working in the current codebase for ,life but other commands like ,taxon haven't yet received this treatment. This issue has been one of the blockers.
After studying pyinaturalist's models, I'd like Query to be based on pyinaturalist.base.models, as they provide robust abstractions that improve on what we've made so far. While pyinat does not have a user's query as a concept distinct from the API requests that would be needed to fulfill them, it does have core classes that we can use to make one.
A Query is not, itself, a description of a single API request. It is a text description of a number of parameters sent to iNaturalist to produce a single display with two parts: the description of the base entity of the request, and zero or more individual results relating to that entity (e.g. total # of observations for the query, and per-observer counts of observations and species for the query).
For example, a "fungi by me from ns in prj lichens atlantic on today" request could be realized as pyinat RequestParams with individual params as follows (slightly simplified for illustration purposes):
- taxon = Taxon(id=47170): taxon here is the first record returned from /v1/taxa/autocomplete?q=fungi
- user = User(id=545640): me refers to user's own id
- place = Place(id=6853): ns is looked up in a table associated with the user's command context
- project = Project(id=62291): lichens atlantic is the first matching record from /v1/projects/autocomplete?q=lichens+atlantic
- observed_on = datetime.today(): today is parsed via dateparser.parse()
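The mapping described in the list above could be sketched roughly like this. It is a toy, not the Dronefly parser: `split_query`, `map_local_entities`, `Partial`, and the lookup tables are all hypothetical, only the literal word "today" is handled (the real thing goes through dateparser.parse()), and the taxon and project are left as text because they would need the autocomplete requests.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class Partial:
    """Hypothetical partial entity: just enough for `.id` to work."""
    kind: str
    id: int


# Hypothetical local tables; the ids are the ones used in the example above.
PLACE_ABBREVIATIONS = {"ns": 6853}
REQUESTING_USER_ID = 545640


def split_query(text: str) -> Dict[str, str]:
    """Toy splitter for 'TAXON by USER from PLACE in prj PROJECT on DATE'."""
    parts = re.split(r"\s+(by|from|in prj|on)\s+", text.strip())
    fields = {"taxon": parts[0]}
    fields.update(zip(parts[1::2], parts[2::2]))  # keyword -> following text
    return fields


def map_local_entities(text: str, today: Optional[date] = None) -> dict:
    """Resolve the parts that need no API call; leave the rest as text."""
    fields = split_query(text)
    mapped: dict = {"taxon_text": fields["taxon"]}  # needs /v1/taxa/autocomplete
    if fields.get("by") == "me":
        mapped["user"] = Partial("user", REQUESTING_USER_ID)
    if fields.get("from") in PLACE_ABBREVIATIONS:
        mapped["place"] = Partial("place", PLACE_ABBREVIATIONS[fields["from"]])
    if "in prj" in fields:
        mapped["project_text"] = fields["in prj"]  # needs /v1/projects/autocomplete
    if fields.get("on") == "today":
        mapped["observed_on"] = today or date.today()
    return mapped


mapped = map_local_entities(
    "fungi by me from ns in prj lichens atlantic on today",
    today=date(2023, 1, 2),  # fixed date so the result is reproducible
)
```

The `today` argument is only there so the sketch can be run with a fixed date; leaving it out falls back to the current date.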
For each of the entities retrieved from local tables, only a partial object is needed, just so that place.id, etc. will work.
In the dronefly codebase, this "fully parsed" query is called a QueryResponse (which I'm not entirely happy with). It is still, however, only a template for one or more primary requests for the page to fill it with content.
Which requests are performed depends on what command handles the query. For example, in this simplified rendering of a taxon display with the above query arguments:
Several distinct API requests based on the query would be needed to fill in all the parts above including at least:
- /v1/observations?taxon_id=47170&user_id=545740&place_id=6853&project_id=62291&verifiable=any&observed_on=2023-01-02&per_page=0
  for total_results to put in 52 observations
- /v1/observations?taxon_id=47170&user_id=545740&place_id=6853&project_id=62291&verifiable=any&observed_on=2023-01-02&user_id=545640&per_page=0 AND /v1/observations/species_counts?taxon_id=47170&user_id=545740&place_id=6853&project_id=62291&verifiable=any&observed_on=2023-01-02&user_id=545640&per_page=0
  assuming benarmstrong is already cached, otherwise a /v1/users/545740 request might be needed to obtain this from user.login
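To show how one set of mapped parameters fans out into several requests, here is a small sketch that only builds the request paths and does not call the API. The `request_path` helper and `BASE_PARAMS` dict are hypothetical; the values are copied from the first request listed above.

```python
from urllib.parse import urlencode

# Parameters shared by every request for this query
# (values copied from the first request listed above).
BASE_PARAMS = {
    "taxon_id": 47170,
    "user_id": 545740,
    "place_id": 6853,
    "project_id": 62291,
    "verifiable": "any",
    "observed_on": "2023-01-02",
}


def request_path(endpoint: str, params: dict, **extra) -> str:
    """Join an endpoint with the shared params plus request-specific extras."""
    return f"{endpoint}?{urlencode({**params, **extra})}"


# per_page=0 asks for no records; only the totals are of interest here.
total_path = request_path("/v1/observations", BASE_PARAMS, per_page=0)
species_path = request_path(
    "/v1/observations/species_counts", BASE_PARAMS, per_page=0
)
```

Which of these paths actually get requested would still be up to the command handling the query; the shared parameters stay the same either way.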
Finally, it should be possible to map from a command with a query argument to a URL for the web page that best represents that base request, and likewise for any other parts of the page (usually counts of each entity which link to searches for those entities on the web):
- 52 observations (link)
- 3 (2) benarmstrong (link)
With all this in mind, the Query class should represent all of these arguments in a way that more closely resembles existing pyinaturalist models.
Here is a representation of this progression from text to parsed query to a validated query that is finally ready to be used in a command as dict-like results from each step:
I'm still not sure of QueryResponse vs. some better name. Maybe ValidatedQuery?