Use a standardized query language to replace `filter` and `pass` props.

farmOS / field-kit

A modular, offline-first companion app to farmOS.

https://farmOS.org

GNU General Public License v3.0

62 stars 39 forks source link

Use a standardized query language to replace `filter` and `pass` props. #342

Closed jgaehring closed 3 years ago

jgaehring commented 4 years ago

I keep coming back to this idea that it might be possible to replace our current filters option in the module.config.js file that Field Modules provide with a GraphQL query. I'm really not satisfied with that filters option; it just feels a bit hacky and only seems to cover a narrow range of use cases. Here's an example of what it looks like now:

https://github.com/farmOS/farmOS-client/blob/cf6a1d00c3b2bcda74e92e3522a1d4a743c181d0/src/field-modules/my-logs/module.config.js#L7-L12

In place of this, Field Modules could simply provide a GraphQL query string to define precisely what data they need from the server, as well as from the local cache.

I could put a lot of time and thought into improving the filters options' API, but it would never be quite as robust as an established standard query language like GraphQL. Plus I don't have to worry about documenting our own API if I can just lean on the GraphQL docs instead. And it seems like a natural fit, especially if we end up adopting GraphQL on the server when we move to D8.

It'd be especially cool to work out some sort of integration with GraphQL and IndexedDB for querying the local cache. There appears to be a library which does that: https://github.com/genie-team/graphql-genie

My biggest concern would be how to handle values like the 'SELF' constant above. Not at all sure how that would work with GraphQL; perhaps it's something special we could implement with Apollo or one of the client libraries.

@paul121, I know you've worked a little with GraphQL on both the client and server side of things; curious what you think of this and whether it would work.

paul121 commented 4 years ago

hmm this is interesting @jgaehring. to clarify, you're thinking of trying to use GraphQL as an internal API?

It'd be especially cool to work out some sort of integration with GraphQL and IndexedDB for querying the local cache. There appears to be a library which does that: https://github.com/genie-team/graphql-genie

I think the larger question would be how to implement GraphQL in this "internal" way.. I agree it does seem like it's possible. Vue Apollo has a cache system that is designed to replace vuex. For that you can write additional resolvers that are only used for querying the cache, not the server. It seems like the largest benefit of adding GraphQL might be in how FK uses the cache? (Just guessing tho, the cache logic is likely the part I'm most unfamiliar with in FK!)

I could put a lot of time and thought into improving the filters options' API, but it would never be quite as robust as an established standard query language like GraphQL. Plus I don't have to worry about documenting our own API if I can just lean on the GraphQL docs instead.

Unfortunately, I'm not sure GraphQL would quite work in the ways you might be thinking ☹️ GraphQL is more suited for specifying only the fields you want returned from a query. With filters it seems like we're trying to create a simple way of defining the conditions of the logs to return. In GraphQL you can't simply specify conditions on the fields such as done: True, to only return logs that are done. Rather, you must build it into query arguments. This might be a good reference to explain this - https://stackoverflow.com/questions/52347310/graphql-conditional-queries (extra emphasis that the where: they define is not special GQL syntax - that would be an argument expected by the resolver)

It would be possible to construct a GraphQL query like this:

    {
      logs(done: True, log_owner: "SELF", hasQuantity: True, createdAfter: 123456789){
        // Fields to return
        id
        name
        done
        date
      }
    }

This is all possible, you'd just have to implement the logic in a custom GraphQL resolver for this logs query. Building out the resolvers to do this would be a fair amount of work, but I guess it would likely use the same logic you've already implemented for things like log_owner: 'SELF'. These query params would be "auto-documented" in something like the GraphQL explorer, but that's not unlike OpenAPI or another system. Does that make sense?

My biggest concern would be how to handle values like the 'SELF' constant above

A GraphQL Enum type for things like SELF might work.. not sure how complicated that might get.

If the farmOS server had a GraphQL endpoint with resolvers that matched all of the needs of filters, it would be quite cool to define the GQL string in a field module and pass that straight through to the server. BUT there are a lot of "ifs" with that. Where we're at now, building a filters object in a Field Module and passing to FK (or maybe even farmOS.js?) is quite similar.... the "resolver" being the FK logic, accepting an object instead of GQL string.

Heck, if we defined some custom REST API endpoints on the farmOS server then, in a similar way, it would be possible for Field Modules to define query params and pass them straight through to the server..

paul121 commented 4 years ago

Just doing more research into this... seems like other GraphQL toolkits like Prisma might add on some additional GraphQL filter features. This blog post from Prisma is interesting: https://www.prisma.io/blog/designing-powerful-apis-with-graphql-query-parameters-8c44a04658a9

I saw something similar with Gatsby. I hadn't seen those additions before. It's quite cool, but not a GraphQL standard. Not sure if anything like that would work out of the box with farmOS, either..

tool172 commented 4 years ago

I concur with Paul. GraphQL is a one way API for quickly returning fields of information from the server. It looks somewhat promising, I think more for remote dashboards or something of that nature.

https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail&utm_term=icon Virus-free. www.avast.com https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail&utm_term=link <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>

On Mon, Apr 20, 2020 at 5:12 PM Paul Weidner notifications@github.com wrote:

Just doing more research into this... seems like other GraphQL toolkits like Prisma might add on some additional GraphQL filter features. This blog post from Prisma is interesting: https://www.prisma.io/blog/designing-powerful-apis-with-graphql-query-parameters-8c44a04658a9

I saw something similar with Gatsby. I hadn't seen those additions before. It's quite cool, but not a GraphQL standard. Not sure if anything like that would work out of the box with farmOS, either..

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/farmOS/farmOS-client/issues/342#issuecomment-616838947, or unsubscribe https://github.com/notifications/unsubscribe-auth/APERD7HW5XTZKII3RQLEEVLRNTCDJANCNFSM4ML3655A .

jgaehring commented 4 years ago

to clarify, you're thinking of trying to use GraphQL as an internal API?

Correct. I know that's a little weird, but I think there is some precedent, specifically how Gatsby and other SSG's use GraphQL in a Node process to structure queries of the file system and other local resources.

My only significant use of GQL in the past has been with Gatsby, in fact, so that's definitely driving a lot of my line of thinking. And looking at their docs now, it does appear that Gatsby uses something similar to Prisma for filter:

Gatsby relies on Sift to enable MongoDB-like query syntax for object filtering.

This is very similar to Prisma.

So, to get to the heart of the matter, what I'm looking for is a standard language for defining a subset of data from a larger set. I think any query language fulfills that basic requirement (hell, even SQL could be used), but obviously there's a lot of unnecessary overhead that comes with a lot of those languages. Right now, with the filter object as it is, what I'm essentially doing is mimicking the REST queries that farmOS.js uses. GraphQL, as a pure query language, seemed like a good alternative because of its simplicity and declarative nature, and again, b/c if we end up using it on the farmOS API, we might as well conform to that in the Field Modules API. But if filtering isn't actually a part of the GQL spec itself, then maybe it's too simple for our purposes.

As I mentioned in #339, I'm looking to refactor some of the IndexedDB utilities in FK to use indexes and key ranges, and it could be a good time to rework the query parameter I use there as well, to reflect a change to the filter option in module.config.js, whether we us GQL or something else. One other possibility that comes with that is to model our API more closely on the IndexedDB API, but that's a pretty low level API, and frankly I think a higher level of abstraction for our API, which isn't contingent on the underlying implementation.

So yea, dunno where that leaves us, but it's something to think about.

jgaehring commented 4 years ago

I'm working right now on the Spraying Field Module and trying to figure out how best to structure a filter object for the logs it needs. Specifically, it needs to get any logs that are in the "Spray" category. This is complicated by the fact that a log only references categories by tid in its log_category field, not by the category's name. The URL params for fetching a given category has to look something like:

https://test.farmos.net/log.json?log_category=33

Crucially, the tid for any given name can't be known for each farmOS instance. The only way to get that tid for the query is to first search over the list of categories, which must be obtained in a separate REST request, or in the case of Field Kit, can be obtained from the database, since we're already storing them.

That all taken into consideration, I've made a first pass at how that might work if we allowed a function to be passed as one of the filters:

  filters: {
    log: {
      log_category(cats) {
        const { tid } = cats.find(cat => cat.name === 'Spray');
        return cats
          .filter(cat => (
            cat.name === 'Spray' || cat.parents_all.some(parent => parent.id === tid)
          ))
          .map(cat => cat.tid);
      },
    },
  },

I haven't tried implementing this at all, want to hammer out the API first, but I think it should be pretty simple, something like:

let log_category;
if (typeof filters.log.log_category === 'function') {
  log_category = filters.log.log_category(categories);
}

Since the function evaluates to an array of tid's, the return value can be added directly to the options object and passed into the farmOS.js method as .log.get({ log_category }). And the only thing the function requires is the list of categories, which is easy enough to provide.

Still, I wonder if there might be a better way of expressing this as some sort of query statement or even just plain JSON data, rather than a function. Looking at GraphQL/Prisma/Sift, I think we'd still have the problem of getting the correct tid, but perhaps that could be abstracted to some cleaner, more declarative statement, without making the consumer explicitly define a procedure for how to do it. It'd all have to be handled in the resolver, either way, and I suppose the same could be achieved with a regular JS object literal, and handled the same way.

So I guess GraphQL wouldn't provide any easier solutions for us, but again, I'm hopeful that it, or some type of query language, could provide a more declarative, generalized and, most importantly, standardized API for expressing the query, rather than us inventing our own standard, or forcing the consumer to write the procedure out by hand.

paul121 commented 4 years ago

@jgaehring This is a great example of the problem!I think this hits it on the nail:

It'd all have to be handled in the resolver, either way,

Yeah, there will have to be the logic to convert a category name into a category tid (assuming the user of the API has no way of figuring this out themselves)

Still, I wonder if there might be a better way of expressing this as some sort of query statement or even just plain JSON data, rather than a function

The function is an interesting way of doing this, but does seem a little complicated. The resolver will still need logic to figure out that it needs to supply this custom filter with catalogs, right? I'm assuming the resolver might need to supply other object types to custom filter functions, too? But perhaps only other core farmOS types such as logs, assets terms and areas so this might not be so complicated?

Because the resolver needs additional logic either way, documenting a JSON filter object seems to make sense... although allowing functions does seem to be more versatile... I'm torn 😕

I'm hopeful that it, or some type of query language, could provide a more declarative, generalized and, most importantly, standardized API for expressing the query, rather than us inventing our own standard, or forcing the consumer to write the procedure out by hand.

THIS is interesting: RSQL (Rest Query Language) From the docs:

For example, you can query your resource like this: /movies?query=name=="Kill Bill";year=gt=2003 or /movies?query=director.lastName==Nolan and year>=2000

It will still require logic to parse the RSQL query...... but do so in a standardized way?

Other relevant things:

Summary of the problem in the GraphQL context: https://github.com/graphql/graphql-js/issues/640#issuecomment-268386839

Older issue discussing adding these features to GraphQL: https://github.com/graphql/graphql-spec/issues/271

jgaehring commented 4 years ago

Thanks, @paul121, terrific feedback!

The resolver will still need logic to figure out that it needs to supply this custom filter with catalogs, right?

Correct. But I'm not so worried about that, because that won't face the Field Module developer. I'm more concerned about providing a clean, stable API that doesn't leak too many of the implementation details to the API consumer.

I'm assuming the resolver might need to supply other object types to custom filter functions, too? But perhaps only other core farmOS types such as logs, assets terms and areas so this might not be so complicated?

Correct again. I think we want to do this for any fields that contain an entityreference or taxonomy_term_reference, as defined on the log resource at /farm.json:

{
  "farm_activity": {
    "label": "Activity",
    "label_plural": "Activities",
    "fields": {
      "area": {
        "label": "Areas",
        "type": "taxonomy_term_reference",
        "required": 0,
        "default_value": []
      },
      "asset": {
        "label": "Assets",
        "type": "entityreference",
        "required": 0,
        "default_value": []
      },
      "files": {
        "label": "Files",
        "type": "file",
        "required": 0,
        "default_value": []
      },
      "geofield": {
        "label": "Geometry",
        "type": "geofield",
        "required": 0,
        "default_value": []
      },
      "images": {
        "label": "Photos",
        "type": "image",
        "required": 0,
        "default_value": []
      },
      "notes": {
        "label": "Notes",
        "type": "text_long",
        "required": 0,
        "default_value": null
      },
      "movement": {
        "label": "Movement",
        "type": "field_collection",
        "required": 0,
        "default_value": null
      },
      "log_category": {
        "label": "Log category",
        "type": "taxonomy_term_reference",
        "required": 0,
        "default_value": null
      },
      "log_owner": {
        "label": "Assigned to",
        "type": "entityreference",
        "required": 0,
        "default_value": null
      },
      "inventory": {
        "label": "Inventory adjustments",
        "type": "field_collection",
        "required": 0,
        "default_value": null
      },
      "membership": {
        "label": "Group membership",
        "type": "field_collection",
        "required": 0,
        "default_value": null
      },
      "quantity": {
        "label": "Quantity",
        "type": "field_collection",
        "required": 0,
        "default_value": null
      },
      "flags": {
        "label": "Flags",
        "type": "list_text",
        "required": 0,
        "default_value": null
      },
      "data": {
        "label": "Data",
        "type": "text_long",
        "required": 0,
        "default_value": null
      },
      "equipment": {
        "label": "Equipment used",
        "type": "entityreference",
        "required": 0
      }
    }
  }
}

For the farm_activity log type shown above, that includes these fields: area, asset, log_category, log_owner and equipment.

You and I spoke a little bit in Riot about this, but I think this would be aided by a more thorough means of defining log types, such as JSON Schema, as outlined in farmOS/farmOS#243. I may follow up in that issue with more details, but for now I think we can just hardcode references in the resolver.

THIS is interesting: RSQL (Rest Query Language)

Oh that is very cool! I think I'm a little less inclined to adopt that though, partly because it doesn't seem as readable as a GraphQL query or JSON, and also because I feel it could be misleading if it gives module developers that this would reflect the actual farmOS REST API.

You have prompted me to look at other standards though, and I recalled the Sift docs were based primarily on MongoDB's query syntax. There's a whole long list of operators, such as $gt/$lt, $and/$or, etc, but we could choose to support only a subset of those, perhaps just the comparison operators and logical operators.

MongoDB seems like a nice alternative, because it's basically plain JavaScript syntax, with the firm backing of a widely-adopted standard, ala MongoDB. Plus it might be possible to employ the Sift library in implementation, making my job a lot easier.

I'll play around with the syntax and try to come up with an equivalent filter object for the Spraying module, as well as a sketch of the resolver it will require, and report back here with my findings.

mstenta commented 4 years ago

Just want to chime in quick on this, with something that came up as a "maybe going to do" requirement for Our Sci that could help here: thinking about adding a small bit of abstraction to the farmOS server API that will remove the need to worry about "term IDs" (tid) entirely...

Internally, farmOS has a PHP function farm_term(), which will return a term if it exists, OR create a new one if it doesn't. This is used as a helper in various places in farmOS itself.

https://github.com/farmOS/farmOS/blob/ca6fe1d7f852ce25c4f8ab958054f696152f0826/modules/farm/farm_term/farm_term.module#L8

If we could leverage that via that API, we could potentially remove the need for the second lookup of tid entirely. Imagine: instead of specifying a tid in the log_category field, you just specify the category name (Spray), and farmOS does the rest for you.

Would that obviate the need for further effort in the short term?

paul121 commented 4 years ago

@mstenta yes, that would be helpful. I came across a similar issue in the CSV Importer. I drafted a reply for this thread but saved because I thought it might come up soon... See this: https://gist.github.com/paul121/05cace7f09e13e7d2185897c226ecef7

I wound up caching the taxonomy vocabulary vid from the result of /farm.json. This is the first step to getting a term tid (if you must ensure a term name is part of a specific vocabulary).

If not on the farmOS server, I'm curious if some of these things could be convenience features provided by farmOS API clients

mstenta commented 4 years ago

@mstenta yes, that would be helpful

Thumbs up - I talked with @pcambra about it and he is going to take a look at specifically enabling the ability to filter using the term name, instead of ID.

If not on the farmOS server, I'm curious if some of these things could be convenience features provided by farmOS API clients

Yea that's also a possibility, but if it's possible on the server that would be better IMO.

jgaehring commented 4 years ago

Imagine: instead of specifying a tid in the log_category field, you just specify the category name (Spray), and farmOS does the rest for you.

Would that obviate the need for further effort in the short term?

This would simplify a lot of FK code immensely! However, I don't think it will totally resolve this particular issue. The critical thing would be whether I could use URL params with the taxonomy name, instead of the tid, eg, https://test.farmos.net/log.json?log_category=Spray. Do you think that might be possible, @mstenta? But again, retrieving the tid is not a major hurdle, since FK already stores it in IDB and Vuex. It's really more about presenting a stable API for Field Modules, which will allow them to query all the data they will need, fully and succinctly, and then being able to parse that query. I guess in that sense, when it comes to querying the local db, it will be very helpful not to have to look up the tid for every taxonomy term referenced on a log. So yea, definitely helpful, but there will still be more work required to parse the statements.

On that note, however, I'm pretty optimistic about the prospect of adopting MongoDB-style queries and the Sift library. First of all, by adopting a pre-existing standard, it relieves me of the necessary decisions about how to structure every aspect of the API. Secondly, I spent a little while reading through the Sift docs and source and am very impressed; I think it will help tremendously with the implementation! What's really nice is that calling sift() just returns a predicate function, which can be passed directly to Array.prototype.filter(), but which I think can also be adapted to work with an IDB cursor. So that will be tremendously useful in parsing queries for local retrieval. Parsing queries into URL params will be a little trickier; I'll have to see what Sift can do on that front, but I'm still optimistic.

mstenta commented 4 years ago

The critical thing would be whether I could use URL params with the taxonomy name, instead of the tid, eg, https://test.farmos.net/log.json?log_category=Spray. Do you think that might be possible, @mstenta?

That's exactly what I was thinking! Sorry if that was unclear. :-)

I'm pretty optimistic about the prospect of adopting MongoDB-style queries and the Sift library

Cool! I'm curious what that means. I'm wary of over-engineering early on, especially as things may change in farmOS 2.0.x. So might be better to accept that some of the API surface for field modules will change as these things solidify. I don't think we can expect to get it right the first time. And accepting that makes things a lot easier.

I always like to "tread lightly" until we have a better sense of needs. :-)

jgaehring commented 4 years ago

That's exactly what I was thinking!

Oh great! Now I'm going to up the ante: how can I structure URL params to get any logs in the category "Spray" AND any logs in categories with a parent category of "Spray"?

I'm wary of over-engineering early on, especially as things may change in farmOS 2.0.x.

Totally! That's actually one of my main concerns, creating a stable API surface that won't be so closely coupled to farmOS 1.0, while also outsourcing as much of the API decision-making as I can (hence, the search for a pre-existing standard we can adopt). My first naive implementation of the filters object was a pretty close mapping of the farmOS REST API params to a JavaScript Object; in other words, this:

 filters: { 
   log: { 
     done: false,
     type: ['farm_activity', 'farm_observation']
   }, 
 },

becomes this:

https://test.farmos.net/log.json?done=false&type[0]=farm_activity&type[1]=farm_observation

That was pretty simple at the start, but it's already becoming much too complicated to implement, with even just a few edge cases to accomodate, like the 'SELF' parameter I needed to define for the log_owner field, and now the issue with taxonomy references. Plus I never really had a good semantics established for how to handle nullish values: does a null value for a field mean all values for that field should be returned? is undefined the same? etc... The result is a ballooning code footprint, with a lot of info about individual fields hardcoded:

https://github.com/farmOS/farmOS-client/blob/72fd3d035aa14917818644feded547fbc1241f9a/src/core/store/http/module.js#L18-L59

And that also bleeds additional implementation details into the local database queries:

https://github.com/farmOS/farmOS-client/blob/72fd3d035aa14917818644feded547fbc1241f9a/src/core/store/idb/module.js#L36-L67

So yea, don't want to over-engineer, but I'm also really wary of that implementation swelling up further and having to maintain it. I also feel like I'm going to reinvent the wheel if I start adding logical and comparative operators to the JavaScript object (eg, { $or: ['foo', 'bar'] }), which seems like the next logical step. And again, don't want to be so closely coupled to the restws module's API; the MongoDB syntax should be stable no matter what the underlying REST API looks like, or even if we end up adopting GraphQL or JSON:API instead of REST. All that taken into account, it seems like the time is right to adopt some API standard to cover a broader swath of use cases, while also simplifying the implementation. Particularly as I'm creating more modules which will need to be maintained, I want to limit how much I may need to update those modules to reflect changes to that API.

mstenta commented 4 years ago

Three new feature requests for farmOS core:

Restws: Allow filtering by term name, not only term id: https://www.drupal.org/project/farm/issues/3134066
Restws: Taxonomy terms should have a name property besides the tid: https://www.drupal.org/project/farm/issues/3134065
Restws: POSTing a new log should autocreate the term by name: https://www.drupal.org/project/farm/issues/3134067

mstenta commented 4 years ago

Now I'm going to up the ante: how can I structure URL params to get any logs in the category "Spray" AND any logs in categories with a parent category of "Spray"?

Give a mouse a cookie! ;-)

That could be possible, but won't be covered by the links above.

I'm curious what the use-case is, though. It seems risky to base filtering on the hierarchy of terms - since that could be changed by the user on the server.

mstenta commented 4 years ago

the MongoDB syntax should be stable no matter what the underlying REST API looks like

Sorry if I haven't followed all the details in this thread closely, but I'm not sure I understand how you can avoid hard-coding the same knowledge about the server's expectations by adopting a higher-level syntax. It seems like you would still need all the same translation logic to map from MongoDB syntax to the HTTP requests anyway... AND you're then setting expectations higher by saying "we use MongoDB syntax" (which someone might interpret as "oh great then it can do everything i want").

I might just need to be hand-held through the thought process. :-)

Are you thinking about moving forward with these things right now? Or still just thinking about possibilities? I guess I'm also nervous about making big decisions before we have really reviewed what the API changes will be in D8/9.

jgaehring commented 4 years ago

So I made a rough attempt at what a filter object might look like for the Spraying module using MongoDB syntax with Sift for parsing:

https://codesandbox.io/s/stoic-golick-0jk84?file=/src/index.js

In the module.config.js file this would look like:

  filters: {
    log: {
      log_category: {
          $or: [
            { name: "Spray" },
            { parents_all: { $elemMatch: { name: "Spray" } } },
          ],
      },
    },
  },

This query syntax is more or less equivalent to the filter function I posted above:

  filters: {
    log: {
      log_category(cats) {
        const { tid } = cats.find(cat => cat.name === 'Spray');
        return cats
          .filter(cat => (
            cat.name === 'Spray' || cat.parents_all.some(parent => parent.id === tid)
          ))
          .map(cat => cat.tid);
      },
    },
  },

Of course, as you'll see in the Codesandbox, I'm taking some liberties by including the category's name and parents_all fields for each taxonomy reference in log_category, but I think that's something that can be compensated for in the actual implementation.

What's interesting, I think, is that the predicate function parameter passed to .filter() in the log_category() method is essentially equivalent to the predicate function returned by sift(), which also gets passed to .filter().

Well, almost equivalent. The function parameter in the log_category() method takes a category as its argument, whereas the function returned by sift() takes a log as argument. But I think what's nice is that we can apply sift() pretty much at any arbitrary level of nesting we want, in order to derive the values we need. In other words, to derive the options object we need to supply to farmOS.js, we can do this:

farm.log.get({
  log_category: rootState.farm.categories.filter(sift({
    $or: [
      { name: 'Spray' },
      { parents_all: { $elemMatch: { name: 'Spray' } } },
    ],
  })).map(cat => cat.tid),
});

Which roughly evaluates to something like this:

farm.log.get({
  log_category: [33, 35],
});

From which, in turn, farmOS.js generates URL params more or less like this:

https://test.farmos.net/log.json?log_category[0]=33&log_category[1]=35

(NB: It appears these URL params don't work, seemingly because the restws module doesn't support multiple tid's for the log_category field, the way it accepts multiple type's. So unless there's a quick-fix that can be done on the server, we may have to rethink whether we can support multiple categories at all, but this is separate from the question of how to best represent the query statement.)

Which is all to say, I think we can get a lot of use from the Sift library!

jgaehring commented 4 years ago

I'm curious what the use-case is, though. It seems risky to base filtering on the hierarchy of terms - since that could be changed by the user on the server.

The user already has several spraying-related categories, such as "Pre-Emerge Spray", "Post-Emerge Spray" and "Y-Drop". It's not ideal, but the apart from adding a dedicated log type, we need a way to identify all of these under one category. My hope is to reduce the chance of error by nesting the different types of spray under one, general "Spray" category (worth noting, that "Y-Drop" is already a child of the "Fertilizer" category, but we can allow for multiple parents, hence the use of parents_all in the implementation). This way she doesn't have to use the "Spray" category if she prefers to use the more specific categories.

Again, not ideal, but seems like the most flexible option given our constraints.

jgaehring commented 4 years ago

I'm not sure I understand how you can avoid hard-coding the same knowledge about the server's expectations by adopting a higher-level syntax.

I'm trying to avoid that hard-coded knowledge getting into the API. It's acceptable to me if that info is hardcoded into the implementation. The most naive API we could present would just be to require that each Field Module supply its own URL parameters, as a string, but obviously that has the danger of breaking all modules if there's any change to the REST API. Our current filters API is only a thin layer of abstraction on top of that, basically just a JS representation of URL params. The goal of adopting a more robust query syntax is that instead of describing the URL params, we can get closer to describing the data itself. Now, if you anticipate having major breaking changes to the underlying data structures, that's a bigger problem, but hopefully that's not the case?

You must also remember that filters is not only intended to describe the data for the sake of generating URL params, but also describes the data so it can be retrieved from the local database. So again, it's preferable to have some intermediary syntax for describing the data which can be translated to both URL params and an IDB query. This is actually getting pretty complicated to implement with the ad hoc API we have currently. See the code examples from idb/module.js and http/module.js in https://github.com/farmOS/farmOS-client/issues/342#issuecomment-624096880 if you want to get a better feel for what I mean.

I could keep extending this ad hoc API to suit our needs as they come, but for each iteration that requires a lot of careful considerations that are not obvious at first (eg, how to handle undefined fields; again, see https://github.com/farmOS/farmOS-client/issues/342#issuecomment-624096880 for more explanation). It also means constant revision of the implementation, taking all those considerations into account, and an increased likelihood of changes to the API as well. And in the end, I think we're just reinventing the wheel. Because there are established API's for describing data sets and their subsets, so we don't have to agonize over those decisions ourselves. And there are very good libraries for implementing those API's, so we don't have to write our own parsing logic (and rewrite it each time we need to extend it). The iterative process doesn't make sense to me when there are pre-established standards for achieving more or less what we want. We may have to adapt it to a few of our edge cases, but that still seems preferable to starting from scratch.

AND you're then setting expectations higher by saying "we use MongoDB syntax" (which someone might interpret as "oh great then it can do everything i want").

I don't see how that expectation is being set automatically. For one thing, we don't need to state in our API that we support the full MongoDB syntax. We can claim to support a subset of the syntax if we want (like the logical and comparative operators only, as I suggested above), or we can just document the supported operators piecemeal, in our own documentation. And even if we did just adopt their query syntax wholesale, it seems like a stretch to interpret that as "oh great then it can do everything I want". There are still conditions on what data is available and how it can be manipulated; we're just providing a way of describing it. And if a developer did take that leap, that any arbitrary MongoDB operation was made possible by that syntax, they would quickly find out it didn't work. The query would fail, and they'd have to figure something else out before moving on with development. I'd much prefer prompt failures in development to possible breakage in production because what they thought worked was later broken by an upstream change, which seems more likely with an ad hoc API that's more closely coupled to the REST API.

Are you thinking about moving forward with these things right now? Or still just thinking about possibilities?

Yea, so, all my strongly stated opinions aside, I am still in the exploratory phase. I am very optimistic about what can be achieved with minimal effort by employing the Sift library, but I also recognize that, to its credit, Sift is ultimately just a way of converting MongoDB queries into predicate functions (see https://github.com/farmOS/farmOS-client/issues/342#issuecomment-624628826), and it is those functions which are ultimately required in the implementation, not the queries themselves. So I'm considering the possibility of just supporting functions in the filters object for now (see https://github.com/farmOS/farmOS-client/issues/342#issuecomment-623172932), if that suffices to achieve my immediate goals for the Spraying module. Then, if down the line there's a clearer need to adopt MongoDB syntax, it should be easy enough to drop in Sift, while maintaining support for the function syntax. It should just require something like this in the implementation:

let query;
if (typeof log.filters.log_category === 'function') {
  query = log.filters.log_category;
} else {
  query = sift(log.filters.log_category);
}

That seems like a reasonable first step.

What I do want to avoid right now is extending the API to support our own ad hoc query syntax, which might have to be changed later on. For instance, if we were to try adding logical operators to the filters syntax, something like { $or: ['foo', 'bar'] }, but weren't totally compliant with the MongoDB spec, that could introduce some inconsistency of syntax or logic that we hadn't fully thought through, which could later bite us in the butt and require a breaking change to the API.

jgaehring commented 4 years ago

If we could leverage that via that API, we could potentially remove the need for the second lookup of tid entirely. Imagine: instead of specifying a tid in the log_category field, you just specify the category name (Spray), and farmOS does the rest for you.

@mstenta, does this mean the taxonomy's name property would be included in the taxonomy reference under the log_category field in each log? And would this apply for assets, areas, etc, as well?

mstenta commented 4 years ago

@mstenta, does this mean the taxonomy's name property would be included in the taxonomy reference under the log_category field in each log? And would this apply for assets, areas, etc, as well?

Yes: https://www.drupal.org/project/farm/issues/3134065

It would apply to any taxonomy term reference field. It would not apply to asset, area, or other entity references. Maybe we could do the same with those, but that's outside the scope of the feature request linked.

jgaehring commented 4 years ago

It would not apply to asset, area, or other entity references. Maybe we could do the same with those, but that's outside the scope of the feature request linked.

Ahh, ok, good to know. This would be helpful, because there are lots of places where I'm looping inside of loops to check whether the id in an entity reference matches the tid of any item in the corresponding list of entities. If I didn't have to do that it would certainly help with performance and would eliminate a lot of boilerplate, but it's certainly not a blocking issue. I can open an issue if you like, but not critical.

mstenta commented 4 years ago

I'll be honest I'm a bit worried about how much complexity this all will add, and potential issues it could create in the future. But it's just a gut feeling - and I'm not fully grokking what's involved, and what the implications are.

In general, I agree that it's a good thing to make the "API" of field modules abstracted, to avoid changes to the server breaking field modules. But in practice I have a feeling that we may not be able to avoid some of those issues. And that is specifically why farmOS added the api_version parameter it /farm.json - with the thought that any code that uses the API should check that first, and pin itself to specific version(s) that it's code and assumptions support. And it the farmOS server version is higher than that, "gracefully refuse".

The very fact that Field Modules will be loaded dynamically into an unknown version of farmOS and an unknown version of Field Kit makes it even more necessary for them to declare their supported API version, I think.

What happens if they declare their filters using MongoDB syntax, but the filters they describe are not supported on the particular farmOS/Field Kit versions they are being run on? The same "breakage" can occur. So it makes me wonder if the extra complexity is worth it, or if we should just accept that field modules will need to declare their supported API versions regardless. That would keep things a whole lot simpler right now, even if it means needing to make changes in the future. Those changes are probably unavoidable.

And then we're right back where we stared: if they are already pinning to a specific API version, then it's not really an issue if they make specific assumptions about the API.

I guess the main feeling I have is: we don't know what we need yet. And maybe that is what 1.x vs 2.x API versions are useful for. It might be better to avoid trying to predict too much in 1.x, accept that some things will be hard-coded, and then add more capabilities via progressive-enhancement when we do have a better understanding.

All that said, I want to emphasize that this is just my "gut" feeling about these things, and I admit that I haven't taken the time to understand everything in great detail. So take it for what it's worth... which might not be much. Just thoughts I felt the need to put out there. :-)

If you are well into it already, don't let this slow you down!

mstenta commented 4 years ago

Updates...

It's now possible to filter by term name instead of ID! https://www.drupal.org/project/farm/issues/3134066#comment-13607701

And term names are included in the results! https://www.drupal.org/project/farm/issues/3134065#comment-13607744

Screenshot from 2020-05-11 08-48-31

Thanks @pcambra!

jgaehring commented 4 years ago

It's now possible to filter by term name instead of ID! And term names are included in the results!

Awesome! Thanks @pcambra & @mstenta!

So how are we incorporating this into the API versioning scheme? Will it require a bump? Can I expect this feature to be available in all Farmier instances?

It's not urgent. I think I'm going to leave my new implementation of filters as is, b/c it also requires the parents_all field, so I'm iterating over the full term data, in which case it's also not much difference whether I query by the tid or the name field. However, as I mentioned, this will be huge for other parts of FK, particularly in rendering the UI (eg, if I want to display the name of each area associated with a log, I don't have to iterate over every area for each term reference, just to find its name).

mstenta commented 4 years ago

So how are we incorporating this into the API versioning scheme? Will it require a bump?

Good question. Yes, I think we should bump the API version to 1.3. I've already started some updates to the API docs on farmOS.org - but these won't be published until the next tagged release of farmOS (which I am getting closer to!)

Can I expect this feature to be available in all Farmier instances?

Yes, it is already deployed to Farmier instances.

jgaehring commented 4 years ago

I think we've made good progress on some of the issues raised here, with the filter and pass props we're using with the loadLogs and syncLogs actions. I still like this idea of using a query language internally, but I think it doesn't make sense to implement with the API restrictions we currently face in farmOS 1.x, vis a vis the restws module. So I'm removing this from the 0.7.0 milestone and adding the farmOS 2.x tag so we can circle back to this in the future.

jgaehring commented 3 years ago

I'm bumping this up to the 2.0.0-beta1 milestone. I've got a first pass on using MongoDB-style queries in farmOS/farmOS.js@527d4f4 (see also https://github.com/farmOS/farmOS-client/issues/430#issuecomment-742059359). It should be a simple matter at this point to enable the loadLogs and syncLogs actions to use those queries for the filter property and just pass it along to farmOS.js. The biggest task will simply be documenting the changes to the API.

jgaehring commented 3 years ago

I believe the last piece of this should be resolved by jgaehring/farmOS-client@ff1784a, which will handle filters for local caching, since farmOS.js has nothing to do with that. Just need to test it thoroughly to be sure.

jgaehring commented 3 years ago

Resolved by 59c3f30fb90dca1157d6dad46178141f6bba27b2.