covjson / specification

CoverageJSON specification
https://covjson.org/spec/

Force parameters to be at root level for common collection profiles? #55

Closed: letmaik closed this issue 2 years ago

letmaik commented 8 years ago

I think it would be helpful to require that for the common collection profiles all parameters have to be defined at collection level and not inside individual coverages. This makes processing easier. Is this requirement too strict in general? Maybe that's a case for multiple profiles (#46) like ["PointCoverageCollection","CollectionParameters"].

jonblower commented 8 years ago

My gut feeling is that it would be too strict to insist that all parameters are defined at collection level, although it would probably be manageable.

One use case that could make this difficult: let's take an example dataset like EN4. Not all coverages in this dataset have the same parameters, because the more recently-deployed platforms measure more things than the older ones. You can of course simply describe all the parameters at collection level and just have links/identifiers in the individual coverages. But what if, in future, another coverage is added to the collection that contains a parameter that isn't in the "master" list? You could either update the master list (and hope that clients grab the most recent version) or allow the individual coverage to define the new parameters.

I guess the options are:

  1. Put all parameter definitions at the collection level, and just put links/identifiers in the coverages
  2. Allow the individual coverages to contain their own parameter definitions. One would have to be careful to avoid the case where the coverage and the collection contained two different definitions of the same parameter - or you could say that the definition at the coverage level overrides the one in the collection.
  3. Don't put any definitions in the collection level - everything is in the coverages. Probably not very desirable from an efficiency point of view, but simple and unambiguous at least.
  4. Allow parameter definitions to be hosted outside the collection or coverage in a referenced RDF document. This is neat and reusable, but more difficult for clients.
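The options mostly differ in where the definitions live. The sketch below builds options 1 and 3 for the same 100 point coverages and compares their serialized size. It assumes parsed CovJSON is held in plain Python dicts, trims the domain to its type, and uses a hypothetical `TEMP` key with illustrative metadata. The helper names are likewise hypothetical.

```python
import json

# Hypothetical parameter definition; label and unit are illustrative only.
TEMP = {
    "type": "Parameter",
    "observedProperty": {"label": {"en": "Sea water temperature"}},
    "unit": {"symbol": "K"},
}


def make_coverage(with_params: bool) -> dict:
    """A trimmed point coverage: domain axes omitted, one placeholder value."""
    cov = {
        "type": "Coverage",
        "domain": {"type": "Domain", "domainType": "Point"},
        "ranges": {"TEMP": {"type": "NdArray", "dataType": "float", "values": [280.1]}},
    }
    if with_params:
        cov["parameters"] = {"TEMP": TEMP}
    return cov


def option1(n: int) -> dict:
    """Option 1: definitions once at collection level; coverages only use the key."""
    return {
        "type": "CoverageCollection",
        "parameters": {"TEMP": TEMP},
        "coverages": [make_coverage(False) for _ in range(n)],
    }


def option3(n: int) -> dict:
    """Option 3: nothing at collection level; every coverage repeats the definitions."""
    return {
        "type": "CoverageCollection",
        "coverages": [make_coverage(True) for _ in range(n)],
    }


size1 = len(json.dumps(option1(100)))
size3 = len(json.dumps(option3(100)))
print(size1, size3)  # option 3 is larger: the TEMP definition is serialized 100 times
```

The gap grows with the number of coverages and the richness of each parameter definition. That is the efficiency concern raised for option 3.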
letmaik commented 8 years ago

On 15/03/2016 11:29, Jon Blower wrote:

> My gut feeling is that it would be too strict to insist that all parameters are defined at collection level, although it would probably be manageable.

> One use case that could make this difficult: let's take an example dataset like EN4. Not all coverages in this dataset have the same parameters, because the more recently-deployed platforms measure more things than the older ones. You can of course simply describe all the parameters at collection level and just have links/identifiers in the individual coverages. But what if, in future, another coverage is added to the collection that contains a parameter that isn't in the "master" list? You could either update the master list (and hope that clients grab the most recent version) or allow the individual coverage to define the new parameters.

I think you're overcomplicating this a bit. There's no such thing as a "master" list that lives outside a CovJSON document, and because of that, clients always have access to the current set of parameters, since they are embedded. If a client actually assumes that those don't change and therefore hard-codes things like parameter identifiers, that's the fault of the client. The only case where this would be OK is if the data producer declares the data as final/archived/.., meaning it won't change.

> I guess the options are:
>
> 1. Put all parameter definitions at the collection level, and just put links/identifiers in the coverages

My preferred one, although it gets a bit tricky when thinking about collection paging (which is not defined in CovJSON itself, but as part of server APIs). From a client point of view, it would be best if the client knew all parameters of the logical collection without inspecting the members. In the CovJSON spec we can only go down to document level, which corresponds to a single page in a paged collection, so we couldn't really enforce this without additional machinery. Of course, there may also be some overhead in repeating the parameters in each page, but I think this is not really an issue, since the majority of the data will typically be the domain and/or range.

> 2. Allow the individual coverages to contain their own parameter definitions. One would have to be careful to avoid the case where the coverage and the collection contained two different definitions of the same parameter - or you could say that the definition at the coverage level overrides the one in the collection.

About duplicates, I would say duplicate definitions are forbidden, for simplicity on the client side. By that I mean parameters with the same object key or the same ID/URI. The problem with having parameters inside coverages in a collection is that it makes the collection much harder to process, even for a simple use case such as creating a parameter selector in a drop-down / layer selector.
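The duplicate rule and the drop-down use case can be sketched like this. It assumes parsed CovJSON in plain Python dicts and treats a parameter's optional `"id"` member as its URI. The function names and the example URIs are hypothetical, and the coverages are trimmed to their parameters.

```python
def duplicate_definitions(collection: dict) -> list:
    """(coverage index, key) pairs whose definition repeats a collection-level one,
    matched either by object key or by the parameter's optional "id" member."""
    root = collection.get("parameters", {})
    root_ids = {p["id"] for p in root.values() if "id" in p}
    clashes = []
    for i, cov in enumerate(collection.get("coverages", [])):
        for key, param in cov.get("parameters", {}).items():
            # param.get("id") is None when absent, which never matches a root id.
            if key in root or param.get("id") in root_ids:
                clashes.append((i, key))
    return clashes


def selector_options(collection: dict) -> list:
    """Parameter keys for a drop-down / layer selector. With everything at the
    root this is a single lookup; otherwise every coverage has to be scanned."""
    keys = set(collection.get("parameters", {}))
    for cov in collection.get("coverages", []):
        keys.update(cov.get("parameters", {}))
    return sorted(keys)


# Trimmed example (domains and ranges omitted); the ids are made-up URIs.
coll = {
    "type": "CoverageCollection",
    "parameters": {"TEMP": {"type": "Parameter", "id": "http://example.org/temp"}},
    "coverages": [
        {"type": "Coverage", "parameters": {"PSAL": {"type": "Parameter"}}},
        {"type": "Coverage",
         "parameters": {"T": {"type": "Parameter", "id": "http://example.org/temp"}}},
    ],
}
print(duplicate_definitions(coll))  # [(1, 'T')]
print(selector_options(coll))       # ['PSAL', 'T', 'TEMP']
```

The selector output shows the client-side cost. `T` and `TEMP` appear as two entries even though they share an id, which is exactly the case the "no duplicates" rule would exclude.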

> 3. Don't put any definitions in the collection level - everything is in the coverages. Probably not very desirable from an efficiency point of view, but simple and unambiguous at least.

Exactly, not efficient enough.

> 4. Allow parameter definitions to be hosted outside the collection or coverage in a referenced RDF document. This is neat and reusable, but more difficult for clients.

What's the gain of that? It doesn't solve the issue of who is allowed to reference which parameters from where; it just moves the problem.

jonblower commented 8 years ago

Just to clarify, by “master list” I mean the list of parameter definitions defined in the collection document. If a coverage was added to a collection, and that coverage contains a “new” parameter that wasn’t previously defined in the collection document, then the client would need to re-download the collection document in order to get the definition of the new parameter. It might be a relatively unusual case, but it’s another instance of cache consistency. The parameter definitions in the collection are effectively a cache, which could go out of sync with the individual coverages if we’re not careful. That’s my main worry with the idea of forcing all parameter definitions to be at collection level.

letmaik commented 8 years ago

On 16.03.2016 at 11:03, Jon Blower wrote:

> Just to clarify, by “master list” I mean the list of parameter definitions defined in the collection document. If a coverage was added to a collection, and that coverage contains a “new” parameter that wasn’t previously defined in the collection document, then the client would need to re-download the collection document in order to get the definition of the new parameter.

I don't understand where a re-download would happen. If the coverage is served as part of a collection document, then the parameters are included in the same document, whether at collection level or inside a coverage. If the coverage is also served on its own, then it again has all its parameters in the coverage document and is independent of the collection document. So there is no need to re-download anything: the moment you discover a new coverage in a collection document (which is itself already the re-download), that document contains the parameter definition. That's the whole point of CovJSON in the end, to make every resource self-describing and self-contained. The only time we slightly break this rule is when we allow the domain or range objects to be referenced by URL.

So what I'd like to define is, first of all, what it means when parameters are defined at collection level (e.g. whether all coverages must have all parameters: no), and then possibly a collection profile which guarantees that no parameters are defined at coverage level. For a paged collection served from a REST API, this could then be taken further to mean that every page must contain all parameters of the whole collection.

jonblower commented 8 years ago

Ah, sorry, I misunderstood. I thought that the collection definition and the individual coverages might be in separate documents. It makes perfect sense if all resources are entirely self-describing. (Now I understand what you were saying about paging...)

I think it makes sense to specify that, if the document is a Collection, all parameters for coverages in that document are specified at the Collection level. I don't think this even needs to be a profile, I think it's sensible behaviour overall, unless it makes client APIs more complicated (because they will have to deal with individual coverages with "inline" parameter definitions and coverages-in-collections where the parameters are elsewhere).

For a paged collection, I think the page must contain all the parameters for the coverages on that page (not necessarily all the coverages in the collection) - does that make sense? Of course, the page could contain more parameters than are needed on that page, if it's simpler for servers to implement that.
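The rule for pages can be checked mechanically. Under the proposal, every range key used by a coverage on a page needs a definition in that page's collection-level parameters, while unused extras are allowed. Below is a minimal sketch that assumes parsed CovJSON in plain Python dicts. The function name and the example keys are hypothetical.

```python
def undefined_range_keys(page: dict) -> list:
    """(coverage index, range key) pairs on a page that have no definition in the
    page's collection-level "parameters". Unused extra definitions are allowed."""
    root = page.get("parameters", {})
    return [
        (i, key)
        for i, cov in enumerate(page.get("coverages", []))
        for key in cov.get("ranges", {})
        if key not in root
    ]


# One hypothetical page: PSAL is defined but unused (fine), DOXY is undefined.
page = {
    "type": "CoverageCollection",
    "parameters": {"TEMP": {"type": "Parameter"}, "PSAL": {"type": "Parameter"}},
    "coverages": [
        {"type": "Coverage", "ranges": {"TEMP": {}}},
        {"type": "Coverage", "ranges": {"TEMP": {}, "DOXY": {}}},
    ],
}
print(undefined_range_keys(page))  # [(1, 'DOXY')]
```

A server could run such a check per page. That is less demanding than requiring every page to repeat the parameters of the whole collection.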

letmaik commented 8 years ago

OK, I think having parameters only at the root level is actually sometimes more convenient for clients, since they can then always use covdata.parameters, no matter whether covdata is a coverage or a collection. And in the JS API I simply copy the parameters object over to each coverage so that a client can easily handle an individual coverage as well; you could regard that as an "expanded" style vs a "compact" style.
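The "compact" to "expanded" step can be sketched in Python as below. This is not the JS API's actual code. It is a minimal illustration that assumes parsed CovJSON in plain dicts. The function name is hypothetical, and letting coverage-level definitions win is a choice made here, not something fixed at this point in the discussion.

```python
import copy


def expand(collection: dict) -> dict:
    """Return an "expanded" copy where each coverage carries the collection-level
    parameters, so a single coverage can be handled on its own. The input is not
    modified. Coverage-level definitions, if present, win over root ones."""
    out = copy.deepcopy(collection)
    root = out.get("parameters", {})
    for cov in out.get("coverages", []):
        # Parameter objects are shared between coverages in the copy, not duplicated.
        cov["parameters"] = {**root, **cov.get("parameters", {})}
    return out


compact = {
    "type": "CoverageCollection",
    "parameters": {"TEMP": {"type": "Parameter"}},
    "coverages": [{"type": "Coverage", "ranges": {"TEMP": {}}}],
}
expanded = expand(compact)
print(list(expanded["coverages"][0]["parameters"]))  # ['TEMP']
print("parameters" in compact["coverages"][0])       # False
```

After expansion, `obj["parameters"]` works the same whether `obj` is the collection or one of its coverages, which mirrors the `covdata.parameters` convenience described above.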

One problem I see, though, is that it makes server implementations or publishing workflows harder in some cases. When serving a collection of arbitrary coverages (think of an aggregator/search engine over several CovJSON endpoints), you can't just blindly move all coverage parameters to the root, since there may be coverages whose parameters share a key (e.g. TMP) but are actually different (one in kelvin, the other in Celsius). This is why I thought of a collection profile, but I'm not sure this is the best way.
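The aggregator problem shows up as soon as one tries to hoist definitions. In the sketch below, merging succeeds when equal keys carry equal definitions and refuses when the same key means two different things. It assumes parsed CovJSON in plain Python dicts and compares definitions by structural equality. The function name and unit symbols are illustrative.

```python
from typing import Optional


def try_compact(coverages: list) -> Optional[dict]:
    """Hoist all coverage parameters to the collection root ("compact" style).
    Returns None when two coverages use the same key for different definitions,
    e.g. TMP in kelvin in one coverage and TMP in Celsius in another."""
    root = {}
    for cov in coverages:
        for key, param in cov.get("parameters", {}).items():
            if key in root and root[key] != param:
                return None  # same key, different meaning: cannot merge blindly
            root[key] = param
    # Drop the now-redundant coverage-level definitions.
    stripped = [{k: v for k, v in cov.items() if k != "parameters"} for cov in coverages]
    return {"type": "CoverageCollection", "parameters": root, "coverages": stripped}


kelvin = {"type": "Parameter", "unit": {"symbol": "K"}}
celsius = {"type": "Parameter", "unit": {"symbol": "°C"}}
a = {"type": "Coverage", "parameters": {"TMP": kelvin}, "ranges": {"TMP": {}}}
b = {"type": "Coverage", "parameters": {"TMP": celsius}, "ranges": {"TMP": {}}}
print(try_compact([a, a]) is not None)  # True: identical definitions merge
print(try_compact([a, b]))              # None: conflicting TMP definitions
```

When this returns `None`, a server can still fall back to leaving the definitions inside each coverage.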

Another possibly better idea: We could say that if collection parameters are defined at root level, then the individual coverages must not have any parameters defined themselves. Otherwise, all parameters are defined in each coverage individually. This would make it easy to detect the two common cases (all root vs all inline) in clients, and would allow some flexibility for servers. And it would also not require a profile which people may forget to put in or that clients would have to locate etc.
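Because the proposed rule allows only two shapes, a client can tell them apart with a couple of membership tests. The sketch below assumes parsed CovJSON in plain Python dicts, and the function name is hypothetical. The `member` argument is also an assumption, added so the same check could cover the collection-level "referencing" object.

```python
def layout(collection: dict, member: str = "parameters") -> str:
    """Classify a collection under the proposed rule:
    'root'    - member only at collection level, absent from every coverage
    'inline'  - member absent at collection level; coverages define their own
    'invalid' - both levels used at once, which the rule would forbid"""
    at_root = member in collection
    in_coverages = any(member in cov for cov in collection.get("coverages", []))
    if at_root and in_coverages:
        return "invalid"
    return "root" if at_root else "inline"


root_style = {"parameters": {"TEMP": {}}, "coverages": [{"ranges": {"TEMP": {}}}]}
inline_style = {"coverages": [{"parameters": {"TEMP": {}}, "ranges": {"TEMP": {}}}]}
mixed = {"parameters": {"TEMP": {}}, "coverages": [{"parameters": {"PSAL": {}}}]}
print(layout(root_style), layout(inline_style), layout(mixed))  # root inline invalid
```

No profile lookup is needed: the shape of the document alone tells the client which case it is in.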

Note that we could have the same discussion for the collection root level "referencing" object which probably should behave the same as parameters.

My goal is to make serving and using uniform collections easy, while still allowing non-uniform ones, with a bit more effort in clients, and a bit more network traffic.

jonblower commented 8 years ago

> We could say that if collection parameters are defined at root level, then the individual coverages must not have any parameters defined themselves. Otherwise, all parameters are defined in each coverage individually.

Yes, I think this is a good approach. By the way, I don't think that "serving a collection of arbitrary coverages" should be a use case. I think that a collection of coverages should be related in some way, i.e. it should be logically considered as a "dataset". But perhaps you can think of a useful use case for this?

> Note that we could have the same discussion for the collection root level "referencing" object which probably should behave the same as parameters.

Yes, makes sense.

letmaik commented 2 years ago

I think we were overthinking this a bit. The spec as it is now behaves like CIS, where the ranges define the values actually available in the coverage. The metadata is looked up in the corresponding parameters in scope, whether that's within the coverage or in an outer collection. It's fine if the collection, or even the coverage, contains more metadata than what exists in the actual ranges of individual coverages; as long as it's in scope, it's fine. Duplicates are bad, but the spec prefers the coverage over the collection, so that's OK.
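The lookup described here can be sketched as a scoped search. The coverage's own parameters are checked first, then those of the enclosing collection. This is a minimal illustration assuming parsed CovJSON in plain Python dicts, not the normative wording of the spec. The function name and the example units are hypothetical.

```python
from typing import Optional


def resolve_parameter(key: str, coverage: dict,
                      collection: Optional[dict] = None) -> Optional[dict]:
    """Find the metadata for a range key in scope: the coverage's own
    "parameters" first, then those of the enclosing collection, if any."""
    local = coverage.get("parameters", {})
    if key in local:
        return local[key]
    if collection is not None:
        return collection.get("parameters", {}).get(key)
    return None


coll = {"parameters": {"TMP": {"type": "Parameter", "unit": {"symbol": "K"}},
                       "PSAL": {"type": "Parameter"}}}
cov = {"parameters": {"TMP": {"type": "Parameter", "unit": {"symbol": "°C"}}},
       "ranges": {"TMP": {}}}

# Only the range keys are actual data; each one is looked up in scope.
for key in cov["ranges"]:
    print(key, resolve_parameter(key, cov, coll)["unit"]["symbol"])  # TMP °C
```

The extra `PSAL` definition in the collection is never used by this coverage, and that is harmless. The duplicate `TMP` resolves to the coverage's own definition.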

I'm going to close this as I think it's not an actual issue and implementations will have to handle those cases.