Define URI hash fragment syntax

letmaik commented 8 years ago

Forking off of https://github.com/Reading-eScience-Centre/coveragejson/issues/49#issuecomment-219678005

References:

The idea would be to define a fragment syntax for the CovJSON media type with which a subset/fragment of a coverage (and collection? ndarray?) can be identified. How clients handle a fragment would not be defined, only the semantics of what a fragment means.

Proposal for Coverage documents:

grid.covjson#axis:t=3 -> fourth "t" axis element
grid.covjson#axis:t=0,20&parameter=SALTY -> first 21 "t" axis elements & SALTY param
grid.covjson#axis:x=0,20&axis:y=50,60

The namespace "axis" is necessary since axes can have arbitrary names and they need to be kept separate from "parameter" or other future keys. The numbers are axis indices, probably inclusive to be less confusing, even if that goes against numpy slicing syntax and similar conventions.
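To make the proposed semantics concrete, here is a minimal Python sketch of a parser for the fragment examples above. The function names are hypothetical, and it assumes that a single number means one element, that two numbers form an inclusive start,stop range, and that "parameter" may carry several comma-separated names (which is only floated as a possibility in this thread).

```python
from urllib.parse import parse_qsl


def parse_covjson_fragment(fragment):
    """Parse e.g. 'axis:t=0,20&parameter=SALTY' (hypothetical helper).

    Returns (axes, parameters) where axes maps an axis name to an
    inclusive (start, stop) index pair.
    """
    axes = {}
    parameters = []
    for key, value in parse_qsl(fragment, keep_blank_values=True):
        if key.startswith("axis:"):
            name = key[len("axis:"):]
            indices = [int(part) for part in value.split(",")]
            if len(indices) == 1:
                start = stop = indices[0]  # single element, e.g. axis:t=3
            elif len(indices) == 2:
                start, stop = indices      # inclusive range, e.g. axis:t=0,20
            else:
                raise ValueError(f"expected 1 or 2 indices for {key!r}")
            axes[name] = (start, stop)
        elif key == "parameter":
            parameters.extend(value.split(","))
        else:
            raise ValueError(f"unknown fragment key {key!r}")
    return axes, parameters


def to_slice(index_range):
    """Convert an inclusive (start, stop) pair to a half-open Python slice."""
    start, stop = index_range
    return slice(start, stop + 1)
```

Because the proposal uses inclusive ranges, `to_slice` adds one to the stop index before handing it to numpy-style slicing, so `axis:t=0,20` selects 21 elements.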

The same could be used as URL query parameters for server-side subsetting if people wanted to use that, e.g.:

grid.covjson?axis:t=0,20&parameter=SALTY
grid.covjson?axis:x=0,20&axis:y=50,60

The "parameter" key could possibly accept multiple comma-separated parameters. Not sure if I want to call it in its plural form though. grid.covjson#parameter=SALTY,TMP looks fine to me too.

And mixing both URI fragments and query parameters is then possible as well:

grid.covjson?axis:x=0,20&axis:y=50,60#parameter=SALTY

This could be interpreted as: fetch the x and y subset from the server and display the SALTY parameter on a map, or "select" it, while still providing the other parameters if necessary.
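A small Python sketch of that split, using only the standard library: the query part is what a server would see, while the fragment stays on the client. The host example.org and the function name are assumptions for illustration only.

```python
from urllib.parse import parse_qsl, urldefrag, urlsplit


def split_request_and_selection(url):
    """Separate the part sent to the server from the client-side fragment."""
    # urldefrag strips the '#...' part, which is never transmitted over HTTP.
    request_url, fragment = urldefrag(url)
    server_subset = parse_qsl(urlsplit(request_url).query)
    client_selection = parse_qsl(fragment)
    return request_url, server_subset, client_selection


# Hypothetical absolute URL built from the example in this thread.
url = "http://example.org/grid.covjson?axis:x=0,20&axis:y=50,60#parameter=SALTY"
request_url, server_subset, client_selection = split_request_and_selection(url)
```

Here `request_url` keeps the x/y subset in its query string, and `client_selection` holds only the `parameter=SALTY` pair that the client would apply after the response arrives.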

I would say that defining all this in value-space is too complicated since an axis can be composite, or have special string formats. Also, it makes it more tricky to know what exactly the result is, meaning axis sizes etc.

Of course, index-subsetting for Collections doesn't work. So the question would be, do collection fragments need to be defined as well? If so, on what dimensions and how exactly? If this happens on CRS dimensions, then this means domain components for uniform collections, so that would be something like collection.covjson#component:t=2012-01-01T00:00:00Z,2012-01-02T06:00:00Z. But again, value-space is tricky, and encoding as well (have to make sure that if a domain value contains a comma, it has to be escaped).
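One way the comma problem could be handled is to percent-encode each value before joining with commas, and to split on commas before decoding. The sketch below assumes exactly that scheme; the "component:" key comes from the example above, while the function names and the choice to leave ":" unescaped for readable timestamps are assumptions.

```python
from urllib.parse import quote, unquote


def encode_component_range(name, values):
    """Build 'component:<name>=v1,v2' with commas inside values escaped."""
    # safe=":" keeps ISO 8601 timestamps readable; ',', '&', '=', '#' and '%'
    # inside a value are all percent-encoded.
    encoded = ",".join(quote(value, safe=":") for value in values)
    return f"component:{name}={encoded}"


def decode_component_range(pair):
    """Inverse of encode_component_range for a single key=value pair."""
    key, _, raw = pair.partition("=")
    if not key.startswith("component:"):
        raise ValueError(f"not a component key: {key!r}")
    # Split on literal commas first, then decode, so an escaped comma
    # (%2C) inside a value is not mistaken for a separator.
    return key[len("component:"):], [unquote(part) for part in raw.split(",")]
```

Note that a generic query-string parser would decode `%2C` before the comma split, so the raw value has to be split first, as done here.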

BillSwirrl commented 8 years ago

Can I suggest that you don't use a hash for this in the URLs? Everything after the hash is not transmitted to the server; it is only processed locally by the browser. The URL parameter approach seems a better one.

So if you have grid.covjson#blah then the client has to retrieve all of grid.covjson before it can decide what to do with the hash stuff.

letmaik commented 8 years ago

I know ;) URL parameters are not in scope of such a format spec; I included them just to make a point. Hash fragments are in scope, since those are always media type specific and should be defined by the media type itself.

The point of this is that subsets can still be identified even if they are not published as such. In particular, if a coverage is published with pre-generated range tiles, the first server request would return only a very small document, and after that the client may fetch the relevant tiles associated with the subset described by the hash fragment.
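As a rough illustration of that second step, the Python sketch below works out which tiles along one axis overlap an inclusive index range taken from a fragment. It assumes regular tiles of a fixed size numbered from 0 along each axis; the function name and the tile size are hypothetical, and how tile indices map to tile URLs is left open.

```python
def tiles_for_range(start, stop, tile_size):
    """Return the tile indices along one axis that overlap [start, stop].

    start and stop are inclusive axis indices; tiles are assumed to be
    regular, tile_size elements long, and numbered from 0.
    """
    if tile_size <= 0 or start < 0 or stop < start:
        raise ValueError("invalid range or tile size")
    first_tile = start // tile_size
    last_tile = stop // tile_size
    return list(range(first_tile, last_tile + 1))
```

For a fragment such as `axis:t=0,20` with a tile size of 10 along "t", the client would need the first three tiles and nothing else.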

jonblower commented 8 years ago

Yes - the URL in this case is an identifier, not an API call. This has some advantages, in particular it shows that you don't necessarily need a dynamic server (like a WCS or OPeNDAP) to create or use subset identifiers.

BillSwirrl commented 8 years ago

sorry - I should have guessed you were a couple of steps ahead of me :-)

but the question of identifying versus retrieving is an important one and it would be good to be explicit about it. Part of the reason for identifying extracts/subsets of a coverage is to be able to retrieve only the bit you are interested in. I suppose the other part is to be able to assign attributes to that extract.

Some form of two-step retrieval, whether tiled or not, sounds interesting, e.g. dereferencing the coverage (or coverage extract) identifier provides some metadata and info on how you can get the real data. We do something similar with linked data datasets for statistics.

jonblower commented 8 years ago

Yes, you could just get the domain of the coverage, look at the available subsets and decide on your best strategy for getting the data. Of course a web API that allows arbitrary subsetting would be useful here (e.g. WCS or Maik's experimental REST API), and it's interesting to consider how a client might automatically discover the existence of this.

(And by the way, if the client discovers an OPeNDAP server, it's probably very easy to translate the "hash fragment syntax" into an OPeNDAP API call, because OPeNDAP also operates on array indices.)
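A sketch of that translation in Python, assuming the axis index ranges have already been parsed from the fragment. OPeNDAP (DAP2) array constraints take the form `var[start:stop]` per dimension with an inclusive stop, which lines up with the inclusive indices proposed here. The function name is hypothetical, and the dimension order and sizes of the variable are assumed to be known to the client, for example from the coverage domain.

```python
def to_opendap_constraint(variable, dim_sizes, axis_ranges):
    """Build a DAP2-style hyperslab expression such as 'SALTY[0:20][0:179]'.

    variable    -- name of the variable on the OPeNDAP server (assumed to
                   match the CovJSON parameter key)
    dim_sizes   -- ordered list of (axis_name, size) for the variable
    axis_ranges -- dict of axis_name -> inclusive (start, stop) indices;
                   axes not listed are requested in full
    """
    parts = []
    for name, size in dim_sizes:
        start, stop = axis_ranges.get(name, (0, size - 1))
        if not 0 <= start <= stop < size:
            raise ValueError(f"index range out of bounds for axis {name!r}")
        parts.append(f"[{start}:{stop}]")
    return variable + "".join(parts)
```

The resulting expression would be appended after a `?` to the dataset URL of the OPeNDAP server; which response format to request is a separate choice not covered here.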

letmaik commented 8 years ago

HTTP HEAD requests? But yes, I'm thinking along the same lines. So, back to the syntax!

letmaik commented 2 years ago

I think we'll not find a sensible syntax here that works for all CovJSON object types, and it's very likely that each web application has its own URL scheme for driving these things. I'm closing this as it can always be defined later on, if someone really wants it.

covjson / specification

Define URI hash fragment syntax #66