AntonHardock opened this issue 2 years ago
This is an interesting direction to think about. As you noticed in #32, we have had some tentative thoughts about introducing an organization level between "service" and "collection". It sounds like you have exactly this need.
Can you provide more detail about your data model/schema? Is it the case that your "datasets" with multiple collections map to database schemas and the tables within them?
The Features Core standard states:
Additional capabilities that address more advanced needs will be specified in additional parts. Examples include support for creating and modifying ... multiple datasets and collection hierarchies.
So it looks like the standard has not yet been extended to handle "multiple datasets", correct? It would be highly preferable to follow the OGC lead on this, since it is a very small design space, and there is a large risk of making the wrong choice of direction, winding up out of alignment with the standard, and then having to change the design with an impact on current users.
Would an alternative (in the short or long term) be to implement a thin front-end which can map your desired URL structure into requests which are supported by pg_featureserv? It could also deal with providing dataset-level metadata.
Our data model follows this pattern: for each dataset we receive from our customers, a new Postgres schema is created. Often, the dataset is split into multiple tables/views. Each dataset is then published as an individual OAF endpoint, where the tables/views are mapped as collections of that endpoint. The same structure is reused for other API types (like WFS and WMS).
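To make that pattern concrete, here is a minimal sketch with invented schema and table names (the real datasets differ, and the SRID is only an example):

```sql
-- Illustrative only: one Postgres schema per incoming dataset;
-- the tables/views inside it become that dataset's collections.
CREATE SCHEMA dataset_a;

CREATE TABLE dataset_a.table1 (
    id   serial PRIMARY KEY,
    name text,
    geom geometry(Point, 25832)        -- PostGIS geometry; SRID chosen arbitrarily
);

CREATE TABLE dataset_a.table2 (
    id   serial PRIMARY KEY,
    name text,
    geom geometry(LineString, 25832)
);

-- pg_featureserv currently publishes these as the collections
-- "dataset_a.table1" and "dataset_a.table2".
```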
In our implementation, we follow the example of ldproxy. The software is an OGC reference implementation for Features Core. Here's an example:
https://demo.ldproxy.net --> lists available datasets
https://demo.ldproxy.net/daraa --> example dataset from OGC Testbed-15
https://demo.ldproxy.net/daraa/collections --> list of available collections of dataset "daraa"
https://demo.ldproxy.net/daraa/collections/IndustrySrf --> example collection: industry surfaces in Daraa
Nonetheless, as you correctly point out: Features Core explicitly does not cover how to handle multiple datasets. Standardization begins at the collections level of one dataset. At the same time, an extension is not discouraged:
Other parts of this standard may define API extensions that support multiple datasets. The statement that the features are from "a dataset" is not meant to preclude such extensions. It just reflects that this document does not specify how the API publishes features or other spatial data from multiple datasets.
The current outline of potential OAF parts to follow does not seem to include such an extension. Looking at ldproxy, one could argue that there is no need for one: listing available datasets at the entry point and providing a landing page for each seems to be a straightforward solution. I think this is in line with your proposal of a "thin front-end".
Would it be possible to add such a front end through an optional "multi-dataset mode"? If users want to publish multiple datasets through one instance of pg_featureserv, the software could
- treat each schema as a separate dataset, as proposed initially
- use an extra config file mapping the desired URL structure to collections requests
- pull that extra mapping from another postgres table, along with metadata (like dataset descriptions)
Would it be possible to add such a front end through an optional "multi-dataset mode"?
Yes, this is certainly possible. It makes sense to make this an option controlled by a config parameter.
The request structure you suggest (schema/collections/collname) makes good sense. I've updated #32 to reflect this.
the software could use an extra config file mapping the desired URL structure to collections requests
I'm not clear what this means, or why an extra config file is needed. Isn't the URL structure mentioned above sufficient?
- pull that extra mapping from another postgres table, along with metadata (like dataset descriptions)
I can see it might be useful to have more metadata in the database for use in the UI. But up to now we have avoided adding metadata tables in the database. Could the metadata UI be provided by an external service, with pg_featureserv just providing the UI as it exists currently (for each dataset/schema)?
The request structure you suggest (schema/collections/collname) makes good sense. I've updated https://github.com/CrunchyData/pg_featureserv/issues/32 to reflect this.
Thank you very much! As to your questions: I realize the following bit was misleading:
If users want to publish multiple datasets through one instance of pg_featureserv, the software could
- treat each schema as a separate dataset, as proposed initially
- use an extra config file mapping the desired URL structure to collections requests
- pull that extra mapping from another postgres table, along with metadata (like dataset descriptions)
Each bullet point is supposed to represent an option towards the same goal. The first suggestion (schema/collections/collname) is perfectly sufficient. In fact, I'd consider it the best option as it suits the (almost) zero-configuration nature of pg_featureserv. However, given the use case, extending that basic idea might still be necessary.
First, let me outline that use case further: every dataset has an entry in our metadata catalogue, which serves as its "dataset homepage" and links to all APIs (including the OAF endpoint) from which the dataset is accessible.
The question then becomes: What should be presented on that OAF endpoint?
A: Just a plain list of available collection links
That would certainly do. After all, the metadata catalogue already serves as a "dataset homepage". It thereby fulfills the role of an external metadata UI.
B: List collection links + optional links and metadata
Following linked data principles, linking back to the metadata catalogue seems ideal. Rendering that link in pg_featureserv, along with the available collections, seems straightforward. Instead of additional config files or cluttering the global config, one could map schemas and links in an extra config table. While at it, extra metadata could be rendered. Though redundant, any bit of extra context helps users navigate our complex infrastructure; at least that is our experience so far. Fields in that config table might be:
schema_name | metadata_url | dataset_fulltitle | dataset_description
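A minimal sketch of what such a config table could look like (table name, columns, and values are invented for illustration; this is not an existing pg_featureserv feature):

```sql
-- Hypothetical config table: one row per schema (= dataset).
CREATE TABLE pgfs_dataset_config (
    schema_name         text PRIMARY KEY,
    metadata_url        text,   -- link back to the metadata catalogue entry
    dataset_fulltitle   text,
    dataset_description text
);

INSERT INTO pgfs_dataset_config VALUES (
    'dataset_a',
    'https://example.org/metadata/dataset-a',   -- placeholder URL
    'Example Dataset A',
    'Short description rendered on the dataset landing page.'
);
```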
While I prefer B, I just might be stuck in old thought-patterns.
You suggested an external service to tie all that information together. Could you elaborate on this, outlining the implementation and potential advantages?
Another line of thought: suppose the 1 schema : 1 dataset pattern can't be enforced, for whatever reason. The most flexible approach then would be to let administrators define what constitutes a single dataset. Doing so through config tables might be a reasonable solution. Using the optional "multi-dataset mode", admins could start filling the following table:
dataset_shorttitle | metadata_url | dataset_fulltitle | dataset_description
An overview of available datasets is then provided by:
oaf_baseurl/datasets/
For increased readability, this page could list full dataset titles (if present, else shorttitles). One dataset might be named schools (shorttitle). The corresponding OAF landing page becomes:
oaf_baseurl/datasets/schools/
Collections belonging to a dataset would be rendered from a table like this:
dataset_shorttitle | collection_shorttitle | collection_fulltitle | datasource (schema.table)
Here's the example "schools" with collections stored across multiple schemas:
schools | middleschools | Middle Schools | middleschools.tableA
schools | highschools | High Schools | highschools.tableA
E.g. the URL leading to the collection "High Schools" becomes:
oaf_baseurl/datasets/schools/highschools
Added benefit: Schema names remain hidden. (No concern in our setup, but it might be important in other contexts)
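To illustrate, here is a sketch of what these two config tables might look like in the database (purely hypothetical: table names, the URL, and descriptions are placeholders, not existing functionality):

```sql
-- Hypothetical table defining datasets independently of schemas.
CREATE TABLE pgfs_datasets (
    dataset_shorttitle  text PRIMARY KEY,
    metadata_url        text,
    dataset_fulltitle   text,
    dataset_description text
);

-- Hypothetical table assigning collections (schema.table sources) to datasets.
CREATE TABLE pgfs_dataset_collections (
    dataset_shorttitle    text REFERENCES pgfs_datasets,
    collection_shorttitle text,
    collection_fulltitle  text,
    datasource            text,   -- "schema.table" actually served
    PRIMARY KEY (dataset_shorttitle, collection_shorttitle)
);

-- The "schools" example from above:
INSERT INTO pgfs_datasets VALUES
    ('schools', 'https://example.org/metadata/schools', 'Schools', 'School locations.');

INSERT INTO pgfs_dataset_collections VALUES
    ('schools', 'middleschools', 'Middle Schools', 'middleschools.tableA'),
    ('schools', 'highschools',   'High Schools',   'highschools.tableA');
```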
Each bullet point is supposed to represent an option towards the same goal.
Got it, that makes sense now.
The first suggestion (schema/collections/collname) is perfectly sufficient. In fact, I'd consider it the best option as it suits the (almost) zero-configuration nature of pg_featureserv.
Excellent, and agreed that zero-configuration is what we are aiming for.
You suggested an external service to tie all that information together. Could you elaborate on this, outlining the implementation and potential advantages?
It's possible to implement another service which provides the front-end to pg_featureserv queries. If there is a separate metadata repository, it could be populated with links to the queries. Or, since pg_featureserv is simply serving the database catalog, the external service could access the same catalog.
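For instance, a sketch of how such an external service could read the same PostGIS catalog that pg_featureserv serves from, assuming PostGIS is installed and the spatial tables are registered in geometry_columns:

```sql
-- List spatial tables grouped by schema, e.g. to build a per-dataset overview page.
-- These rows roughly correspond to the collections pg_featureserv publishes
-- (subject to permissions and configuration).
SELECT f_table_schema AS dataset,
       f_table_name   AS collection,
       type           AS geometry_type,
       srid
  FROM geometry_columns
 ORDER BY f_table_schema, f_table_name;
```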
Another option which is supported is to customize the pg_featureserv HTML templates. This would allow adding a link back to the metadata service (as you propose). With some web scripting it should be possible to inject any desired additional information and UI into the web pages served from the templates.
In the past weeks, my colleagues and I have further discussed our deployment strategy. We also had a chance to speak to Clemens Portele. He confirmed that, as of now, the OGC API family centers around individual datasets. This also means that all resources related to a given dataset (features, tiles, styles, and even metadata records) shall be available from one and the same dataset endpoint. When implementing those resources as isolated microservices, a decoupled front-end service appears to be the most adequate solution.
For "traditional" deplyoment though, I believe that an optional "multi-dataset mode" would be of great benefit. Same goes for metadata links, as recommended by the Spec (Rec 9, "describedBy"). This refers to the collections level, but can easily be extended to the datasets level. At least for the federal geodata providers in Germany, both requirements are very important. Having them "out the box" could greatly facilitate the adoption of pg_featureserv (and OAF in general).
Anyway, thank you so much for the thorough discussion, that was incredibly helpful! One more thing: Are there any current plans for a new pg_featureserv version? Maybe an outline of open issues / requests that are considered for the next release?
Dear CrunchyData Team,
Are there any plans to change the API structure such that each dataset gets its own endpoint and landing page?
A similar issue exists. However, I'm not sure if my request fully aligns.
Background
The OGC API Features spec allows or even requires the grouping of collections by datasets. References:
Proposal
Currently, pg_featureserv exposes all spatial tables across all accessed schemas at the collections level:
/collections/schemaA.table1
/collections/schemaB.table1
...
With "datasets" grouping, each dataset would have its own JSON+HTML representation (potentially filled with some metadata). Likewise, each dataset/collections page would list available collections, followed by the actual collection ids: /datasetA/collections/collection1 /datasetB/collections/collection1 ...
One solution is to equate each schema with one dataset. pg_featureserv could then adjust the API structure to:
/schemaA/collections/table1
/schemaB/collections/table1
...
Motivation and Use Case
I work at the Agency for Geoinformation and Surveying in Hamburg, Germany. We are currently evaluating software to expand our OGC API repertoire. Our Urban Data Platform offers OAF Parts 1+2. It is implemented in a monolithic OGC suite (alongside WFS, WMS and so on).
As we gradually move the platform to a cloud environment, pg_featureserv becomes a very exciting alternative. However, we need to link the OAF endpoints of individual datasets with their corresponding entries in a metadata catalogue. The latter serves as "dataset homepage", providing links to all APIs from which the dataset is accessible. Linking to multiple collections would be impractical: since we offer 300+ datasets with 1000+ collections, navigating that would overwhelm end users.
I'm looking forward to hearing your thoughts on this. Best regards,
Anton