Database Views as API in TR1621

kjetilhamre commented 4 years ago

Hi! I have a simple question regarding the use of DATABASE VIEW API instead of a REST API. Now, I have been told that according to governing documents in Equinor we are required to use REST APIs where each table is exposed via separate endpoints instead of database views. I cannot understand this being the intention with the statement being made in TR1621 section 2.3. I think it is a misunderstanding to say that a database view is not an API. A database view can be supported as an API independently of the underlying database, just as a REST API. The difference is that a database view is way more powerful.

Let me illustrate with an example: Say I have two tables in a remote database: Cars A and Persons B, with millions of rows each, too much data to download. Now, say I want to have all rows from table A with cars, where the owners work at Equinor. BUT, one cannot see from Table A that a person works in Equinor, so first a query in table B must be made to search by employer and find say 8000 people. These are the owners. Now, find the cars. I have to filter the cars table one by one for each of those 8000 people. If this has to be done one by one through a REST API endpoint that takes 151ms to respond. This will take quite some time, actually 20 minutes.

Database joins on a view would solve this. A database join would allow me to join the tables together and return the result in one query. It would perform, less error prone and way easier to maintain.

In my specific case, I am asked to work with the operations and maintenance data REST APIs at api.equinor.com. This interface has endpoints one-to-one with table views in an Omnia COPY of the on premise SAP database with millions of rows in each table. Since it is only a copy of the original database – all operations are limited to read only. However, as I am not working ON the omnia team, I am told that I am not allowed to make database connections from my backend app directly to the database, but must use the API. (I have a backend app that is serving web-clients) In my case I need to join, not just two tables as in the example above, but four, and I have not one, but several such queries.

One solution is to ask a remote team in a separate location to create custom endpoints. However this is going to demand a lot in meetings and coordination. If instead, the existing database view could be supported as an API we would have a flexible, yet powerful integration point. A view is decoupled from database tables and can be versioned just like a REST API. The general structure of SAP does not change either. I talked to colleagues with experience in similar integration patterns and everyone I have asked so far recommends using database views. I have searched the net, and everywhere I have seen a rest API being recommended, it is for a simpler problems with small amounts of data, or different problems altogether. So, my question is if we can include database views as APIs in TR1621. I really cannot see any reason why not. regards, K

oyron commented 4 years ago

Thank you for raising an interesting discussion: I’ll share my thoughts on this topic.

What you have been told - “...according to governing documents in Equinor we are required to use REST APIs where each table is exposed via separate endpoints..” - is not correct. Governing documents have no such statements about REST. However, the Technology Lifecycle contains a classification of API mechanisms. Several of those are listed as now/green (REST, GraphQL, MQTT, AMQP, etc). And then the API strategy adds to this by stating that REST is preferred (except for industrial IT). But REST is not the best option in all cases. Then it is fine to use any of the other green/now mechanisms. And in some (rare) cases a careful/yellow technology might be the best choice (requiring an approval in an architecture contract).

TR1621 states that all IT solutions shall collaborate only by using APIs. One of the main reason for this requirement is that we want to avoid tight coupling between our systems. These APIs provide clearly defined interfaces that are independent of the underlying implementation. The APIs then acts as an abstraction layer that enables changing or replacing the system behind, without affecting the clients. To achieve proper decoupling, the API mechanism should be

technology and vendor independent
support versioning
support specifying the interface through a contract/specification language.

Adding security considerations, it should

not pose any kind of security risk
support industry standard auth protocols
support automated end point testing
support logging/monitoring of API usage
...

Taking the API strategy into consideration, an API should

not expose any inner working, implementation details, or naming schemes. I.e. not expose tables/views directly, but expose endpoints using a model and domain language making sense to the client developer.
preferably listed in the API portal (currently only supporting REST)
be prepared for externalization, ref https://github.com/equinor/api-strategy/blob/master/docs/strategy.md#externalize
...

There are several communication mechanisms that can be considered to be an API and support this approach, at least to some extent. Without being an expert on everything you can do with a database view, I believe that db views does not have the properties that makes it suitable as a widespread API mechanism in Equinor. If we are to add it to the Technology Lifecycle, I would argue yellow/careful. There might be cases where database view is the best alternative, but then it should be approved in an architecture contract.

In your example, I agree that REST is not suited for joining large dataset across multiple endpoints. In such cases you could look into getting a custom REST endpoint created, like you mention. If the API development team is following the “API As a Product” principle, they will pay attention to your feedback. Or a better approach might be to propose to them to create a GraphQL API that can provide greater flexibility.

lenehoff commented 4 years ago

As new product owner rep for the new Plant data team, this is an area where we want to contribute in the discussions. As we are in the start of our new team, we are not all fully ready yet. But definitely an important area for us. The discussions should also be combined with the ongoing testing of new emerging technologies in Azure, like Synapse, that could be more in accordance with guidelines then direct access to database views.

oyron commented 4 years ago

Please share your reflections here, @lenehoff , as this is where the discussion is currently taking place :)

jokva commented 4 years ago

not expose any inner working, implementation details, or naming schemes. I.e. not expose tables/views directly, but expose endpoints using a model and domain language making sense to the client developer.

I do not agree with tables and views being equivalent in terms of being "inner working" or "implementation details. A view is specifically neither, and maintained the same way any other API is.

I see no good reason why a well-implemented, well-maintained view should not be considered an API, especially when GraphQL can be.

edit: besides, how much more "industry standard" thank SQL can you be?

niand commented 4 years ago

For this specific case I think the root cause is that system to system integration is done via a copy of the data, and that the copy is accessed through an web API maintained by a third party team. It seems a better approach to use the API provided by the source system directly (SAP HANA/CDS views was suggested) since it also provides the modelling/joining capabilities required by the consumer.

niand commented 4 years ago

In general I think database views is a poor mechanism for system to system integration because it is harder to trace and monitor and typically has stronger technology dependencies than e.g. a web/json API. There will probably always be some exceptions though. For end-users doing reads and reporting systems database views / SQL is a perfectly fine interface, because tracability and technological bindings matter less in that case.

lenehoff commented 4 years ago

The current technologies approved as API in Equinor are great alternatives for web apps and other system-to-system connection with a programming interface. But it has shown not be a good solution for the increasing amount of dashboard solutions, like PowerBI, that is more and more utilized within the company. Our Data strategy with the data principle 1 states that we want to empower users to actively use the data with the capabilities they have, utilizing the access method of choice. Currently we limit this dramatically with only supporting programmatically API’s like REST. For these users the API’s as SQL connection would be beneficially as this is well supported in the typically dashboards solutions they use. This, of course, means that this needs to be done in a proper way with the requirements mentioned in Øyvind Rønne’s note. These dashboards users are the main case, I’m thinking about when considering SQL connections as API. There might also be developers needing more performant joins in their app’s as the current REST API does not support the needs (like described by Kjetil Hamre above). This is another difficult area’s as this would typically be an optimized API, and not necessarily an enterprise API. But current discussions around optimized API are discussed as being on top of enterprise API’s. But this is not the best performant way, as described by Kjetil. Here an access to the actual database would give a dramatically better performant result. Of this reason, we need to consider this as well continuing the work on enterprise architecture. Within IOC we have had great success with utilizing the GITHub for all database script, maintaining the versioning and handling broken relations across database tables and views. This is relevant with regards to several of the mentioned key decoupling mechanisms: • Technology and vendor independent. Utilizing standard SQL scripts, possible to establish within various database technologies and vendors • Versioning. Can do the same with database scripts. Integrating as part of the DevOps pipeline • Testing. Automating the tests of the database schema changes as part of DevOps pipelines • Security. There are several alternatives with regards to security and access management. Only database views should be made available for a user, hiding the inner working/implementation/naming. We should assess the best practises on how to perform these kinds of security and role management. • Most database technologies support advanced monitoring, and this should also be assessed and given guidelines. One solution further could be to utilize the same set of database views both for SQL connections and for establishing REST API’s. The establishing of performant and fine-tuned database views as the basis of REST API’s is heavily used already, why not giving users access to the same database views as behind REST API. This way we can utilize the same documentation.

oyron commented 4 years ago

Just to make sure there are no misunderstandings - "general SQL connections as API" is not recommended. This would create a tight coupling between the database and the client that we don't want. This discussion is focusing on "database view as API", where the client only has access to the view.

I agree @jokva and @lenehoff that it is possible to create a well crafted database view addressing many of the considerations I mention. But I still have several concerns regarding widespread usage of this pattern. To name a few, I think the lack of an explicit contract makes it harder to collaborate on developing the API and doing API first. And you don't have access to tools for generating documentation, mock code, etc. I also think it requires discipline to create the database view as a proper abstraction layer, and that this discipline will likely be lost in "the heat of the moment" a few places. And I am concerned about security - how do we monitor the API for unusual behaviour, how do we do automated end point testing, how do we do DoS prevention / rate limiting, etc.

Please note that if we classify "database view as API" as yellow/careful in TLC, it does not mean it is not allowed to use. It means that you have to justify the choice and get this approved in the architecture contract.

@larskaare, do you have some reflections to share?

jokva commented 4 years ago

I think the lack of an explicit contract makes it harder to collaborate on developing the API and doing API first.

Hang on, what do you mean explicit contract? @oyron

oyron commented 4 years ago

I mean an explicit interface definition (i.e. contract) separated from the implementing code, using a specification language like OpenAPI or AsyncAPI. Take a look at the API First section in the API Strategy.

larskaare commented 4 years ago

My reflections in this context would be related to security. I assume that however you choose to interface a database, the communication/transport should be encrypted. When it comes to authentication and authorization i assume we are a bit dependent on the actual database and the hosting platform. If on Azure using services such as MS Sql Server and accessing the database as a machine we should use managed identities, and when as a person some kind of passthrough (which i think implements oauth2) - everything governed and handled by our Azure AD. These are usually good patterns. Ani patterns are things like creating user/passwords/key in the databases and use these for auth in the clients, using the admin account for access++. I also assume that enough effort is put into the design and securing of the views that clients are only able to query the views that should - and not be able to expose the underlaying tables directly though the views. The same goes for securing the database when access is given through and sdk/api - ensure that clients only can access stuff they should explicitly be able to access. (When I access a database though and SDK provided by the vendor, ODBC or what not - it usually have access to a lot of interesting functionality like system views+++ which potentially increases the blast radius from a security point of view. Accessing the data using a rest api could potentially reduce some security risks)

jokva commented 4 years ago

I mean an explicit interface definition (i.e. contract) separated from the implementing code, using a specification language like OpenAPI or AsyncAPI.

The view itself is a pretty explicit contract, and independent of the underlying implementation.

oyron commented 4 years ago

I don't see it the same way, but I don't consider this a key point in the discussion.

jokva commented 4 years ago

Then what is the key point?

technology and vendor independent

ODBC is fairly independent, for example.

support versioning

Most http-based APIs do versioning with a v{n} in the URL, i.e. embed it in the address. This is perfectly doable with views too.

support specifying the interface through a contract/specification language.

Relations are about as specific of a contract as you can get.

not pose any kind of security risk

Why would a view introduce any more risk than any other interface? Done poorly, maybe, but that applies to all other interfaces too.

support industry standard auth protocols

Sure, and people have authed against database servers forever, and through agents if you will. You can run the requests through http if you will (at least on postgres), and put whatever auth in front of that.

support automated end point testing

You get a pretty strong backing by definition, but this should be no worse than any other technique.

support logging/monitoring of API usage

Databases are frequently monitored.

As far as I can tell, views can certainly support every requirement presented so far. In my opinion it can be a perfectly good API. They have to be designed, implemented and supported well, of course, but it's quite easy to make terrible REST services too.

oyron commented 4 years ago

The key point to me is wether a database view is suited as a widespread API mechanism in Equinor (and not wether a database view should be considered an "explicit contract").

And I still think it should not be used widely. I think database views are less suitable for the API first approach, due to lack of specification language with tooling support. It is not completely technology independent. You cannot replace the underlying data store with something that is not ODBC compliant, like a a document or time series database. I believe it requires more discipline to create the view as a proper abstraction layer and do "API First" and "API as a product". With widespread usage, I think it is likely that there would be many cases of views simply mirroring internal naming schemes/structure. And I am concerned about the security aspect. Can we do rate limiting and DoS protection? How do we monitor for security events? And I think the impact of broken auth could be much higher than in a REST API.

I think we just have to bring out an old cliche and agree to disagree.

jamcc-equinor commented 4 years ago

Going back to the original post - I found this part most insightful:

In my specific case, I am asked to work with the operations and maintenance data REST APIs at api.equinor.com. This interface has endpoints one-to-one with table views in an Omnia COPY of the on premise SAP database with millions of rows in each table. Since it is only a copy of the original database – all operations are limited to read only. However, as I am not working ON the omnia team, I am told that I am not allowed to make database connections from my backend app directly to the database, but must use the API.

This sounds more a data architecture and API design (and possibly also organisational) issue than a specific tools/technology issue.

At the risk of upsetting the people involved in creating this solution - it appears the APIs you are being told to use are not suited to your (and possibly other people's) needs.

Depending on the technologies you are using for your app and web frontends, it may be that more appropriately designed REST APIs could suit your needs (or possibly something like GraphQL would be a better solution to give more flexibility). But the key problem is that the data products themselves are not designed for what you need them to do. I think the issue is less with the choice of API technology and more with the design of the data products that these APIs are serving up.

I support what @oyron has said above about the reasons for the API first strategy - and building loosely coupled solutions is something we should definitely strive for. Serving only individual table views and then requiring the user to join data themselves means that the logic for joins needs to be implemented in many solutions, and the more implementations the more risks of errors creeping in - as well as more places would need to be updated in the case when database table structures change.

Specifically on SQL views as an appropriate abstraction layer in a well-architected system - I do think this will need to be part of our toolbox in a controlled manner - most obviously for supporting ad-hoc BI - because many of today's tools (PowerBI, Spotfire etc) are built for SQL Connect, and because SQL is a powerful, flexible way to interact with data. However, I don't think it's the best way to share fixed payloads of data between in-house developed applications - so if the available APIs were well designed for enterprise use, the API strategy should still stand for all internally developed apps except in the cases where the app being developed is basically in itself a BI tool to deliver ad-hoc data access. The main question that I hear in this original post is "who should supply the requirements for the design of these data APIs?".

oyron commented 4 years ago

After discussing this with the other leading advisors for software development, I have now added Database view as API in the Technology Lifecycle, classified as careful. The reason for this classification is that we want to allow for this usage in special cases where database view(s) has significant benefits compared to other API mechanisms. But this is not something we want as a widespread API mechanism.

Database views as API must be created as a proper abstraction layer and follow the principles of the API strategy. The view must be properly secured, support versioning, do not expose inner/technical details, usage must be monitored, clients must ONLY be allowed to access the view(s) and not the database directly, etc.

Usage of "careful" technology items must be documented and approved in the architecture contract.

oyron commented 4 years ago

This discussion has come to conclusion. Thank you for your contributions.

equinor / api-strategy

Database Views as API in TR1621 #76