Closed sebastian closed 6 years ago
CC: @Aircloak/developers
On the PRO side for HTTP API:
Info messages can be, and are, sent through the psql interface too.
Info messages can be, and are, sent through the psql interface too.
Even so - I don't think things like :*
, user counts, or "row unreliable" can be elegantly handled within the psql protocol.
Those are good points.
I believe, we have two distinct use-cases, and they shouldn't be necessarily solved in the same way:
In this case a client is a developer who writes some code to programmatically issues queries to the Aircloak system.
This could be done through postgresql interface, but it's not been really tested a lot so far. If we opt for postgresql approach, then we need to invest some time in some release cycle. Someone should pick a couple of popular languages, and make some basic tests to see if postgresql interface works for those clients, and if not, then we need to make fixes. This requires extra work, and in my mind brings nothing new to the table compared to REST API. The listed pros of postgresql interface are IMO mostly irrelevant in this scenario.
Therefore, I think that REST API is a stable and a proper solution for this use-case. Since REST API has been in our system for quite awhile, its implementation is more mature and less hacky than the postgresql interface. Hence, I believe REST should be our preferred solution for programmatic access. It's already working today, and it will work for all http clients with no extra effort on our behalf.
This use-case is much more intricate. In many cases, this should be possible with an ODBC driver. However, back when I was evaluating some tools, ODBC integration wasn't supported in all the tools, and Tableau supported custom drivers only on Windows. Moreover, writing an ODBC driver seems like a daunting task. There are companies whose sole job is to write a custom ODBC driver, and they charge an arm and a leg for that.
The postgresql interface was a "trick" which allowed us to integrate with Tableau. It also possibly simplifies writing a custom ODBC driver, since we can start with the source code of the official driver, and customize it to our needs. Nonetheless, I think this will be a very complex task, and it will open up another complex front, as we'll have to troubleshoot problems on end-user machines, and possibly support different operating systems. I'd like to avoid that scenario if possible.
A nice thing is that so far we've been able to integrate with Tableau without needing to write our own driver. This means that our current problems are completely solved in the Elixir code, and Tableau integration works both on Windows and on Mac.
It's also worth mentioning that Aircloak can't work with all the visual tools. For example Qlik fetches all tables to the client machine, and then performs joins inside the client code. Combined with our anonymization properties, this could lead to most of the data being anonymized. Therefore, I think we can only work with tools which offload the query to the database. I think that postgresql interface is a win here, but in the future, we might need to make some adjustments for treating psql specific queries.
In my opinion, this use-case is quite complex, and we'll need to tackle it on a case by case basis. Perhaps the next tool we want to support might work with postgresql interface, or we might need to write our ODBC driver, or we might need to consider a completely different approach.
I also agree with @obrok's points that postgresql interface constrains us in what we can return. In a visual tool, this doesn't matter, but I think having the ability to send Aircloak specific metadata is good for programmatic access.
Since REST API has been in our system for quite awhile, its implementation is more mature and less hacky than the postgresql interface.
It's been in our system a long time, but has seen no real use! I don't think that qualifies as making it more mature.
Hence, I believe REST should be our preferred solution for programmatic access. It's already working today, and it will work for all http clients with no extra effort on our behalf.
But more overhead on the developers side. That's the issue. They need to deserialise values, they need to mangle the JSON etc. It's not at all as easy to use as a Postgres driver they already know how works. My concern is ease of use. I see how much people struggle with using our system as it is. Adding complexity through them having to parse and mangle JSON, and do multiple requests with polling to execute a query and get back the result is everything else than easy to use.
and it will work for all http clients with no extra effort on our behalf.
It's the effort on the users part of I worry about.
Tableau supported custom drivers only on Windows
Now they support macOS as well.
The postgresql interface was a "trick" which allowed us to integrate with Tableau.
It started out as a trick, but in my opinion it turned out to be a brilliant idea.
... ODBC driver, ..., and customize it to our needs. Nonetheless, I think this will be a very complex task
I don't see the need to offer an Aircloak branded ODBC driver at present. There are enough languages and tools supporting Postgres out of the box.
I think that postgresql interface is a win here, but in the future, we might need to make some adjustments for treating psql specific queries.
Adjusting the way in which we treat psql specific queries is something I want us to do in any case. I have the hunch that once we do, the integration with tools like Apache Zeppelin, superset, and redash will basically come for free.
In my opinion, this use-case is quite complex, and we'll need to tackle it on a case by case basis. Perhaps the next tool we want to support might work with postgresql interface, or we might need to write our ODBC driver, or we might need to consider a completely different approach.
I have yet to see a tool we want to integrate with that supports other databases natively, but not Postgres. I think the choice of implementing the Postgres protocol was brilliant and in fact already has gotten us 90% towards an excellent means of using Aircloak.
It's been in our system a long time, but has seen no real use! I don't think that qualifies as making it more mature.
What qualifies REST as more mature is that it has been more thoroughly tested. Also, it has a simpler implementation which doesn't rely on hacks. Finally, unlike with postgresql, clients don't get to assume anything about the internal implementation details.
They need to deserialise values, they need to mangle the JSON etc.
Deserialization is done by the json decoder, and it can be a single LOC, or with smart HTTP clients a zero LOC code change. Past that point, they need to roughly invest the same amount of work as with postgresql client - iterate the structure and do something with the values.
It's not at all as easy to use as a Postgres driver they already know how works.
You seem to imply that every programmer knows how to use PostgreSQL driver, but no programmer knows how to use HTTP client :-)
and do multiple requests with polling to execute a query and get back the result is everything else than easy to use
We could of course support synchronous queries. Keep in mind though that that's a double edged sword. For long running queries, one might experience a netsplit. The async interface still allows clients to pick up the result, while AFAIK postgresql interface doesn't. If you run a query for an hour, and you experience a disconnection after e.g. 55 minutes, you need to restart it from scratch.
It's the effort on the users part of I worry about.
I'm still not convinced that this "effort" you mention is significantly higher (or higher at all), especially given that clients of programmatic API are programmers.
I have yet to see a tool we want to integrate with that supports other databases natively, but not Postgres. I think the choice of implementing the Postgres protocol was brilliant and in fact already has gotten us 90% towards an excellent means of using Aircloak.
I'll be definitely very happy if we can support visual tools with postgres interface. I'm just a bit more cautious than you :-)
Either way, as postgres interface implementation matures, I'd be possibly more confident about using it for programmatic access. However, at the time being I still think that it would be better, both for us and external programmers, if they used REST for programmatic access, and that we should refine the postgres interface only for the purpose of visual tools.
Ah, forgot to add my 2 cents, because I thought it was @sebastian who was talking about the two use cases while it was @sasa1977 (paying an arm and a leg tipped me off, finally). I basically agree with @sasa1977 here - the needs of programmers integrating a custom solution vs. the needs for integrating an off-the-shelf tool should not be conflated, at least not up-front. It would be really nice if we could solve all of this with just one mechanism, but it seems we'd need to scrap the web frontend, to do that.
It's the effort on the users part of I worry about.
This view focuses only on the initial setup. Being accessible to newbies is obviously important, but once the programmer figures out how to issue one query, I think the two methods don't differ that much, and that's where most of the effort will go. The HTTP API has the slight advantage here of being able to match our data and execution model better (metadata, async execution, etc.).
Maybe it's a pity we didn't try to convince the SAP people to try using HTTP - we would have some real-world data on how that goes over :)
Another datapoint. I just got an email from one of the attackers (from the attack challenge). It reads:
If I understand correctly from the user guide I have two options for this: either the REST API or the PostgreSQL Message Protocol Server using a driver of my preference (e.g.Psycopg2 for Python). I personally prefer the second option, thus, could you please let me know the Insights-Air-Host address and the PSQL port?
(emphasis obviously mine)
With release 18.4
I would like the Postgres interface to be considered THE first class means of querying Aircloak.
This means we need some mechanism for answering more of the Postgres specific queries out of the box. This would be solved by our idea to host a Postgres database in Air that can answer schema related queries for us.
The relevant issue is this: https://github.com/Aircloak/aircloak/issues/2130
Last week we started a discussion on Slack about which API based mechanism should be the preferred means of querying Aircloak.
We currently have two programatic means of interacting with the system:
They come with different PROs and CONs.
HTTP REST API
PROs:
CONs:
Postgres Message Protocol interface
PROs:
CONs:
As long as we want to make it possible to query Aircloak through BI-tools like Tableau, there is no getting around needing to support the Postgres Message Protocol interface in some form. The HTTP API will never be a valid option here.
The set of tools we want to support is going to increase, and as we do so, the interface is going to get better as a result of adapting it to accommodate the needs of these tools.
I would rather we embrace the interface and make it work well. That is going to make integrating our system in custom dashboards/programs where they talk to it directly trivial (through out of the box Postgres support), as well as making it easier to add support for further off the shelf dashboards and tools.