AWS RDS Data API integration

kstro21 commented 3 years ago

This is a feature request: I would like to see Ebean able to work with the AWS RDS Data API. The Data API doesn't require a persistent connection to the DB. Instead, it provides a secure HTTP endpoint and integration with AWS SDKs. You can use the endpoint to run SQL statements without managing connections.

I think this would be a good enhancement to the ORM.

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html

rbygrave commented 3 years ago

So firstly, anyone can fork Ebean and do whatever they like in that respect, so the question is really whether this is something we want supported in the master branch and officially maintained.

"The Data API doesn't require a persistent connection to the DB. Instead, it provides a secure HTTP endpoint"

Yes, and this is effectively the question: is it worthwhile to support a proprietary API (albeit from the largest cloud provider) that replaces JDBC with an HTTP(S) API, given that Ebean already has its own DataSource implementation (which I think is the best available)?

Q: Does this really improve cold starts and time to first response for Lambda?
Q: Do we really have a problem managing persistent connections to the database?

If we look at replacing, say, Ebean's DataSource implementation (TCP socket connections with keepalive) with an HTTP(S) client (TCP socket connections with keepalive), what are the pros and cons? I wrote Ebean's DataSource implementation. By default it initialises 2 connections on startup, then grows on demand to its maximum size, shrinks as connections go idle, and resets/re-heals itself automatically. Having used it with AWS Lambda connecting to Postgres RDS, I really didn't have any issues here, and technically I don't see a benefit to moving to HTTP(S) TCP connections.
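
To make the pool behaviour described above concrete, here is a minimal sketch. It is not Ebean's DataSource code: SketchPool is a hypothetical class, and the type parameter C stands in for java.sql.Connection so it runs without a database. It opens min connections eagerly at startup, grows on demand up to max, and trims idle connections back down to min. A real pool would also close trimmed connections, track idle time, validate and re-heal connections, and wait rather than fail when exhausted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical pool sketch (not Ebean's implementation). C stands in for
// java.sql.Connection; the factory stands in for opening a real connection.
final class SketchPool<C> {
  private final Supplier<C> factory;
  private final int min;
  private final int max;
  private final Deque<C> idle = new ArrayDeque<>(); // most recently released at the head
  private int busy;

  SketchPool(Supplier<C> factory, int min, int max) {
    if (min < 0 || max < min) throw new IllegalArgumentException("need 0 <= min <= max");
    this.factory = factory;
    this.min = min;
    this.max = max;
    for (int i = 0; i < min; i++) idle.push(factory.get()); // eager init at startup
  }

  synchronized C acquire() {
    C c = idle.poll(); // reuse the most recently released connection first
    if (c == null) {
      if (total() >= max) throw new IllegalStateException("pool exhausted");
      c = factory.get(); // grow on demand
    }
    busy++;
    return c;
  }

  synchronized void release(C c) {
    busy--;
    idle.push(c);
  }

  // Intended to be called periodically; drops the least recently used idle
  // connections until the pool is back at min. A real pool would close them.
  synchronized int trimIdle() {
    int trimmed = 0;
    while (!idle.isEmpty() && total() > min) {
      idle.pollLast();
      trimmed++;
    }
    return trimmed;
  }

  synchronized int total() { return idle.size() + busy; }
  synchronized int idleCount() { return idle.size(); }
}

class SketchPoolDemo {
  public static void main(String[] args) {
    int[] opened = {0};
    SketchPool<String> pool = new SketchPool<>(() -> "conn-" + (++opened[0]), 2, 5);
    String a = pool.acquire(), b = pool.acquire(), c = pool.acquire(); // third one grows the pool
    pool.release(a);
    pool.release(b);
    pool.release(c);
    // prints opened=3 total=3 trimmed=1 total=2
    System.out.println("opened=" + opened[0] + " total=" + pool.total()
        + " trimmed=" + pool.trimIdle() + " total=" + pool.total());
  }
}
```

With min = 2, startup costs exactly two connection opens, which is the cost being weighed against HTTP(S) connections in this discussion.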

I can see this API providing benefits for people not using our Ebean DataSource implementation, for example those using a pool that initialises too many connections, doesn't self-heal, or doesn't automatically manage its size.

JDBC drivers also use highly optimised binary wire protocols and give us client-side prepared statement caching. With this AWS API that would happen on the server side instead, but my thought is that it is very unlikely this AWS API is more efficient than JDBC, and we would likely lose performance from the Lambda client's perspective. For Postgres we also support JDBC-specific types (UUID, INET, CIDR, ARRAY, HSTORE, JSONB, etc.), whereas this API looks less functional in ways I don't really like.
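
Client-side prepared statement caching can be sketched as an LRU map keyed by SQL text. This is a hypothetical illustration: StatementCache is not a JDBC or Ebean class, the type parameter S stands in for java.sql.PreparedStatement so the sketch runs without a database, and the prepare function stands in for a call such as Connection.prepareStatement(sql).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical client-side statement cache keyed by SQL text, with LRU eviction.
// S stands in for java.sql.PreparedStatement.
final class StatementCache<S> {
  private final Function<String, S> prepare; // stands in for connection.prepareStatement(sql)
  private final Map<String, S> cache;
  private int prepareCount;

  StatementCache(int maxSize, Function<String, S> prepare) {
    this.prepare = prepare;
    // accessOrder = true: iteration runs from least to most recently accessed
    this.cache = new LinkedHashMap<String, S>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
        // Evict the least recently used statement once over capacity;
        // a real cache would also close eldest.getValue() here.
        return size() > maxSize;
      }
    };
  }

  // Returns a cached statement, preparing (and caching) it only on a miss.
  S get(String sql) {
    S stmt = cache.get(sql); // a hit also marks the entry as most recently used
    if (stmt == null) {
      stmt = prepare.apply(sql);
      prepareCount++;
      cache.put(sql, stmt);
    }
    return stmt;
  }

  int prepareCount() { return prepareCount; }
  int cachedCount() { return cache.size(); }
}

class StatementCacheDemo {
  public static void main(String[] args) {
    StatementCache<String> cache = new StatementCache<>(2, sql -> "prepared[" + sql + "]");
    cache.get("select 1");
    cache.get("select 1"); // cache hit: no second prepare
    cache.get("select 2");
    cache.get("select 3"); // evicts "select 1", the least recently used
    // prints prepares=3 cached=2
    System.out.println("prepares=" + cache.prepareCount() + " cached=" + cache.cachedCount());
  }
}
```

The point of the sketch is that repeated statements skip the prepare round trip on the client; with an HTTP API each call would carry the SQL text and any such reuse would be up to the server.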

In terms of cold start times, I've personally looked at this issue quite a lot over the years. Creating 2 TCP sockets (the default minimum connection pool size) was not a significant factor; instead, dynamic proxies, classpath scanning and excessive reflection stood out in my testing and experimentation [and this is one of the reasons I've personally moved away from libraries that do these things]. Also note that the Java runtime has got a lot faster [don't judge cold start by the Java 8 runtime].

So in short, I don't see the benefits of Ebean supporting this API. Happy to hear other opinions on the technical merits.

Also note that Ebean is now considered mature and at its conceptual maximum. That means any new feature has to justify the additional complexity it might add; the barrier to adding "new things" is relatively high.

Cheers, Rob.

kstro21 commented 3 years ago

Hey, @rbygrave, thanks for your response.

"So firstly, anyone can fork Ebean and do whatever they like in that respect, so the question is really whether this is something we want supported in the master branch and officially maintained."

but you didn't need to be rude.

Congratulations on writing the best DataSource available, but I didn't mean to hurt your feelings by suggesting a change to the best DataSource available. Sorry for that.

Anyway, good job with Ebean, team. People, stay away from Rob, he is an egotistical and toxic person.

rbygrave commented 3 years ago

Hi @kstro21, sorry I didn't communicate well or clearly there. I'll try to explain more and be clearer.

DataSource

Thanks. Note that adopting this AWS API isn't just a matter of swapping the DataSource; it probably impacts around half of Ebean's internals. That is, I'd suggest this would be a very big, impactful change because Ebean's internals use the JDBC API. The size and complexity of the change would depend on the differences between this API and JDBC. For example, pretty much all the ScalarTypes know how to bind and read via JDBC [well, via a very thin abstraction of PreparedStatement and ResultSet].
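
To illustrate why the change reaches so far, here is a simplified, hypothetical sketch of the pattern: each scalar type binds and reads itself through a thin binder/reader abstraction that mirrors PreparedStatement and ResultSet. The names StatementBinder, RowReader, ScalarType and UuidType are illustrative assumptions, not Ebean's actual interfaces or signatures. An HTTP API that represents parameters and results differently from JDBC would mean revisiting every implementation like UuidType.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical thin abstraction over PreparedStatement (binding parameters).
interface StatementBinder {
  void setObject(Object value); // like PreparedStatement.setObject(pos, value)
  void setString(String value); // like PreparedStatement.setString(pos, value)
}

// Hypothetical thin abstraction over ResultSet (reading a column value).
interface RowReader {
  Object getObject(); // like ResultSet.getObject(pos)
}

// Each mapped Java type knows how to bind and read itself via the abstraction.
interface ScalarType<T> {
  void bind(StatementBinder binder, T value);
  T read(RowReader reader);
}

// Binds UUID as a native object (as the Postgres JDBC driver accepts for uuid
// columns) or falls back to text for databases without a native UUID type.
final class UuidType implements ScalarType<UUID> {
  private final boolean nativeUuid;

  UuidType(boolean nativeUuid) { this.nativeUuid = nativeUuid; }

  public void bind(StatementBinder binder, UUID value) {
    // null is passed through as-is here; a real JDBC binder would use setNull
    if (value == null || nativeUuid) binder.setObject(value);
    else binder.setString(value.toString());
  }

  public UUID read(RowReader reader) {
    Object raw = reader.getObject();
    if (raw == null) return null;
    return raw instanceof UUID ? (UUID) raw : UUID.fromString(raw.toString());
  }
}

// In-memory binder used only to exercise the sketch without a database.
final class RecordingBinder implements StatementBinder {
  final List<Object> bound = new ArrayList<>();
  public void setObject(Object value) { bound.add(value); }
  public void setString(String value) { bound.add(value); }
}

class UuidTypeDemo {
  public static void main(String[] args) {
    UUID id = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
    RecordingBinder binder = new RecordingBinder();
    new UuidType(true).bind(binder, id);  // native UUID binding
    new UuidType(false).bind(binder, id); // text fallback
    RowReader row = () -> id.toString();  // a driver returning the value as text
    System.out.println(binder.bound + " " + new UuidType(true).read(row).equals(id));
  }
}
```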

That is, I believe this would be a very big change to Ebean's internals: a really big job that would likely add complexity.

"can fork Ebean and do whatever they like"

What I'm trying to say here is that I'm personally not yet motivated to do this experiment myself because, at this point, I don't see the technical benefits outweighing the complexity. If someone takes on this experiment, what I really don't want is for them to do a whole heap of work to get it all working and prepare a PR to merge into master, only for me to then have to say "sorry, this PR increases the complexity of Ebean internals too much; as such, it represents too big a risk for us to merge into master".

That is, it would be absolutely gutting for someone to do all that work thinking they could just merge it into master, only to be disappointed at the last step.

I want people to be fully aware of that risk before they start the work. I want to be up front and say that I think this is likely to be a very big change to Ebean internals, with a big chance of not getting merged into master (because it is too big and complex to justify its conceptual weight).

An alternate view of this change would be to treat it the way Ebean treats ElasticSearch: as a document store. I don't think this is a good fit, though. With ElasticSearch, Ebean is document-orientated, sending JSON documents for insert, update, delete, etc., whereas this API does not look particularly document-orientated to me, as it involves SQL and SQL generation. So I still think this is effectively an API that replaces JDBC rather than a "document store API".

DataSource

With Lambda, people frequently mention the cost of TCP socket connections for cold start, and I'm explicitly saying that I question that. Similar questions are raised with Kubernetes and scaling up pods. That is, I am saying it is questionable to move from JDBC TCP socket connections to HTTP(S) TCP socket connections and call that a win.

My comments around the DataSource are really trying to say that people using something not on the JVM (Python, JS, ...) might be motivated to use this API by a desire not to "manage connections", because they don't have a good DataSource implementation to choose from. On the JVM we have some good DataSource implementations, and if someone writing a Java Lambda is struggling with slow cold starts due to initializing TCP socket connections, they should look at the minimum connection pool size and the behavior of the pool [and I'd recommend they try Ebean's DataSource implementation].

Apologies for not communicating that well before. Hopefully I've made things a little clearer. If there are still things I need to clarify / improve on here around this please feel free to point that out.

If people are really excited and motivated by this AWS API I'm happy to hear what you really like and what your thoughts are.

Cheers, Rob.

rbygrave commented 3 years ago

Hi @kstro21, I think we can close this issue for the moment. If someone wants to add more to this issue and reopen it, by all means do.

Cheers, Rob.

ebean-orm / ebean

AWS RDS Data API integration #2192