[discussion] Migration from Spring to high performance API system

archenroot commented 6 years ago

I suggest discussing a possible migration from Spring bloat to a full microservice architecture.

One option would be to move from Spring (a development framework) to a full microservice platform, probably the most performant one currently on the rise. Are you willing to discuss this option?

I am working with the light4j team and can, of course, help with the migration. Benchmarks: https://github.com/networknt/microservices-framework-benchmark

Note: light4j is not only a web service or DB Java framework, but a full-featured microservice platform with built-in sagas, event sourcing, CQRS and dozens of other sub-projects. But one thing is clear: it focuses on high performance.

https://github.com/networknt

https://doc.networknt.com/

Ladislav

syjer commented 6 years ago

hi @archenroot

As we are fluent with Spring (we will upgrade to the 5.0 release, which will improve things a bit on the bloat side), and given the amount of existing code, it's quite a hard proposition to change the underlying framework without causing regressions/bugs.

On the performance side: in our performance tests, the bottleneck is generally the database (and some of our unoptimized queries). So we think it's better to improve on that side first rather than switching frameworks.

On the memory side, you only need 64 MB of heap (~200 MB total) to run alf.io (but obviously, a little bit more is better :)), so it's quite small.

At what kind of load do you expect Spring to become a performance issue?

archenroot commented 6 years ago

@syjer - I actually just wanted to hear your opinion on this, and I understand it would be a lot of overhead. Thank you very much.

archenroot commented 6 years ago

@syjer - hi buddy, I closed the issue by mistake before reading your reply properly.

Actually, I like Spring, and Alf.io works like a charm on it, but I have personally worked on high-performance integration projects where Spring was not the right partner from a long-term perspective if you want to avoid the cost of large load-balanced cluster instances. Even then, latency remains a problem. Maybe this is not a target or aim of Alf.io, but there are better solutions for building service architectures. I am talking only about web service architectures at the moment.

The second aspect is fully adopting the saga, CQRS and event sourcing patterns. That is where we are talking about modern microservice design, and it is not Spring. There are actually three full microservice platforms in the wild supporting this modern approach to building highly scalable systems.

All of them are open source:

Eventuate.io by microservices rock star Richardson: a must for anyone interested in microservices. See http://microservices.io/ and his platform https://github.com/eventuate-local/eventuate-local
Axon: I haven't tested it, but it looks promising.
And finally the Light4j project, which is in general a full copy (with more active development) of Richardson's Eventuate prototype, but which leaves heavy frameworks behind and returns to Java's nature :-), if I can put it that way. This is my choice for future work on service designs with HPC in mind.

I also suggest starting here: https://doc.networknt.com/getting-started/

Still, I understand it might not be a priority at the moment.

From a resource-consumption perspective: I have a new service in development, but it is still empty, so it is probably better to show numbers once all its endpoints are implemented.

But again, it is also about high performance. With regard to the database, there are actually slightly different design options once the CQRS and event sourcing patterns are in place. I will try to explain it simply.

You have an event store as the central source of truth; it can be seen as an implementation of the audit pattern, or like the banking transactions on an account. This must be a durable DB instance or cluster, etc. The business services (i.e. Ticket Service, etc.), however, are deployed on the fly and use the REPLAY functionality of the event store, saying: "event store, give me all events from 2018-01-01 until today." The Ticket Service then uses these events to build a so-called query (read-only) model. This service's database does not maintain the actual record statuses itself; it is only used to materialize different view models over the events. If you change your model (the tables in the database), you destroy the service, redeploy it, and replay the events to construct different data views. I go into this detail because such a design can remove the DB bottleneck:

You can run the service database for fast queries in memory, or more exactly, on Linux on a tmpfs file system (RAM FS). This will make I/O deadly fast.
There is a proof of concept for using UDS (Unix domain sockets) instead of TCP to connect to a Postgres instance. Removing the network overhead (which you can do, since you have a local service instance) will again significantly improve both throughput and especially latency.

https://github.com/impossibl/pgjdbc-ng/issues/169 https://github.com/jruby/activerecord-jdbc-adapter/issues/677

I will be working on getting the driver this year.
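The replay idea from the event-store paragraph above (a disposable Ticket Service asking for all events from 2018-01-01 until today and building its read-only query model from them) can be sketched in plain Java. This is a minimal in-memory illustration assuming Java 17; `TicketEvent`, `EventStore`, `TicketReadModel` and the event type names are hypothetical, and a real event store would be a durable database rather than an `ArrayList`.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal in-memory sketch of event replay into a read-only query model.
// All names are hypothetical; a real event store would be durable storage.
public class ReplaySketch {

    // An immutable fact recorded by the write side.
    record TicketEvent(LocalDate date, String ticketId, String type) {}

    // Append-only log acting as the central source of truth.
    static class EventStore {
        private final List<TicketEvent> log = new ArrayList<>();

        void append(TicketEvent e) { log.add(e); }

        // "Give me all events between from and to" (both dates inclusive).
        List<TicketEvent> replay(LocalDate from, LocalDate to) {
            List<TicketEvent> out = new ArrayList<>();
            for (TicketEvent e : log) {
                if (!e.date().isBefore(from) && !e.date().isAfter(to)) {
                    out.add(e);
                }
            }
            return out;
        }
    }

    // Disposable query model: current status per ticket, derived only from events.
    static class TicketReadModel {
        final Map<String, String> statusByTicket = new HashMap<>();

        void apply(TicketEvent e) {
            switch (e.type()) {
                case "RESERVED" -> statusByTicket.put(e.ticketId(), "RESERVED");
                case "PAID" -> statusByTicket.put(e.ticketId(), "SOLD");
                case "CANCELLED" -> statusByTicket.remove(e.ticketId());
                default -> { } // event types this view does not care about
            }
        }

        // Rebuild from scratch: start with an empty view, then replay the events.
        static TicketReadModel rebuild(EventStore store, LocalDate from, LocalDate to) {
            TicketReadModel model = new TicketReadModel();
            store.replay(from, to).forEach(model::apply);
            return model;
        }
    }

    public static void main(String[] args) {
        EventStore store = new EventStore();
        store.append(new TicketEvent(LocalDate.of(2018, 1, 5), "t1", "RESERVED"));
        store.append(new TicketEvent(LocalDate.of(2018, 1, 6), "t1", "PAID"));
        store.append(new TicketEvent(LocalDate.of(2018, 2, 1), "t2", "RESERVED"));
        store.append(new TicketEvent(LocalDate.of(2018, 2, 2), "t2", "CANCELLED"));

        TicketReadModel view = TicketReadModel.rebuild(store, LocalDate.of(2018, 1, 1), LocalDate.now());
        System.out.println(view.statusByTicket); // {t1=SOLD}
    }
}
```

Because the read model holds no state of its own, changing it means throwing it away and calling `rebuild` again, which is the "destroy, redeploy, replay" cycle described in the thread.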

Messaging: you could argue that introducing something like an event store adds some slowdown, because there is another actor in the path: Client -> Command -> ServiceA -> Event -> EventStore -> ServiceB.
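To make the extra actor concrete, here is a hypothetical in-process sketch of that path, assuming Java 17. All class and event names are made up for illustration; in a real deployment the arrows after ServiceA would cross a broker such as Kafka, and that is where the added latency comes from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-process sketch of Client -> Command -> ServiceA -> Event -> EventStore -> ServiceB.
// Each hop is a plain method call here; in production it would be a network/IPC hop.
public class MessagingPathSketch {

    record ReserveTicket(String ticketId) {}   // command sent by the client
    record TicketReserved(String ticketId) {}  // event emitted by ServiceA

    // Appends every event, then forwards it to all subscribers.
    static class EventStore {
        final List<TicketReserved> log = new ArrayList<>();
        private final List<Consumer<TicketReserved>> subscribers = new ArrayList<>();

        void subscribe(Consumer<TicketReserved> s) { subscribers.add(s); }

        void publish(TicketReserved e) {
            log.add(e);                             // record the fact first...
            subscribers.forEach(s -> s.accept(e));  // ...then fan it out
        }
    }

    // Write side: handles the command and turns it into an event.
    static class ServiceA {
        private final EventStore store;

        ServiceA(EventStore store) { this.store = store; }

        void handle(ReserveTicket cmd) {
            // validation and business rules would go here
            store.publish(new TicketReserved(cmd.ticketId()));
        }
    }

    // Read side: reacts to the events it subscribed to.
    static class ServiceB {
        final List<String> seen = new ArrayList<>();

        void on(TicketReserved e) { seen.add(e.ticketId()); }
    }

    public static void main(String[] args) {
        EventStore store = new EventStore();
        ServiceB serviceB = new ServiceB();
        store.subscribe(serviceB::on);

        new ServiceA(store).handle(new ReserveTicket("t42")); // the "client" call
        System.out.println(serviceB.seen); // [t42]
    }
}
```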

The default messaging here is Kafka, but I am working on an even faster solution, Chronicle Engine/Queue/Map, which can also be seen as brokerless, so latencies practically disappear. The second high-performance component is to use, for the event store, not standard Postgres but the TimescaleDB extension, which is superior for high volumes of inserts. It implements a so-called hypertable, which can be seen as dynamic micro-partitioning; more here: http://www.timescale.com/
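The "dynamic micro-partitioning" idea behind a hypertable can be illustrated conceptually: rows are routed by timestamp into small per-interval chunks that are created on demand, so inserts touch only one recent chunk and time-range queries skip unrelated chunks. This is not TimescaleDB code or its API; the class, the 1-day interval and the in-memory `TreeMap` are assumptions made for the sketch (Java 17).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch of time-based chunking (NOT TimescaleDB's implementation):
// each row goes into the chunk covering its timestamp; chunks are created lazily.
public class ChunkingSketch {

    record Row(Instant time, String payload) {}

    private final long intervalMillis;
    // chunk start (epoch millis) -> rows in that chunk, ordered by chunk start
    final TreeMap<Long, List<Row>> chunks = new TreeMap<>();

    ChunkingSketch(Duration chunkInterval) {
        this.intervalMillis = chunkInterval.toMillis();
    }

    // Start of the chunk that contains the given timestamp.
    long chunkStart(Instant t) {
        return Math.floorDiv(t.toEpochMilli(), intervalMillis) * intervalMillis;
    }

    // An insert only touches one small chunk, never one huge table.
    void insert(Row row) {
        chunks.computeIfAbsent(chunkStart(row.time()), k -> new ArrayList<>()).add(row);
    }

    // A range query over [from, to) only scans the chunks overlapping that range.
    List<Row> query(Instant from, Instant to) {
        List<Row> out = new ArrayList<>();
        if (!from.isBefore(to)) {
            return out; // empty range
        }
        Map<Long, List<Row>> candidates = chunks.subMap(chunkStart(from), true, to.toEpochMilli(), false);
        for (List<Row> chunk : candidates.values()) {
            for (Row r : chunk) {
                if (!r.time().isBefore(from) && r.time().isBefore(to)) {
                    out.add(r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ChunkingSketch table = new ChunkingSketch(Duration.ofDays(1));
        Instant t0 = Instant.parse("2018-01-01T10:00:00Z");
        table.insert(new Row(t0, "a"));
        table.insert(new Row(t0.plus(Duration.ofHours(1)), "b"));
        table.insert(new Row(t0.plus(Duration.ofDays(2)), "c"));
        System.out.println(table.chunks.size());                                  // 2
        System.out.println(table.query(t0, t0.plus(Duration.ofDays(1))).size());  // 2
    }
}
```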

For the event store, I think this is the best component at the moment. For the local instance that constructs the domain READ/QUERY models from forwarded events, I think one of the best SQL databases (although MongoDB is a good fit as well) is PipelineDB. This extension has a very nice feature called CONTINUOUS VIEWS: data is streamed into a table in real time, the view refreshes automatically, and it can generate other events (which can be used, for example, to send WebSocket notifications to UIs/clients): https://www.pipelinedb.com/
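As a rough analogy for a continuous view, the sketch below keeps an aggregate that is updated incrementally as each row streams in, rather than recomputed by a query over the whole table, and notifies listeners on every change (where one could, for example, push a WebSocket message to UI clients). It is plain Java 17, not PipelineDB's SQL interface; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Plain-Java analogy of a continuous view (NOT PipelineDB's API):
// an aggregate maintained incrementally per incoming row, with change notifications.
public class ContinuousViewSketch {

    // e.g. "tickets sold per event", kept up to date on every insert
    final Map<String, Integer> soldPerEvent = new HashMap<>();
    private final List<BiConsumer<String, Integer>> listeners = new ArrayList<>();

    // A listener could forward the new value to connected clients.
    void onChange(BiConsumer<String, Integer> listener) { listeners.add(listener); }

    // Called once per streamed row: constant-time update, then notify.
    void ingest(String eventId) {
        int updated = soldPerEvent.merge(eventId, 1, Integer::sum);
        listeners.forEach(l -> l.accept(eventId, updated));
    }

    public static void main(String[] args) {
        ContinuousViewSketch view = new ContinuousViewSketch();
        view.onChange((eventId, count) -> System.out.println(eventId + " -> " + count));
        view.ingest("conf-2018");   // prints conf-2018 -> 1
        view.ingest("conf-2018");   // prints conf-2018 -> 2
        view.ingest("meetup");      // prints meetup -> 1
    }
}
```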

And finally, the deadly fast messaging layer: it does not yet support networking, so there is a long way to go, but Kafka or ActiveMQ will also serve very nicely. Actually, Alf.io is about ticketing, so instead of using some heavy protocol, maybe MQTT is the best choice, with a broker like Mosquitto.

https://github.com/OpenHFT http://mqtt.org/ https://mosquitto.org/

These can be built into the Light4j infrastructure. It is not 100% ready, but the team is working on an abstraction layer for messaging, so in the future any new kind of messaging layer can be plugged in.

I am willing to donate to your project or hire a Java developer to help with these ideas. Let me know if you are interested.

Ladislav

syjer commented 6 years ago

hi @archenroot, thank you for the extensive reply :). From an architectural point of view, the approach you are proposing is certainly the most scalable, but on the other hand, for alf.io we also have additional criteria, one of which is simplicity of deployment: it's one of our core values.

Currently, to deploy alf.io, the only additional dependency you need is a single store (ideally PostgreSQL). This is something we really want to keep as it is: it makes deployment simpler across a multitude of environments (mostly cloud-based), and it's easy to scale up/down (and you don't need sticky sessions on the load balancer when using the correct Spring profile).

Obviously, this means that the bottleneck will be PostgreSQL (as spinning up Java instances is easy), but to saturate it, you really need a very large number of concurrent requests.

Maybe, as a first estimate, it would be better if you provided the expected number of concurrent transactions before changing the whole architecture :).

Clearly, we have some room for improvement and we could use your input :)
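The expected number of concurrent transactions asked for above can be estimated with Little's law, a standard queueing result brought in here as an aid (it is not part of the original discussion): average requests in flight = arrival rate times average time per request. Rearranged, it also gives the throughput a fixed number of busy database connections can sustain. The numbers in `main` are made-up placeholders, not alf.io measurements.

```java
// Back-of-the-envelope sizing with Little's law: L = lambda * W.
// All input numbers are hypothetical placeholders.
public class LittlesLawSketch {

    // L = lambda * W: average number of requests in flight.
    static double concurrentRequests(double requestsPerSecond, double avgSecondsPerRequest) {
        return requestsPerSecond * avgSecondsPerRequest;
    }

    // lambda = L / W: throughput sustained when `connections` requests are always in flight.
    static double maxThroughput(int connections, double avgSecondsPerRequest) {
        return connections / avgSecondsPerRequest;
    }

    public static void main(String[] args) {
        // Hypothetical on-sale spike: 500 checkouts/s, each spending 40 ms in the database.
        System.out.printf("in flight: ~%.0f%n", concurrentRequests(500, 0.040));
        // Hypothetical pool of 20 busy connections at 5 ms per query.
        System.out.printf("max throughput: ~%.0f req/s%n", maxThroughput(20, 0.005));
    }
}
```

With estimates of the expected peak arrival rate and per-request database time, the first formula shows how many connections would actually be busy, which can be compared against what a single PostgreSQL instance is configured to handle.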

archenroot commented 6 years ago

@syjer - I totally agree with your points, but I just wanted to put on the table the future of services and their designs.

"we also have additional criteria, one of which is simplicity of deployment: it's one of our core values."

I have a hint for you here as well. This is mostly used for testing, but it can be used in a production system without any issue: https://github.com/yandex-qatools/postgresql-embedded, and it could especially help with the ease of deploying Alf.io. It is a JVM-embedded Postgres instance :-)))) doesn't it sound cool? Of course, it uses OS command wrappers to manage the database, but it stays under the management of the Alf.io service. So by default you simply start the JVM and the DB is populated on the fly. I think it is the best option for starters or novices with the Alf.io platform.

Regarding "you only need a single store (postgresql ideally)": this is actually just an illusion. You can have a single store in the form of a PostgreSQL database, but with two schemas: one for the EVENTs created from commands, and a second one for the QUERY/READ-only models. I will explain how this helps. You can imagine the event store as a generic key/value store (similar to the EAV pattern used in the past in SQL databases), so the data has no real structure but is more like a big hill of time series. From such a hill, you can regenerate on the fly any kind of query model exposed by the service. Again, I put it forward as something you might be interested in studying in the future.
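The "one store, two schemas" idea can be sketched as follows: the event side stores schema-less key/value facts, and different query models are derived from the same log on demand. This assumes Java 17; `StoredEvent`, the `TicketSold` type and the attribute keys are hypothetical, and in PostgreSQL the two sides would live in two schemas of the same database rather than in Java collections.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch of "one store, two schemas": schema-less event rows on one side,
// query models derived from them on the other. Names are hypothetical.
public class TwoSchemasSketch {

    // Generic event row: only sequence and type are fixed, attributes are free-form (EAV-like).
    record StoredEvent(long sequence, String type, Map<String, String> attributes) {}

    // Query model 1: tickets sold per category.
    static Map<String, Long> soldPerCategory(List<StoredEvent> log) {
        return log.stream()
                .filter(e -> e.type().equals("TicketSold"))
                .collect(Collectors.groupingBy(e -> e.attributes().get("category"),
                        TreeMap::new, Collectors.counting()));
    }

    // Query model 2: revenue in cents per event, derived from the very same log.
    static Map<String, Long> revenuePerEvent(List<StoredEvent> log) {
        return log.stream()
                .filter(e -> e.type().equals("TicketSold"))
                .collect(Collectors.groupingBy(e -> e.attributes().get("event"),
                        TreeMap::new,
                        Collectors.summingLong(e -> Long.parseLong(e.attributes().get("priceCents")))));
    }

    public static void main(String[] args) {
        List<StoredEvent> log = List.of(
                new StoredEvent(1, "TicketSold", Map.of("event", "conf", "category", "early-bird", "priceCents", "5000")),
                new StoredEvent(2, "TicketSold", Map.of("event", "conf", "category", "regular", "priceCents", "8000")),
                new StoredEvent(3, "TicketSold", Map.of("event", "meetup", "category", "regular", "priceCents", "1000")));
        System.out.println(soldPerCategory(log)); // {early-bird=1, regular=2}
        System.out.println(revenuePerEvent(log)); // {conf=13000, meetup=1000}
    }
}
```

Adding a third query model later only means writing another derivation over the log and replaying it; the stored events themselves never change.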

Regarding transactions, I don't have even rough estimates, as it depends on how many countries will be in scope; this project of mine is more non-profit, rather some kind of good work ;-) But I am an HPC geek, so I am interested in the most optimal designs...

Thank you very much, Syjer, for your kind answers, and good luck with the project. I like it a lot.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

alfio-event / alf.io, issue #395