eiffel-community / eiffel-intelligence

Eiffel Intelligence is a real time data aggregation and analysis solution for Eiffel events.
Apache License 2.0

Change Event and Aggregations threads to use only one threadpool #475

Closed tobiasake closed 3 years ago

tobiasake commented 4 years ago

Applicable Issues

Today EI consumes a lot of CPU and memory resources. When EI consumes many events, it spawns many threads, which causes high CPU and memory load on the JVM. Since each thread also has its own MongoDB connector, this causes high load on MongoDB as well.

Description of the Change

In some deployments it has been reported that EI consumes too many resources (CPU, memory). This is because we use an unlimited number of threads for matching aggregations against the subscriptions' JMESPath expressions, and at the same time these threads make a lot of MongoDB queries, which causes extra load on MongoDB.

With this change, the same thread that consumes the event from the message bus is used through the whole of EI.

By doing this, we avoid using too much memory and too much CPU. If the system can handle more events and aggregations, increase the thread pool size configuration in the application.properties file.
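The idea of keeping an event on one thread can be sketched with plain JDK classes. This is only an illustration, not the code from this PR: the class name, the method names (onEvent, aggregate, matchSubscriptions) and the pool sizes are all assumptions made up for the example. It shows a single bounded pool whose worker runs every processing step itself instead of handing the later steps to new threads.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the EI code base.
class SingleThreadPipelineSketch {

    // Records which thread ran each step so the behaviour can be checked.
    static final List<String> stepThreads = new CopyOnWriteArrayList<>();

    // Stand-in for the aggregation step.
    static void aggregate(String event) {
        stepThreads.add(Thread.currentThread().getName());
    }

    // Stand-in for the subscription matching step.
    static void matchSubscriptions(String event) {
        stepThreads.add(Thread.currentThread().getName());
    }

    // The consuming thread runs all later steps itself; no extra threads are spawned.
    static void onEvent(String event) {
        stepThreads.add(Thread.currentThread().getName());
        aggregate(event);
        matchSubscriptions(event);
    }

    // Processes one event on a bounded pool and reports whether
    // all three steps ran on the same worker thread.
    static boolean runOnce(String event) {
        stepThreads.clear();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(10));
        try {
            pool.submit(() -> onEvent(event)).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return stepThreads.size() == 3
                && stepThreads.stream().distinct().count() == 1;
    }

    public static void main(String[] args) {
        System.out.println("same thread for all steps: " + runOnce("event-1"));
    }
}
```

With this shape, the number of events being processed at once can never exceed the size of the one pool, which is what makes the load controllable from a single setting.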

This includes the fix for the duplicated AggregationsTtl fields in the Information REST API.

Old and faulty "/information" entry point return value:

"objectHandler" : {
  "aggregationsCollectionName" : "aggregations",
  "databaseName" : "eiffel_intelligence",
  "ttl" : 0
  "aggregationsTtl" : ""
},

With this PR change, which is how it should be:

"objectHandler" : {
  "aggregationsCollectionName" : "aggregations",
  "databaseName" : "eiffel_intelligence",
  "aggregationsTtl" : ""
},

Fixed MongoDbHandler so that the MongoDB connection is automatically restored when MongoDB goes down and comes back up again.

Alternate Designs

Benefits

Reduces load on the JVM and MongoDB and minimizes the risk of overloading MongoDB.

Possible Drawbacks

EI might consume events at a slower pace.

Sign-off

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or

(b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or

(c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it.

(d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.

Signed-off-by: @tobiasake

m-linner-ericsson commented 4 years ago

Why not just merge https://github.com/eiffel-community/eiffel-intelligence/pull/465 into master?

tobiasake commented 4 years ago

Well, there have been several discussions and meetings about that over 3-4 months, since it was force-merged without any tests or verification. And meetings that I have not been part of, it seems. Other issues have also come up due to that change. I described one of them in a ticket here: https://github.com/eiffel-community/eiffel-intelligence/issues/469

This is another approach to solving it, where I removed the unlimited threads for the last aggregation and subscription matching step. We use the same thread from consuming the event from the message bus until aggregation and subscription matching are finished. With this change we also have only one thread pool to control. And events/aggregations are more persistent, since we don't queue work in memory (in thread pool queues etc.); instead, these events/aggregations wait in the message bus queue until EI has resources to process them. I will not go into more detail than this, since this has been discussed for a long time. This is my suggestion for a solution that is more persistent, makes it simpler to control resources/load, and has a simpler architecture.
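A minimal sketch of the "events wait in the message bus queue" idea, under stated assumptions: the in-memory LinkedBlockingQueue here is only a stand-in for the real message bus, and the class name, worker count and counters are hypothetical. Each worker fetches its next event only after it has finished the previous one, so the number of events in flight never exceeds the number of workers and everything else stays in the queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; the queue below only imitates a message bus queue.
class BrokerBackpressureSketch {

    // Returns {number of processed events, highest number of events in flight at once}.
    static int[] run(int eventCount, int workers) {
        BlockingQueue<String> brokerQueue = new LinkedBlockingQueue<>();
        for (int i = 0; i < eventCount; i++) {
            brokerQueue.add("event-" + i);
        }

        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(workers);

        for (int w = 0; w < workers; w++) {
            new Thread(() -> {
                // A worker takes the next event only when it is free again,
                // so unprocessed events remain in the queue, not in worker memory.
                while (brokerQueue.poll() != null) {
                    int now = inFlight.incrementAndGet();
                    maxInFlight.accumulateAndGet(now, Math::max);
                    processed.incrementAndGet(); // stand-in for aggregation and matching
                    inFlight.decrementAndGet();
                }
                done.countDown();
            }).start();
        }

        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {processed.get(), maxInFlight.get()};
    }

    public static void main(String[] args) {
        int[] result = run(100, 2);
        System.out.println("processed=" + result[0] + ", maxInFlight=" + result[1]);
    }
}
```

The trade-off matches the drawback listed in the PR description: throughput is capped by the worker count, but a restart of the consumer does not lose work that was only held in an in-memory executor queue.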

tobiasake commented 4 years ago

I still don't get why not to use #465; it has been out there and tried. We don't have any automatic load tests, but it has been tested manually.

EI still crashes. I was in a meeting two weeks ago and it still crashes. They have got it working okay by setting the TTL on the aggregated object to 10 min, but the customer's pipeline lasted longer than 10 min, so some events were never aggregated and no subscription was triggered.

I had already implemented this (or had looked at the code and was going to do the implementation) in June, when you and Emily created the other solution. We were going to test both this and the other solution. I was about to set up a test environment in Kubernetes with both solutions in June, but in the middle of that the other solution was force-merged without any tests and so on. And now there have been several issues since then. The customer is planning to upgrade to EI 3.0, so I thought we could try another approach that decreases the load, is simpler to control by having only one thread pool, and also gives more persistence, since events stay in the message bus queue until EI can process them.

m-linner-ericsson commented 4 years ago

Do we still have problems with https://github.com/eiffel-community/eiffel-intelligence/pull/465? Please note that this one is on the 2.x branch.

tobiasake commented 4 years ago

Yep, I know that's on 2.x. But we still have the original solution with unlimited threads on master. That's why I suggest another approach/solution on the master branch.

tobiasake commented 3 years ago

To all of you who want more threads executing in parallel, as in the EI 2.x solution from the 2.0-maintenance branch, which causes high load problems on the JVM and MongoDB in some environments with 100000+ events: can we wait and see how this solution works? If it works well, then maybe we can stay on this solution with only one thread pool. If it does not work well, then those of you who prefer the EI 2.0 maintenance branch solution with two thread pools can make a PR with that change for EI 3.0.

m-linner-ericsson commented 3 years ago

In the solution on 2.0 we can set the thread pool with configuration; here it is hardcoded.

tobiasake commented 3 years ago

No, it is configurable with these properties, provided either as Java properties on the command line or set in an application.properties file in the same folder that Java executes from, or in a config/ folder. These are the properties that configure the thread pool:

threads.core.pool.size
threads.queue.capacity
threads.max.pool.size
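How three such properties could size a bounded pool can be sketched with the plain JDK. Only the three property names come from the comment above; the class name, the fallback values, the keep-alive time and the use of java.util.Properties with ThreadPoolExecutor are assumptions for illustration and not necessarily how EI wires its executor.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; only the property names are taken from the discussion.
class ThreadPoolConfigSketch {

    // Builds a bounded pool from the three thread pool properties.
    // The fallback values below are assumed, not EI's actual defaults.
    static ThreadPoolExecutor fromProperties(Properties props) {
        int core = Integer.parseInt(props.getProperty("threads.core.pool.size", "1"));
        int max = Integer.parseInt(
                props.getProperty("threads.max.pool.size", String.valueOf(core)));
        int queue = Integer.parseInt(props.getProperty("threads.queue.capacity", "100"));
        return new ThreadPoolExecutor(
                core, max, 60L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(queue));
    }

    // Parses application.properties style text into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(
                "threads.core.pool.size=2\n"
                + "threads.queue.capacity=50\n"
                + "threads.max.pool.size=4\n");
        ThreadPoolExecutor pool = fromProperties(props);
        System.out.println("core=" + pool.getCorePoolSize()
                + " max=" + pool.getMaximumPoolSize()
                + " queueCapacity=" + pool.getQueue().remainingCapacity());
        pool.shutdown();
    }
}
```

Note that a ThreadPoolExecutor only grows beyond the core size once its queue is full, so the queue capacity and the maximum pool size have to be tuned together.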

In the 2.0 maintenance branch releases, the second thread pool is hard-coded.