etf-validator / governance

ETF Steering Group and the Technical Committee documents
1 stars 2 forks source link

Introduce a parameter to control the maximum number of threads #48

Closed michellutz closed 2 years ago

michellutz commented 6 years ago

Background and Motivation

During some stress testing of the validator, related to https://github.com/etf-validator/governance/issues/14, we came upon some issues with concurrent requests. We observed that at a certain number of requests pooled, the system crashes and the test run is stoppped.

Using JMeter, we start a test run with seven tests (test suite Conformance Class 1-7), launching the same request iteratively during 2 minutes.

The exception as it appears in the server log is:

Task java.util.concurrent.FutureTask@4170493d
 rejected from java.util.concurrent.ThreadPoolExecutor@669f091a[Running, pool size = 12, active threads = 12, queued tasks = 12, completed tasks = 111]
    at de.interactive_instruments.etf.webapp.controller.TestRunController.initAndSubmit(TestRunController.java:262)
    at de.interactive_instruments.etf.webapp.controller.TestRunController.start(TestRunController.java:543)

Looking for the source of this issue, we found in the class etf-webapp/src/main/java/de/interactive_instruments/etf/webapp/controller/TestRunController.java, line 108, that the maximum number of threads available is set using the number of cores from the processor

etf-webapp/src/main/java/de/interactive_instruments/etf/webapp/controller/TestRunController.java

Changing manually this value allowed us to pool more requests, up to 100, without any performance issue.

Proposed change

Introduce a parameter to control the maximum number of threads to open up the possibility of elastic resource allocation.

Since threads in the queue can also block resources, also a parameter should be introduced to explicitly set the maximum size of the queue:

etf.testruns.max.threads
etf.testruns.max.queued

The default for etf.testruns.max.threadsshould be set to to the CPU cores. I would propose a default queue size of three times the CPU cores.

However there is a design flaw in the core: the TestTaskRegistry ctor does not allow to pass a parameter for the queue size. It's always identical to the number of max threads. So in your load test scenario you have increased the parallelism in order to increase the queue size. This might work for small datasets but might lead to issues with bigger ones.

In the next etf-core version 1.1.0 you can call this ctor and set the queue size. Change this line to use the updated library.

Alternatives

n/a

Funding

JRC will be ready to fund within its current development contract.

Additional information

n/a

michellutz commented 6 years ago

Based on etf-validator/etf-webapp#169. Split off from #14 as agreed in the 4th SG meeting on 2018-09-04.

jenriquesoriano commented 6 years ago

To do: add technical details

carlospzurita commented 5 years ago

This improvement has been developed and can be found on the Guadaltel fork, in the branch enhancement-sprint-october. Link to the commit https://github.com/guadaltel/etf-webapp/commit/d870e4f7d462814d5403eca659868fdab8347991

jonherrmann commented 5 years ago

Thanks @carlospzurita. I have some comments and the code changes do not quite match the proposed changes (parameters, using the fixed etf-core library 1.1.0) . Should I comment here, in your commit or wait for a merge request?

michellutz commented 5 years ago

@jonherrmann If you already have comments now, I would propose to share them here., before we make the pull request to the master.

jonherrmann commented 5 years ago

As I pointed out before:

However there is a design flaw in the core: the TestTaskRegistry ctor does not allow to pass a parameter for the queue size. It's always identical to the number of max threads. So in your load test scenario you have increased the parallelism in order to increase the queue size. This might work for small datasets but might lead to issues with bigger ones. In the next etf-core version 1.1.0 you can call this ctor and set the queue size. Change this line to use the updated library.

The changes do not make use of the updated library (see here, version 1.0.0 is still used). In addition to that aspect, this change sets a default of 30 test runs that can be run in parallel. This will cause problems -very bad runtime behavior and even crashes- in ETF instances running on smaller systems.

I think a proposal must and can not usually be implemented 1: 1. Then it would be good to know if/which problems occured and why, for example one parameter has been implemented, instead of the two proposed ones. That's a point we may need to highlight in a pull request (template).

carlospzurita commented 5 years ago

Thanks Jon. It is already fixed in https://github.com/guadaltel/etf-webapp/commit/622d2b195ca7966f3c4e1dfa38532ccf9e332bac . Also, we changed the property for parallel run to a much smaller number. However, if you have further comments on that, we will be glad to make the appropriate changes.

jonherrmann commented 5 years ago

I would propose using more 'descriptive' parameters. Especially when we introduce parallelisation in the ETS, it could be unobvious what these parameters control.

# Maximum number of tests which can run in parallel.
# Default: auto (the number of CPU cores of this machine)
etf.testruns.threads.max = auto

# Size of the task pool queue
# Default: auto (three times the parameter etf.testruns.threads.max)
etf.testruns.queued.max = auto

If autois set in the config, the configuration must dynamically replace the values.

Note: I previously proposed etf.testrungs.max.threads/queued which matches the style of this param. I think it is more understandable and a better style to put 'max' at the end in the new introduced parameters.

carlospzurita commented 5 years ago

As @jonherrmann suggested, we changed the parameters to be more descriptive, and we added the 'auto' value to set the default value. Also, we fixed an issue on the declaration of a variable. You can find the changes in https://github.com/guadaltel/etf-webapp/commit/974769c0a0376f9b681f3f4789cbf7fbb567bdbd

jonherrmann commented 5 years ago

The last commit https://github.com/guadaltel/etf-webapp/commit/974769c0a0376f9b681f3f4789cbf7fbb567bdbd can't work.

The variable names have only changed in the configuration template, the config controller still uses the old variable names. In addition, the whole logic for setting the variables to the value 'auto' is missing.

Please verify that you pushed the latest changes of the EtfConfigController.java file to the repository.

carlospzurita commented 5 years ago

You are right, it was an error. We pushed the changed version here https://github.com/guadaltel/etf-webapp/commit/b3984e47bdd1989567127406d1f78ae957f6c5d4

jonherrmann commented 2 years ago

Implemented in Version 2.1.0