Closed michellutz closed 2 years ago
Based on etf-validator/etf-webapp#169. Split off from #14 as agreed in the 4th SG meeting on 2018-09-04.
To do: add technical details
This improvement has been developed and can be found on the Guadaltel fork, in the branch enhancement-sprint-october. Link to the commit https://github.com/guadaltel/etf-webapp/commit/d870e4f7d462814d5403eca659868fdab8347991
Thanks @carlospzurita. I have some comments, and the code changes do not quite match the proposed changes (parameters, using the fixed etf-core library 1.1.0). Should I comment here, in your commit, or wait for a merge request?
@jonherrmann If you already have comments now, I would propose to share them here, before we make the pull request to the master.
As I pointed out before:
However, there is a design flaw in the core: the TestTaskRegistry ctor does not allow passing a parameter for the queue size. It is always identical to the maximum number of threads. So in your load test scenario you have increased the parallelism in order to increase the queue size. This might work for small datasets but might lead to issues with bigger ones. In the next etf-core version 1.1.0 you can call this ctor and set the queue size. Change this line to use the updated library.
The changes do not make use of the updated library (see here, version 1.0.0 is still used). In addition, this change sets a default of 30 test runs that can run in parallel. This will cause problems (very bad runtime behavior and even crashes) in ETF instances running on smaller systems.
I think a proposal does not have to, and usually cannot, be implemented 1:1. In that case it would be good to know whether problems occurred, which ones, and why, for example, one parameter has been implemented instead of the two proposed ones. That's a point we may need to highlight in a pull request (template).
Thanks Jon. It is already fixed in https://github.com/guadaltel/etf-webapp/commit/622d2b195ca7966f3c4e1dfa38532ccf9e332bac . Also, we changed the property for parallel run to a much smaller number. However, if you have further comments on that, we will be glad to make the appropriate changes.
I would propose using more 'descriptive' parameters. Especially once we introduce parallelisation in the ETS, it may not be obvious what these parameters control.
# Maximum number of tests which can run in parallel.
# Default: auto (the number of CPU cores of this machine)
etf.testruns.threads.max = auto
# Size of the task pool queue
# Default: auto (three times the parameter etf.testruns.threads.max)
etf.testruns.queued.max = auto
If auto is set in the config, the configuration must dynamically replace the values.
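To illustrate the 'auto' handling described above, here is a minimal sketch of how the two values could be resolved. It is not the code from EtfConfigController.java; the class TestRunLimits and the methods resolve and fromProperties are hypothetical names used only for this example. The property names and the defaults (CPU cores, three times the thread limit) are taken from the proposal above.

```java
import java.util.Properties;

// Hypothetical helper, not part of etf-webapp: resolves the two proposed
// properties and replaces "auto" (or a missing value) with a computed default.
final class TestRunLimits {
    public final int maxThreads;
    public final int maxQueued;

    private TestRunLimits(int maxThreads, int maxQueued) {
        this.maxThreads = maxThreads;
        this.maxQueued = maxQueued;
    }

    // Returns autoValue when raw is null, blank or "auto"; otherwise parses a positive integer.
    public static int resolve(String raw, int autoValue) {
        if (raw == null || raw.trim().isEmpty() || "auto".equalsIgnoreCase(raw.trim())) {
            return autoValue;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < 1) {
            throw new IllegalArgumentException("Value must be >= 1 or 'auto': " + raw);
        }
        return value;
    }

    // cpuCores is passed in so that the logic can be checked independently of the machine.
    public static TestRunLimits fromProperties(Properties props, int cpuCores) {
        int threads = resolve(props.getProperty("etf.testruns.threads.max"), cpuCores);
        // The queue default depends on the resolved thread limit, as proposed above.
        int queued = resolve(props.getProperty("etf.testruns.queued.max"), 3 * threads);
        return new TestRunLimits(threads, queued);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("etf.testruns.threads.max", "auto");
        props.setProperty("etf.testruns.queued.max", "auto");
        TestRunLimits limits =
            fromProperties(props, Runtime.getRuntime().availableProcessors());
        System.out.println("threads=" + limits.maxThreads + ", queued=" + limits.maxQueued);
    }
}
```

Resolving the thread limit first means that an explicit etf.testruns.threads.max combined with an 'auto' queue size still yields a queue of three times the configured threads.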
Note: I previously proposed etf.testruns.max.threads/queued, which matches the style of this param. I think it is more understandable, and better style, to put 'max' at the end of the newly introduced parameters.
As @jonherrmann suggested, we changed the parameters to be more descriptive, and we added the 'auto' value to set the default value. We also fixed an issue in the declaration of a variable. You can find the changes in https://github.com/guadaltel/etf-webapp/commit/974769c0a0376f9b681f3f4789cbf7fbb567bdbd
The last commit https://github.com/guadaltel/etf-webapp/commit/974769c0a0376f9b681f3f4789cbf7fbb567bdbd can't work.
The variable names have only changed in the configuration template, the config controller still uses the old variable names. In addition, the whole logic for setting the variables to the value 'auto' is missing.
Please verify that you pushed the latest changes of the EtfConfigController.java file to the repository.
You are right, it was an error. We pushed the changed version here https://github.com/guadaltel/etf-webapp/commit/b3984e47bdd1989567127406d1f78ae957f6c5d4
Implemented in Version 2.1.0
Background and Motivation
During some stress testing of the validator, related to https://github.com/etf-validator/governance/issues/14, we came upon some issues with concurrent requests. We observed that once a certain number of requests is pooled, the system crashes and the test run is stopped.
Using JMeter, we start a test run with seven tests (test suite Conformance Class 1-7), launching the same request repeatedly for 2 minutes.
The exception as it appears in the server log is:
Looking for the source of this issue, we found in the class etf-webapp/src/main/java/de/interactive_instruments/etf/webapp/controller/TestRunController.java, line 108, that the maximum number of threads available is set to the number of processor cores.
Manually changing this value allowed us to pool more requests, up to 100, without any performance issues.
Proposed change
Introduce a parameter to control the maximum number of threads to open up the possibility of elastic resource allocation.
Since threads in the queue can also block resources, a parameter should also be introduced to explicitly set the maximum size of the queue.
The default for etf.testruns.max.threads should be set to the number of CPU cores. I would propose a default queue size of three times the number of CPU cores.
However, there is a design flaw in the core: the TestTaskRegistry ctor does not allow passing a parameter for the queue size. It is always identical to the maximum number of threads. So in your load test scenario you have increased the parallelism in order to increase the queue size. This might work for small datasets but might lead to issues with bigger ones.
In the next etf-core version 1.1.0 you can call this ctor and set the queue size. Change this line to use the updated library.
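The difference between the thread limit and the queue size can be shown with the JDK's own java.util.concurrent.ThreadPoolExecutor. This is only a sketch: the TestTaskRegistry ctor is not shown in this thread, so the standard executor stands in for it here, and BoundedPoolSketch, newPool and acceptedBeforeRejection are hypothetical names for this example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustration only: a pool whose worker count and queue capacity are set independently.
final class BoundedPoolSketch {

    // Fixed number of worker threads plus a separately sized waiting queue.
    public static ThreadPoolExecutor newPool(int maxThreads, int maxQueued) {
        return new ThreadPoolExecutor(
            maxThreads, maxThreads,
            0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(maxQueued),
            new ThreadPoolExecutor.AbortPolicy()); // reject when threads and queue are full
    }

    // Submits blocking tasks until the pool rejects one and returns how many were accepted.
    public static int acceptedBeforeRejection(int maxThreads, int maxQueued) {
        ThreadPoolExecutor pool = newPool(maxThreads, maxQueued);
        CountDownLatch release = new CountDownLatch(1);
        int accepted = 0;
        try {
            while (true) {
                // Each task waits on the latch, so no worker becomes free during the loop.
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                accepted++;
            }
        } catch (RejectedExecutionException e) {
            // All workers are busy and the queue is full.
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return accepted;
    }

    public static void main(String[] args) {
        // 2 worker threads and a queue of 6: 2 running + 6 waiting are accepted.
        System.out.println(acceptedBeforeRejection(2, 6));
    }
}
```

With the queue tied to the thread count, the only way to accept more waiting requests is to raise the parallelism as well, which is the side effect described above. With two separate values, an instance on a small system can keep few threads and still buffer a larger number of requests.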
Alternatives
n/a
Funding
JRC will be ready to fund within its current development contract.
Additional information
n/a