etf-validator / governance

ETF Steering Group and the Technical Committee documents
1 stars 2 forks source link

Load (at least) the most used Test Suites in memory #44

Closed michellutz closed 5 years ago

michellutz commented 6 years ago

Background and Motivation:

Performance optimisation is required to reduce startup time and validation time; reduction of startup time will simplify cloud deployment horizontal scaling, while reduction in validation time will be helpful while integrating ETF with INSPIRE Geoportal, or any other Metadata related workflow/pipeline.

One identified performance issue is that the Executable Test Suites to be run are loaded by the application for each validation request (and unloaded when done).

Proposed change

Pre-load all (or at least the most used) ETS resource files in memory memory and refresh them on a given time frame (if needed).

In the database, the object cache could be reactivated. This will increase memory consumption. However, since the test engines need a little time to initialize, it would be good to profile this before making any changes.

Alternatives

n/a

Funding

JRC will be ready to fund within its current development contract.

Additional information

n/a

michellutz commented 6 years ago

Split off from #14 as agreed in the 4th SG meeting on 2018-09-04.

carlospzurita commented 6 years ago

After some research, here is the initial approach we came up with to tackle this issue. We hope that this can serve as a matter of discussion. Of course, if there's any correction or contribution, it will be more than welcomed.

When executing the start operation in the client, a StartTestRunRequest object is instantiated on the server, then the following method is invoked over this object imagen Here the executableTestSuite are queried from the BaseX storage service contained in the etsDao object. This service is already initialized and loaded in memory, so one thing that can be done is, at the time of the query, request from an intermediate cache object if this ETS is loaded, and load it from there. If not, pass the query to BaseX and then save the result to this cache. This object could be an static class, containing a queue of limited size for storing ETS.

carlospzurita commented 5 years ago

As a more technical approach, in order to discuss on the TC, we propose the usage of a new utility class FixedSortedMap<TimestampEID, DTO>. This can inherit the SortedMap interface, but limit the size of the key set. The TimestampEID would be a Class containing the timestamp of retrieval, to use as a parameter for the Comparator, and the EID requested.

This map would hold as values the most recent used DTOs of ExecutableTestSuites, and when a new ETS is requested, the method would look for it on the Mapf it can't be found, it will retrieve it from the storage, get the DTO and then push it on the map cache.

The map cache could be initialized empty as an static attribute of the class, and populate it on runtime.

jonherrmann commented 5 years ago

As a more technical approach, in order to discuss on the TC, we propose the usage of a new utility class FixedSortedMap<TimestampEID, DTO>. This can inherit the SortedMap interface, but limit the size of the key set. The TimestampEID would be a Class containing the timestamp of retrieval, to use as a parameter for the Comparator, and the EID requested.

You mean a LRU-Cache? In general there is no need to implement a new one, we already have caffeine as dependency and also use the functionalities in etf: https://github.com/ben-manes/caffeine/wiki/Efficiency

We should not add this functionality to the top layer. I decided to put the streaming (+XML/JSON caching) there because it interacts with the web client, but an object cache should be in a lower layer.

There is a LRU cache in the database that has been deactivated since further testing is required. The commented out lines are here and here. I have just run the unit tests locally and they worked.

A Pull Request should not only contain the enabled lines. We need clarity that it really works without side effects (= either integration or system tests or a brief test protocol of what has been tested manually).

carlospzurita commented 5 years ago

As @jonherrmann we uncommented the referred lines on the etf-bsxds repository. However, when deactivating this line, we need to add a toString() after the getDtoId() call, to work with the declared types here. After adding this change and executing gradlew build, the following unit tests fail:

de.interactive_instruments.etf.dal.dao.basex.TestTaskResultDaoTest > test_2_0_add_and_get FAILED
    junit.framework.AssertionFailedError at TestTaskResultDaoTest.java:116

de.interactive_instruments.etf.dal.dao.basex.TestObjectDaoTest > test_5_0_update FAILED
    de.interactive_instruments.exceptions.StorageException at TestObjectDaoTest.java:361

de.interactive_instruments.etf.dal.dao.basex.TestObjectDaoTest > test_5_1_fail_on_update_replaced_item FAILED
    java.lang.Exception
        Caused by: java.lang.AssertionError at TestObjectDaoTest.java:347

de.interactive_instruments.etf.dal.dao.basex.TestObjectDaoTest > test_5_2_reset_after_update FAILED
    java.lang.AssertionError at TestObjectDaoTest.java:347

de.interactive_instruments.etf.dal.dao.basex.TestObjectDaoTest > test_6_0_pagination_history_items_filter FAILED
    java.lang.AssertionError at TestObjectDaoTest.java:347

A .jar file can still be generated using gradlew build, and we added this as a dependency to the etf-webapp, and run some performance tests. We created several test runs automatically, both in a cached deployment an another one without it. We used the metadata Test Suites, both 1.3 and 2.0, uploading a local XML with metadata. We observed an average improvement of 5-8 seconds with the cached version. Here you can see the memory and processor consumption on each case

METADATA 2.0 No cache no-cache_tg2 0_cpu-mem Cache cache_tg2 0_cpu-mem png

METADATA 1.3 No cache no-cache_tg1 3_cpu-mem Cache cache_tg1 3_cpu-mem

We dont appreciate significant changes on resource consumption.

jonherrmann commented 5 years ago

OK, then we need to know why the tests failed on your machine.

carlospzurita commented 5 years ago

After some debugging we have been able to pass all the tests after activating the cache. It seemed that there were some inconsistencies in the declaration of the cache between the DtoCache class here and the BsxPreparedDto reference here . We changed everything to use the Cache<EID,String>, but then we encountered a NullPointerException in the ContextIdResolver while running the test_1_2_unmarshalling on the TestObjectDaoTest class. We solved it initializing the lazyLookupIds set on the constructor.

jonherrmann commented 5 years ago

We solved it initializing the lazyLookupIds set on the constructor.

@carlospzurita did you repeat the performance tests and come to the same results?

carlospzurita commented 5 years ago

We have run the tests again, and the results are similars. We tested the improvement using Metadata 1.3 and 2.0, and WMS, as the previous test. We used 4 threads of parallel requests, in both deployments with and without cache activated. The response times differ in just 4-5 seconds. The Java garbage collector does seem to have increased it's activty. Again, this improvement has not yielded any relevant results. Maybe with a high concurrency scenario, the results would be more noteworthy.

Here you can see the resource consumptions for all the test runs

Metadata 1.3

Metadata 2.0

WMS

MarcoMinghini commented 5 years ago

Issue is closed, as decided in the SG Meeting 12.