etf-validator / governance

ETF Steering Group and the Technical Committee documents

Refresh BaseX DB at given intervals, but not during startups #45

Closed by michellutz 4 years ago

michellutz commented 6 years ago

Background and Motivation:

Performance optimisation is required to reduce startup time and validation time. A shorter startup time will simplify horizontal scaling in cloud deployments, while a shorter validation time will help when integrating ETF with the INSPIRE Geoportal or any other metadata-related workflow or pipeline.

One identified performance issue is that BaseX contains a cache of all validation results and of all tests, and the DB is re-initialised each time Tomcat is started or restarted. The related data is persisted on the file system under /home/tomcat/.etf/ .

Proposed change

Refresh BaseX DB at given intervals, but not during startups.
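
As an illustration only, the proposed behaviour could be sketched with the JDK's `ScheduledExecutorService`, assuming the refresh can be expressed as a `Runnable`. The class name `DeferredRefresher` and the interval are hypothetical and are not part of the ETF code base. The initial delay is set equal to the interval, so no refresh runs at startup.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a refresh task periodically, but never at startup. */
class DeferredRefresher implements AutoCloseable {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "db-refresh");
                t.setDaemon(true); // do not keep the JVM alive on shutdown
                return t;
            });

    DeferredRefresher(Runnable refreshTask, long interval, TimeUnit unit) {
        // An uncaught exception would cancel all future runs, so guard the task.
        Runnable guarded = () -> {
            try {
                refreshTask.run();
            } catch (RuntimeException e) {
                System.err.println("DB refresh failed: " + e);
            }
        };
        // Initial delay == interval: the first refresh happens one full
        // interval after startup, not during it.
        scheduler.scheduleWithFixedDelay(guarded, interval, interval, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A caller would create one instance after the application has started, for example `new DeferredRefresher(task, 6, TimeUnit.HOURS)`, and close it on shutdown. Whether the existing initialisation can be split into such a task is exactly the open question of this proposal.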

Alternatives

A parameter could be introduced to deactivate some time-consuming consistency checks during the startup.
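
A minimal sketch of such a parameter, assuming it is passed as a JVM system property. The property name `etf.startup.skipConsistencyChecks` and the class `StartupOptions` are hypothetical and do not exist in ETF today.

```java
/** Hypothetical startup options; the property name is illustrative only. */
class StartupOptions {

    static final String SKIP_CHECKS_PROPERTY = "etf.startup.skipConsistencyChecks";

    /** True when the operator asked to skip the checks; defaults to false. */
    static boolean skipConsistencyChecks() {
        return Boolean.parseBoolean(System.getProperty(SKIP_CHECKS_PROPERTY, "false"));
    }

    /** Runs the given checks unless they are disabled; reports whether they ran. */
    static boolean runChecksIfEnabled(Runnable consistencyChecks) {
        if (skipConsistencyChecks()) {
            return false;
        }
        consistencyChecks.run();
        return true;
    }
}
```

With this sketch, the checks would run by default and could be switched off with `-Detf.startup.skipConsistencyChecks=true` in the Tomcat JVM options.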

Funding

JRC will be ready to fund within its current development contract.

Additional information

n/a

michellutz commented 6 years ago

Split off from #14 as agreed in the 4th SG meeting on 2018-09-04.

carlospzurita commented 6 years ago

After conducting some research, we think that this change would require a major refactoring of the code. Many controllers in the webapp code (TestDriverController, TestResultController...) rely on an initialized DataStorageService, which contains a BaseX instance using the class BsxDataStorage. The application can't deploy these controllers, which are necessary to run the services, without starting this data storage. Also, we think that this operation makes more sense during startup than as a task launched while users are working with the ETF.

Looking at the class BsxDataStorage in the etf-bsxds module, we can't see any refactoring that would improve the startup time; all the tasks executed on initialization seem necessary to us.

In our experience, the most time-consuming task during deployment is the download of ETS files from GitHub. We may run some more tests to assess startup times thoroughly.
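
One way to assess this could be a per-phase timer around the startup steps. The sketch below is an assumption about how such a measurement might look; `StartupTimer` and the phase names are hypothetical and not part of ETF.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that records how long each startup phase takes. */
class StartupTimer {

    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the work and adds its elapsed time to the named phase. */
    void time(String phase, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total time recorded over all phases, in nanoseconds. */
    long totalNanos() {
        return nanosByPhase.values().stream().mapToLong(Long::longValue).sum();
    }

    /** Fraction of the total time spent in the phase (0.0 if unknown). */
    double share(String phase) {
        long total = totalNanos();
        long spent = nanosByPhase.getOrDefault(phase, 0L);
        return total == 0 ? 0.0 : (double) spent / total;
    }
}
```

Wrapping, for example, the ETS download and the data storage initialisation in `time("ets-download", ...)` and `time("basex-init", ...)` would give the share of each phase in the total startup time.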

carlospzurita commented 5 years ago

Based on some rough startup tests, the BaseX DB initialisation has not proved to be an excessively time-consuming part of the startup when considered in absolute terms.

A mean startup of the ETF validator takes 60 seconds, of which we estimate that about 30 seconds are consumed by the BaseX DB initialisation. Even if the BaseX initialisation accounts for roughly 50% of the total time, it is still a very small amount of time.

Thus, considering that:

michellutz commented 4 years ago

Closed as agreed in the SG meeting on 2020-01-21