NRLMMD-GEOIPS / geoips

Main Geolocated Information Processing System code base with basic functionality enabled.
https://nrlmmd-geoips.github.io/geoips/

Research and implement methods of parallelizing integration tests #511

Open jsolbrig opened 5 months ago


Requested Update

Description

GeoIPS integration tests are slow because each test runs GeoIPS end-to-end. We will reduce integration test execution time by implementing a method for running the tests in parallel.

We will research methods of running multiple GeoIPS jobs in parallel on a single system. This could be as simple as a Python script that calls GeoIPS multiple times using subprocess, multithreading, or multiprocessing. It could also take the form of shell scripts, or use other languages or off-the-shelf tools if appropriate.
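The candidate mechanisms listed above combine naturally. The following is a minimal sketch, assuming each integration test can be launched as a standalone command. It uses a thread pool in which each thread launches one subprocess and waits for it. The function names `run_one` and `run_in_parallel` are hypothetical, and the placeholder commands only stand in for real GeoIPS test invocations.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd):
    """Run a single test command; return its exit code and captured stdout."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def run_in_parallel(commands, max_workers=4):
    """Run each command in its own subprocess, at most max_workers at a time."""
    # Each thread only launches a child process and waits for it, so the
    # GIL is not a bottleneck; the real work happens in the child processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the order of `commands`,
        # not in the order the subprocesses happen to finish.
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Placeholder commands standing in for GeoIPS integration test jobs.
    cmds = [[sys.executable, "-c", f"print({i})"] for i in range(3)]
    for code, out in run_in_parallel(cmds, max_workers=2):
        print(code, out.strip())
```

Because the parallelism lives entirely in separate operating-system processes, this approach stays within the scope below. It does not touch GeoIPS internals, and it needs nothing beyond the standard library.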

Scope

This should be implemented entirely within the GeoIPS package. Any additional dependencies must be installable by adding them to the dependencies defined in pyproject.toml. No assumptions should be made about the available hardware, or about the availability of processing queues or other non-standard software.

Parallelization, for this issue, should be achieved by running multiple GeoIPS jobs in parallel on a single system. It should not be achieved by parallelizing the actual GeoIPS code or submitting GeoIPS jobs to a distributed or cluster processing queue.

Goal

Improve the speed of the integration tests by allowing them to run in parallel. This should result in a CLI option for the integration test scripts that specifies the number of parallel jobs to execute. Results from each job must be collected and reported to the top-level log file in a repeatable way: the tests should appear in the same order regardless of the order in which they actually execute.
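The two requirements above are a CLI option for the job count and a deterministic report order. They can be met together by recording each result under its input position and writing the log only after all jobs finish. The following is a minimal sketch under stated assumptions. The `-j/--jobs` flag is hypothetical and is not an existing GeoIPS option. For portability, the sketch also assumes each test is a Python script; other script types would only change the command passed to `subprocess.run`.

```python
import argparse
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_args(argv=None):
    """Parse a hypothetical -j/--jobs option for an integration test runner."""
    parser = argparse.ArgumentParser(description="Run integration tests.")
    parser.add_argument(
        "-j", "--jobs", type=int, default=1,
        help="number of tests to run in parallel (default: 1)",
    )
    parser.add_argument("scripts", nargs="+", help="test scripts to run")
    args = parser.parse_args(argv)
    if args.jobs < 1:
        parser.error("--jobs must be at least 1")
    return args


def run_script(script):
    """Run one test script in a child Python process; return its exit code."""
    proc = subprocess.run([sys.executable, script], capture_output=True, text=True)
    return proc.returncode


def run_all(scripts, jobs):
    """Run scripts concurrently; return (script, exit_code) in input order."""
    codes = [None] * len(scripts)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {pool.submit(run_script, s): i for i, s in enumerate(scripts)}
        # Jobs complete in arbitrary order; file each result under its input index.
        for fut in as_completed(futures):
            codes[futures[fut]] = fut.result()
    return list(zip(scripts, codes))


def format_report(results):
    """Build log lines in a fixed order, independent of completion order."""
    lines = []
    for script, code in results:
        status = "PASSED" if code == 0 else f"FAILED (exit {code})"
        lines.append(f"{script}: {status}")
    return "\n".join(lines)


def main(argv=None):
    """Parse arguments, run the tests, print the ordered report."""
    args = parse_args(argv)
    results = run_all(args.scripts, args.jobs)
    print(format_report(results))
    # Non-zero exit status if any test failed.
    return 0 if all(code == 0 for _, code in results) else 1
```

A wrapper script could call `sys.exit(main())` and be invoked as, for example, `python run_tests.py -j 4 test_a.py test_b.py` (file names hypothetical). Completed results are collected with `as_completed` but written out by input index. As a result, a slow test listed first still appears first in the log, even if faster tests finish before it.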

When complete, what is new?