NASA-IMPACT / pyQuARC

The pyQuARC tool reads and evaluates metadata records with a focus on the consistency and robustness of the metadata. pyQuARC flags opportunities to improve or add to contextual metadata information in order to help the user connect to relevant data products. pyQuARC also ensures that information common to both the data product and the file-level metadata is consistent and compatible. pyQuARC frees up human evaluators to make more sophisticated assessments, such as whether an abstract accurately describes the data and provides the correct contextual information. The base pyQuARC package assesses descriptive metadata used to catalog Earth observation data products and files. As open source software, pyQuARC can be adapted and customized by data providers to allow for quality checks that evolve with their needs, including checking metadata not covered in the base package.
Apache License 2.0

Implement Multithreading for Enhanced Performance in Custom Check Processing #284

Closed rajeshpandey2053 closed 2 months ago

rajeshpandey2053 commented 2 months ago

Description:

This pull request introduces a multithreading solution to enhance the performance of custom check processing in the codebase. The existing codebase comprises various rules applied to multiple fields. During execution, it became evident that parallel processing is necessary at two levels: the field level and the argument level.

At the field level, a check needs to run on multiple fields simultaneously. At the argument level, a check iterates over multiple arguments within a single field. Nested multithreading, with one layer of threads per level, is therefore required to get the full performance benefit.

By leveraging multithreading at both levels, we aim to parallelize the execution of checks, thereby significantly improving performance and efficiency, especially in scenarios involving a large number of checks or resolving URLs.
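The two levels described above can be illustrated with a minimal sketch using Python's standard `concurrent.futures` module. This is not the code from this pull request: the function names (`check_url_scheme`, `run_check_on_field`, `run_check_on_fields`), the worker counts, and the sample field names and URLs are all hypothetical and chosen only for illustration. The sketch assumes each check is a plain function of one argument.

```python
from concurrent.futures import ThreadPoolExecutor


def check_url_scheme(value):
    """Hypothetical check: does the argument look like an http(s) URL?"""
    return value.startswith(("http://", "https://"))


def run_check_on_field(check, arguments, arg_pool):
    """Argument level: apply one check to every argument of a single field."""
    # map() preserves input order, so results line up with `arguments`.
    return list(arg_pool.map(check, arguments))


def run_check_on_fields(check, fields, max_field_workers=4, max_arg_workers=8):
    """Field level: run the check for several fields at the same time."""
    # Two separate pools: field tasks block while waiting on argument tasks,
    # so sharing a single pool could exhaust its workers and deadlock.
    with ThreadPoolExecutor(max_workers=max_field_workers) as field_pool, \
         ThreadPoolExecutor(max_workers=max_arg_workers) as arg_pool:
        futures = {
            name: field_pool.submit(run_check_on_field, check, args, arg_pool)
            for name, args in fields.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Hypothetical sample input: field name -> list of arguments to check.
fields = {
    "RelatedUrls": ["https://example.org/data", "ftp://example.org/old"],
    "DataCenterUrls": ["https://example.org/center"],
}
results = run_check_on_fields(check_url_scheme, fields)
print(results)
```

Threads suit this workload when the checks are I/O-bound, such as resolving URLs, because a thread waiting on the network does not block the others. Keeping the field-level and argument-level pools separate is one way to avoid the deadlock that can occur when tasks in a bounded pool wait on other tasks submitted to the same pool.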

Changes:

Testing:

Extensive testing has been conducted to ensure the correctness and performance of the multithreading solution. Integration tests have been performed to validate the code's functionality in various scenarios and edge cases.

Impact:

This change significantly improves the performance of custom check processing, especially in scenarios involving a large number of checks or resolving URLs. On average, execution time dropped from ~100 s to ~10 s.