ECMWFCode4Earth / challenges_2024

Discover the ECMWF Code for Earth 2024 challenges

Challenge 32 - Dynamic CDS Quality of Service (QoS) Rules based on real time system monitoring #10

Open RubenRT7 opened 4 months ago

RubenRT7 commented 4 months ago


Stream 3 - Software Development for Earth Sciences applications

Goal

At the core of the CDS Engine are the QoS rules. These rules allow the broker component to handle and prioritise the dispatch of requests submitted to the CDS: among other things, they control user traffic, give priority to certain types of users and requests, and manage system workload. The conditions for the rules depend on system workload and traffic. The goal of this challenge is to create the intelligence required for the CDS to configure these rules automatically, based on the current system status or on expected projections.

Mentors and skills


Challenge description

Problem: In the current CDS, QoS rules are managed manually, based mostly on monitoring observations and on intuition about the system workload over time. Some of these rules have since been identified as suitable for automation, using the output of logs and monitoring tools that provide a real-time view of the system status and keep a record of past trends.

Data/System to be used: QoS rules are kept and administered in a text file within the system. Logs from all the different components of the system are collected and centralised by the metrics component, where they are pre-processed for early warnings and real-time indicators and then indexed into Splunk. Python is used for this processing. The metrics component will host the potential output of this challenge. Access to the logs will be provided.
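As an illustration of the kind of real-time indicator that could be derived from the logs, the sketch below counts the requests currently running per data repository. The log format is an assumption made for this example only (one JSON object per line with hypothetical keys `request_id`, `repository` and `event`); the real log contents will be described by the mentors.

```python
import json
from collections import Counter


def active_requests_per_repository(log_lines):
    """Count requests currently running, grouped by repository.

    ASSUMPTION: each log line is a JSON object with the hypothetical keys
    "request_id", "repository" and "event" ("started" or "finished").
    This is not the actual CDS log schema.
    """
    running = {}  # request_id -> repository of requests not yet finished
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the whole run
        if not isinstance(record, dict):
            continue
        request_id = record.get("request_id")
        event = record.get("event")
        if event == "started":
            running[request_id] = record.get("repository", "unknown")
        elif event == "finished":
            running.pop(request_id, None)
    return Counter(running.values())
```

The function accepts any iterable of strings, so it can be fed an open file handle or a batch of lines pulled from the centralised logs.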

Solution: The proposed solution will focus on a small subset of these QoS rules, namely those that directly control user concurrency for access to certain data repositories. As output, the solution will produce the values that these rules should take at any moment in time, with the update frequency to be defined.
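One possible shape for such an output is a function that maps an observed load indicator to a new concurrency value. The sketch below is only an illustration under stated assumptions: the `utilisation` indicator, the target of 0.75, the bounds and the step size are all hypothetical and are not taken from the CDS configuration.

```python
def suggest_concurrency_limit(current_limit, utilisation, target=0.75,
                              min_limit=1, max_limit=20, max_step=2):
    """Propose a new per-user concurrency limit for one data repository.

    ASSUMPTIONS (hypothetical, for illustration only):
    - utilisation is the observed load of the repository as a fraction of
      its capacity (0.0 = idle, 1.0 = saturated);
    - the limit should be scaled so that utilisation moves towards target;
    - the limit changes by at most max_step per run, to avoid oscillation.
    """
    if utilisation <= 0:
        proposed = max_limit  # idle system: allow the limit to grow
    else:
        proposed = round(current_limit * target / utilisation)
    # Bound the change applied in a single run.
    step = max(-max_step, min(max_step, proposed - current_limit))
    # Keep the result inside the allowed range.
    return max(min_limit, min(max_limit, current_limit + step))
```

Bounding the change per run is a deliberate design choice here: a noisy indicator then moves the limit gradually rather than swinging it between extremes from one run to the next.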

Ideas for implementation: All the information about the QoS rules and how they are administered will be provided by the mentors, who will also describe the content of each of the different logs. The proposed script is expected to run as a cron job.
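Because the rules live in a text file that a periodically running script would modify, one point worth sketching is how to update a value without ever leaving a half-written file behind. The example below assumes a hypothetical `name = value` line format, which is not the real QoS rule syntax (that will be provided by the mentors). It writes to a temporary file in the same directory and then swaps it in with `os.replace`.

```python
import os
import re
import shutil
import tempfile


def update_rule_value(path, rule_name, new_value):
    """Set the value of rule_name in a rules file; return True if changed.

    ASSUMPTION: rules are stored one per line as "name = value", optionally
    followed by trailing text such as a comment. This format is hypothetical.
    """
    pattern = re.compile(
        r"^(\s*" + re.escape(rule_name) + r"\s*=\s*)(\S+)(.*)$"
    )
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()

    changed = False
    for index, line in enumerate(lines):
        match = pattern.match(line)
        if match and match.group(2) != str(new_value):
            lines[index] = f"{match.group(1)}{new_value}{match.group(3)}"
            changed = True
    if not changed:
        return False  # nothing to do, leave the file untouched

    # Write next to the target so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write("\n".join(lines) + "\n")
        shutil.copymode(path, tmp_path)  # keep the original permissions
        os.replace(tmp_path, path)       # atomic swap of the rules file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True
```

A crontab entry such as `*/5 * * * * python3 /path/to/update_qos_rules.py` would then run a script built around this function every five minutes; both the path and the interval are placeholders, since the update frequency is still to be defined.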