Priority-Based Data Processing for Golden Record

maximilianong commented 2 months ago

Description

The golden record service should prioritize data sets based on predefined criteria to ensure timely and fair processing. This feature will enable the system to handle data sets with varying urgency and importance, ensuring that critical data is processed promptly while maintaining fairness across different sharing members.

Key Scenarios

Portal Registration Priority:

Data sets submitted via the portal should be processed with higher priority to ensure the Business Partner Number (BPNL) is available quickly.

Fair Share of Processing:

When multiple sharing members submit data sets, the system should ensure that smaller batches of data are not delayed excessively by larger batches. This ensures fair processing time for all members.

SLA-Based Priority:

Data sets should be prioritized based on Service Level Agreements (SLAs). If a data set has been waiting for a significant amount of time and is nearing its SLA deadline, it should be automatically elevated in priority.

Implementation Details

Priority Definition:

Introduce a priority attribute for data sets.
Priority levels can be defined as High, Medium, and Low.

Priority Assignment:

Portal Registration: Assign High priority to data sets submitted via the portal.
Fair Share of Processing: Implement a round-robin or weighted round-robin algorithm to ensure fair processing across different sharing members.
SLA Consideration: Monitor the waiting time of data sets and adjust their priority based on SLA deadlines.
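
The fair-share idea can be sketched as follows. This is only an illustration, not the service's actual scheduling code: the function name `fair_share_order`, the member ids, and the `weights` parameter are all hypothetical. Each sharing member gets a turn in rotation, and a weight greater than 1 lets a member submit more records per turn (weighted round-robin).

```python
from collections import deque


def fair_share_order(batches, weights=None):
    """Interleave records across sharing members in (weighted) round-robin order.

    batches: dict mapping a hypothetical member id to its list of records.
    weights: optional dict mapping member id to records taken per turn (default 1).
    """
    weights = weights or {}
    queues = deque(
        (member, deque(records)) for member, records in batches.items() if records
    )
    ordered = []
    while queues:
        member, records = queues.popleft()
        # Take up to `weight` records for this member's turn (at least 1).
        for _ in range(max(1, weights.get(member, 1))):
            if not records:
                break
            ordered.append((member, records.popleft()))
        # Members with remaining records go to the back of the rotation.
        if records:
            queues.append((member, records))
    return ordered
```

With a large batch from one member and a single record from another, the single record is processed in the second slot instead of waiting behind the whole large batch.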

Processing Logic:

Modify the golden record service to check the priority attribute before processing data sets.
Implement a queue management system that dynamically adjusts the order of data sets based on their priority.
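
A minimal sketch of such a queue, assuming the three levels proposed above and using a binary heap. The class and method names (`Priority`, `PriorityTaskQueue`, `push`, `pop`) are hypothetical and not taken from the existing code base. A running counter keeps first-in, first-out order among data sets of equal priority.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    # Lower value is processed first.
    HIGH = 0
    MEDIUM = 1
    LOW = 2


class PriorityTaskQueue:
    """Orders data sets by priority, then by arrival order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal priorities

    def push(self, record_id, priority):
        heapq.heappush(self._heap, (int(priority), next(self._counter), record_id))

    def pop(self):
        _, _, record_id = heapq.heappop(self._heap)
        return record_id

    def __len__(self):
        return len(self._heap)
```

This sketch does not re-prioritize entries that are already queued; elevating a waiting data set would require removing and re-inserting it, or a lazy-deletion scheme.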

SLA Monitoring:

Introduce a monitoring mechanism to track the waiting time of each data set.
Automatically elevate the priority of data sets that are approaching their SLA deadline.
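
One possible shape for the elevation rule, as a sketch only. The function name `elevate_if_near_sla` and the `warn_fraction` parameter (the share of the SLA window after which a data set counts as "approaching" its deadline) are assumptions for illustration; the issue does not define a concrete threshold.

```python
from datetime import datetime, timedelta, timezone

# Ordered from lowest to highest priority.
LEVELS = ["Low", "Medium", "High"]


def elevate_if_near_sla(priority, submitted_at, sla, now, warn_fraction=0.8):
    """Return the next-higher priority once the waiting time reaches
    warn_fraction of the SLA window; otherwise return priority unchanged."""
    waited = now - submitted_at
    if waited >= sla * warn_fraction and priority != LEVELS[-1]:
        return LEVELS[LEVELS.index(priority) + 1]
    return priority


# Example: a Low-priority data set that has waited 9 of its 10 SLA hours.
submitted = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(elevate_if_near_sla("Low", submitted, timedelta(hours=10),
                          submitted + timedelta(hours=9)))
```

A monitoring job could call such a function periodically for every waiting data set and re-queue those whose priority changed.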

Additional information

Contribution will be done by @dilipdhankecha2530. Committers supporting this will be @SujitMBRDI / @nicoprow / @maximilianong.

maximilianong commented 2 weeks ago

This feature still needs a decision from the expert group. Main question: is it OK for the orchestrator to know which data comes from which gate?

maximilianong commented 2 weeks ago

We had a brief discussion about this and the expert group suggested a good idea:
What if we assign the gate to a category rather than a specific company, and then define the priority based on that category? @dilipdhankecha2530 What do you think?
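
A rough sketch of the category idea, assuming each gate attaches only a category label to its tasks so the orchestrator never sees a company identity. The category names and the `priority_for_category` helper are hypothetical.

```python
# Hypothetical categories; the expert group has not defined a concrete list.
CATEGORY_PRIORITY = {
    "portal": "High",
    "standard": "Medium",
    "bulk": "Low",
}


def priority_for_category(category):
    """Map a gate category to a priority; unknown categories fall back to Medium."""
    return CATEGORY_PRIORITY.get(category, "Medium")
```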

maximilianong commented 2 weeks ago

Another discussion took place today:

Categorization could probably help us with prioritization, but not with determining the confidence value. For example, we need to know in how many gates the same dataset appears.

There is also no general rule that the orchestrator is not allowed to temporarily know where the dataset comes from.

Of course, the mapping of pool data to companies should not be stored in a central database, but the orchestrator does need this information while it processes a dataset.

stephanbcbauer commented 1 week ago

Some hints from Release Management (@ther3sa) and Tractus-X Project Lead (@stephanbcbauer)

Please add the missing sections from the feature template.

dilipdhankecha2530 commented 1 week ago

To manage priority, here are two main options:

Manage Priority on the Gate Side:
- All priorities and thresholds would be handled directly by the gate itself. This approach only requires adding properties to the configuration file.
- When a customer uploads data, we can set priorities on the records. Based on thresholds, we can manage higher and lower priorities. If a threshold is reached, records would automatically shift to a lower priority, making it easy to adjust just by changing the configuration.
- This setup eliminates the need for a central component to manage priority.
Manage Priority on the Orchestrator Side:
- If we store data in the orchestrator, originator information (the source of the data) needs to be saved in the database, making the orchestrator a central component for setting priority.
- One downside is that without full visibility into the gate deployment, we’d need to remember to register originators in the orchestrator. This setup would store originator data centrally, including priority details, in the database.
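
For the gate-side option, the threshold behaviour could look roughly like this. It is a sketch under assumptions: the property name `high_priority_threshold`, the counter `already_sent_high`, and the two-level High/Low split are illustrative and not existing gate configuration.

```python
from dataclasses import dataclass


@dataclass
class GatePriorityConfig:
    # Hypothetical config property: how many records per window may be High.
    high_priority_threshold: int = 100


def assign_priorities(record_ids, already_sent_high, config):
    """Mark records High until the configured threshold is used up, then Low."""
    remaining = max(0, config.high_priority_threshold - already_sent_high)
    return [
        (record_id, "High" if index < remaining else "Low")
        for index, record_id in enumerate(record_ids)
    ]
```

Changing the behaviour then only means changing the threshold value in the gate's configuration; no central component is involved.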

Please share your thoughts @maximilianong @nicoprow
