
OP - BPDM Orchestrator Component (Concept, Architecture, Implementation) (ext: CXAR-1020) #377

Closed Phil91 closed 11 months ago

Phil91 commented 1 year ago

Benefit Hypothesis & Problem Statement

 

As an operator, I need a way to process Business Partner data records based on different states and to be independent of the service provider used, so that I can replace the provider or recompose the process steps in a modular way.

The proposed way is to implement an Orchestrator Layer that acts as a passive component, offers individual endpoints for each processing step, and sets the current state of the data record accordingly for further processing. Data will be processed in chunks.

(Chunks are necessary when thinking about checking for duplicates.)

As an architect, I require the orchestrator component to be designed flexibly, so that new "states" for the data processing can easily be added.

The following diagram illustrates the processing flow and the interaction between the components.

sequenceDiagram
    Gate->>Orchestrator: Upload new or changed BP
    Orchestrator->>Orchestrator: Set State: new
    Orchestrator->>Orchestrator: Set External ID / Request ID: (Linkage)
    Orchestrator-->>Gate: Send Req ID
    loop Every Hour
        Simulator->>Orchestrator: Fetch for "new"
        Orchestrator-->>Simulator: BP with "new" State
    end
    Simulator->>Simulator: Duplicate Check
    Simulator->>Simulator: Natural Person Screening
    Simulator->>Simulator: Curation
    Simulator->>Simulator: Enrichment
    opt BPN does not exist
        Simulator->>Pool: Create new BP
        Pool->>Pool: Create BPN
        Pool-->>Simulator: BPN
    end
    Simulator-->>Orchestrator: Cleaned BP with BPN
    Orchestrator->>Orchestrator: Set State: CLEANED
    loop Every Hour
        Pool->>Orchestrator: Fetch "CLEANED" BP
        Orchestrator-->>Pool: BP with "CLEANED" State
    end
    Pool->>Pool: Update BP
    Pool->>Orchestrator: Commit
    Orchestrator->>Orchestrator: Set State: "FINISH"
    loop Every Hour
        Gate->>Orchestrator: Fetch for "FINISH" && "ReqID"
        Orchestrator-->>Gate: BP that match "FINISH" && "ReqID"
        Gate-->>Gate: Update BP
    end
    loop Every Hour
        Gate->>Pool: Fetch Changelog
        Pool-->>Gate: Send Changelog with BPNs
        Gate->>Gate: Filter for relevant BPNs where timestamp is newer than existing
        Gate->>Pool: Fetch changed BP
        Pool-->>Gate: Send changed BP
        Gate->>Gate: Update changelog for BPNs
    end
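To make the flexible state requirement concrete, here is a minimal Kotlin sketch of such a state model (all names are hypothetical, not taken from the BPDM codebase): adding a new processing step only means adding an enum entry, and the orchestrator merely tracks state plus linkage IDs per record, fetched in chunks.

import java.time.Instant
import java.util.UUID

// Hypothetical processing states; adding a step means adding an enum entry.
enum class RecordState { NEW, CLEANED, FINISH }

// Linkage between external ID (Gate), request ID and current state.
data class RecordStateEntry(
    val externalId: String,
    val requestId: UUID = UUID.randomUUID(),
    var state: RecordState = RecordState.NEW,
    var updatedAt: Instant = Instant.now()
)

// Passive orchestrator core: callers set and query states in chunks.
class OrchestratorState {
    private val entries = mutableMapOf<UUID, RecordStateEntry>()

    fun register(externalId: String): RecordStateEntry =
        RecordStateEntry(externalId).also { entries[it.requestId] = it }

    fun setState(requestId: UUID, state: RecordState) {
        entries[requestId]?.apply { this.state = state; updatedAt = Instant.now() }
    }

    // Fetch a chunk of records in a given state, e.g. for the duplicate check.
    fun fetchChunk(state: RecordState, chunkSize: Int): List<RecordStateEntry> =
        entries.values.filter { it.state == state }.take(chunkSize)
}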

 

 

Components: BPDM_GR Sprints: N/A Fix Versions: N/A StoryPoints: N/A Attachments: CXAR-1020-mermaid-diagram-2023-07-19-144421.svg

nicoprow commented 1 year ago

How to implement the orchestration in BPDM going forward should be an open discussion. There are several ways to connect the Gates and the Pool together with an exchangeable cleaning service. I will provide some proposals, concentrating on their advantages and disadvantages.

I'm open to feedback and counter-proposals so we can determine how to go forward with this feature.

nicoprow commented 1 year ago

Variant 1: Orchestrator Service with Pollable APIs

The heart of this variant is a new orchestration service that offers REST API endpoints for the other BPDM services to consume. The BPDM Gates as well as the Pool can connect to this orchestration service. In addition, the cleaning service (a dummy implementation for now) also connects to this specialized orchestration service. The so-called orchestrator manages the state of the cleaning requests and offers them in its endpoint queues. The orchestrator is designed to be passive: each service accesses its respective information through authorized endpoints via polling mechanisms. The Gates would also poll the Pool for updates via the changelog.

sequenceDiagram
    SharingMember->>Gate: Upsert business partner A input
    Gate-->>Gate: Persist business partner A input
    Gate-->>SharingMember: Upserted business partner A input
    Gate->>Orchestrator: Request cleaning of business partner A input
    Orchestrator-->>Orchestrator: Create timestamped Cleaning Request B for business partner A input
    Orchestrator-->>Orchestrator: Enqueue in 'Generic Cleaning' Queue
    Orchestrator-->>Gate: Created cleaning request B
    Gate-->>Gate: Associate cleaning request ID with business partner A
    Gate-->>Gate: Upsert sharing state of business partner A
    loop Every X Minutes
        Gate->>Orchestrator: Ask cleaning request B state
        Orchestrator-->>Gate: State of cleaning request B
        Gate-->>Gate: Update sharing state
    end
    loop Every X Minutes
        CleaningServiceDummy->>Orchestrator: Fetch and reserve next cleaning requests in 'Generic Cleaning' queue
        Orchestrator-->>CleaningServiceDummy: Cleaning request B
        CleaningServiceDummy-->>CleaningServiceDummy: Perform dummy cleaning
        opt BPN does not exist
            CleaningServiceDummy->>CleaningServiceDummy: Add BPN request IDs
        end
        CleaningServiceDummy-->>Orchestrator: Send cleaning result
        Orchestrator-->>Orchestrator: Enqueue in 'BPN Processing' queue
        Orchestrator-->>CleaningServiceDummy: Commit
    end
    loop Every X Minutes
        Pool->>Orchestrator: Dequeue next cleaning requests in 'BPN Processing' queue
        Orchestrator-->>Pool: Cleaning request B cleaning result
        Pool-->>Pool: Upsert business partners with result
        Pool-->>Pool: Add BPNs from BPN request IDs
        Pool->>Orchestrator: Send cleaning result
        Orchestrator-->>Orchestrator: Commit
        Orchestrator-->Orchestrator: Enqueue in 'FINISHED' queue
    end
    loop Every X Minutes
        Gate->>Orchestrator: Dequeue cleaning requests by request ID in 'FINISHED' queue
        Orchestrator-->Gate: Cleaning request B cleaning result
        Gate-->>Gate: Resolve cleaning request B
        Gate-->>Gate: Update sharing state of business partner A
    end
    loop Every X Minutes
        Gate->>Pool: Fetch BPN changelog entries from last timestamp
        Pool-->>Gate: Entries from given timestamp
        opt Business Partner in Gate associated with BPN and newer timestamp
            Gate->>Gate: Start new cleaning process (see above)
        end
    end
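To illustrate the passive polling contract, here is a rough Kotlin sketch of the cleaning-service side, assuming hypothetical endpoint paths and payloads (this is not the actual BPDM API):

import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Hypothetical polling client for the 'Generic Cleaning' queue of Variant 1.
class CleaningServicePoller(
    private val orchestratorUrl: String,   // e.g. http://orchestrator:8080 (assumed)
    private val client: HttpClient = HttpClient.newHttpClient()
) {
    // Fetch-and-reserve: the orchestrator marks returned requests as reserved
    // so parallel cleaning service instances do not process them twice.
    fun reserveNextRequests(batchSize: Int): String {
        val request = HttpRequest.newBuilder()
            .uri(URI.create("$orchestratorUrl/api/cleaning-requests/reserve?size=$batchSize"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build()
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body()
    }

    // Post the cleaning result back; the orchestrator then enqueues it
    // into the 'BPN Processing' queue.
    fun sendResult(requestId: String, resultJson: String) {
        val request = HttpRequest.newBuilder()
            .uri(URI.create("$orchestratorUrl/api/cleaning-requests/$requestId/result"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(resultJson))
            .build()
        client.send(request, HttpResponse.BodyHandlers.discarding())
    }
}

The Pool and the Gates would use the same fetch-and-reserve style against their respective queues, which keeps the orchestrator itself free of outgoing calls.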

Advantages:

Disadvantages:

nicoprow commented 1 year ago

Variant 2 - Orchestration via Event Queue

Here RabbitMQ replaces a separate orchestration service. Cleaning request processing is realised by chaining events that are generated and consumed by the Gates, the Pool and the cleaning service(s). The Gates are responsible for generating a unique request ID; a UUIDv4 together with a timestamp should mitigate any possible conflicts between Gates. Requests are pushed into the respective topics of the queue and so passed along from Gate to Cleaning Service to Pool and back to the Gate. The Gates can stay informed about the sharing state by subscribing to the cleaning process step topics and updating their sharing state accordingly.

sequenceDiagram
SharingMember->>Gate: Upsert Business partner A
Gate-->>Gate: Persist business partner input A
Gate-->>Gate: Create timestamped Cleaning Request B for business partner A input
Gate-->>SharingMember: Created business partner A input
Gate->>RabbitMq: Send cleaning request B to topic 'New Cleaning Request'
RabbitMq-->>Gate: Commit
RabbitMq->>CleaningServiceDummy: Send cleaning request B
CleaningServiceDummy-->>RabbitMq: Commit
opt Input has no BPNs
    CleaningServiceDummy-->>CleaningServiceDummy: Create unique BPN request IDs
end
CleaningServiceDummy-->>CleaningServiceDummy: Create Cleaning Result for cleaning request B
CleaningServiceDummy->>RabbitMq: Send cleaning result for B to topic 'New Process Step Generic Cleaning Result'
RabbitMq-->>CleaningServiceDummy: Commit

RabbitMq->>Pool: Send cleaning result B
Pool-->>RabbitMq: Commit
Pool-->>Pool: Upsert LSA business partner based on BPNs/BPN-Request-IDs
Pool->>RabbitMq: Send  update result for B to topic 'New Pool Update Result'
RabbitMq-->>Pool: Commit

RabbitMq->>Gate: Send update result B
Gate-->>RabbitMq: Commit
Gate-->>Gate: Resolve cleaning request B
Gate-->>Gate: Persist business partner A output
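For illustration, publishing a cleaning request in this variant could look roughly like the sketch below, using the plain RabbitMQ Java client; the exchange name, routing key and payload format are assumptions, not a specification:

import com.rabbitmq.client.ConnectionFactory
import java.time.Instant
import java.util.UUID

fun main() {
    val factory = ConnectionFactory().apply { host = "localhost" }
    factory.newConnection().use { connection ->
        connection.createChannel().use { channel ->
            // Exchange and routing key names assumed from the diagram above.
            val exchange = "bpdm.cleaning"
            channel.exchangeDeclare(exchange, "topic", true)

            // The Gate is responsible for a unique request ID:
            // UUIDv4 plus timestamp to avoid conflicts between Gates.
            val requestId = "${UUID.randomUUID()}-${Instant.now().toEpochMilli()}"
            // Placeholder payload; the actual business partner content would go here.
            val payload = """{"requestId": "$requestId", "businessPartner": {}}"""

            channel.basicPublish(
                exchange,
                "new.cleaning.request",   // routing key for 'New Cleaning Request'
                null,
                payload.toByteArray()
            )
        }
    }
}

The consuming side (cleaning service, Pool, Gates) would subscribe to the respective topics in the same manner and acknowledge messages after processing.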

Advantages:

Disadvantages:

EDIT:

nicoprow commented 1 year ago

Variant 3: Event Queue with Lightweight RequestProvider

Just like in Variant 2, the heart of the orchestration is RabbitMQ, but in order to address the security issues resulting from visible business partner data, we introduce a request provider service. This request provider service is responsible for generating request IDs and caching the results. It provides Gates with a way to receive cleaning results via a private request ID that is only known to the Gate which originally created the cleaning request. In this scenario the CleaningRequestProvider creates a private and a public request ID and shares them with the requesting Gate. The Gate then publishes a cleaning request similar to Variant 2. At the end of the process, however, only the CleaningRequestProvider receives the actual cleaning result and broadcasts a notification of the finished cleaning request to the Gates. Only the Gate with the private request ID is then allowed to access the actual request result.

sequenceDiagram
SharingMember->>Gate: Upsert Business partner A
Gate-->>Gate: Persist business partner input A
Gate-->>SharingMember: Created business partner A input
Gate->>CleaningRequestProvider: Request new cleaning request ID
CleaningRequestProvider->>CleaningRequestProvider: Create timestamped Cleaning Request with public ID B and private ID C
CleaningRequestProvider-->>Gate: Cleaning request public ID B and private ID C
Gate-->>Gate: Associate business partner input A with cleaning request IDs
Gate->>RabbitMq: Send cleaning request B with input A to topic 'New Cleaning Request'
RabbitMq-->>Gate: Commit
RabbitMq->>CleaningServiceDummy: Send cleaning request B
CleaningServiceDummy-->>RabbitMq: Commit
opt Input has no BPNs
    CleaningServiceDummy-->>CleaningServiceDummy: Create unique BPN request IDs
end
CleaningServiceDummy-->>CleaningServiceDummy: Create Cleaning Result for cleaning request B
CleaningServiceDummy->>RabbitMq: Send cleaning result for B to topic 'New Process Step Generic Cleaning Result'
RabbitMq-->>CleaningServiceDummy: Commit

RabbitMq->>Pool: Send cleaning result B
Pool-->>RabbitMq: Commit
Pool-->>Pool: Upsert LSA business partner based on BPNs/BPN-Request-IDs
Pool->>RabbitMq: Send  update result for B to topic 'New Pool Update Result'
Pool->>RabbitMq: Send updated BPNs to topic 'New BPN Update'
RabbitMq-->>Pool: Commit

RabbitMq->>CleaningRequestProvider: Send update result B
CleaningRequestProvider-->>RabbitMq: Commit
CleaningRequestProvider-->>CleaningRequestProvider: Persist update result
CleaningRequestProvider->>RabbitMq: Send notification for cleaning request B 'Cleaning Request FINISHED'
RabbitMq-->>CleaningRequestProvider: Commit

RabbitMq->>Gate: Send notification cleaning request B finished
Gate-->>RabbitMq: Commit
Gate-->>CleaningRequestProvider: Request result for B via private key C
CleaningRequestProvider->>Gate: Result for cleaning request B
Gate-->>Gate: Resolve cleaning request B
Gate-->>Gate: Persist business partner A output

RabbitMq->>Gate: Send updated BPNs
Gate-->>RabbitMq: Commit
opt Business Partner in Gate associated with BPN and newer timestamp
     Gate->>Gate: Start new cleaning process (see above)
end
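A rough in-memory Kotlin sketch of the ID-pair idea in the CleaningRequestProvider (hypothetical names): the public ID travels through the queue, while the private ID never leaves the requesting Gate and is required to fetch the result.

import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

data class RequestIds(val publicId: UUID, val privateId: UUID)

class CleaningRequestProvider {
    // public ID -> private ID needed to fetch the result
    private val privateIds = ConcurrentHashMap<UUID, UUID>()
    // public ID -> cached cleaning result
    private val results = ConcurrentHashMap<UUID, String>()

    fun createRequest(): RequestIds {
        val ids = RequestIds(UUID.randomUUID(), UUID.randomUUID())
        privateIds[ids.publicId] = ids.privateId
        return ids
    }

    // Called when the 'New Pool Update Result' event arrives.
    fun storeResult(publicId: UUID, result: String) {
        results[publicId] = result
    }

    // Only the Gate holding the matching private ID may read the result.
    fun fetchResult(publicId: UUID, privateId: UUID): String? =
        if (privateIds[publicId] == privateId) results[publicId] else null
}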

Advantages:

Disadvantages:

EDIT:

gerlitmcoding commented 1 year ago

In my opinion, as far as I can see from the description, Variant 3 combines the disadvantages of the other two variants, because here, too, you would again need a polling mechanism for some parts. Or am I wrong?

In my first impression, I would recommend variant 1. Here you have the greatest security and sovereignty over the data and data exchange formats. You also have a good abstraction for the different cleaning states and can easily add new ones by just adding a new API Endpoint in the orchestrator service.

If polling every minute is not enough, you can think about adapting the APIs and the control flow from pull to push. But as far as I know, at the moment, there are no concrete requirements in that regard.

martinfkaeser commented 1 year ago

In V2, does it need both a CleaningServiceDummy and a DummyBridge, or is this redundant? And in V3, is the DummyBridge the same as the CleaningServiceDummy?

One thing about the duplicate check in the cleaning service: If it finds out the BP is a duplicate and already in the Pool it could and should directly set its BPN, not just BPN-RequestID, right?

Thinking about V1, couldn't the Orchestrator work without persisting any data and just keep the queue data in RAM? As the Gates should poll for the state of their cleaning requests they could detect if some BP goes missing and just add it again for cleaning. What is the worst that could happen? Maybe if BPs from the FINISHED queue go missing, so they already got a BPN assigned and were added to the Pool. But even then, after adding the BP again the CleaningService should detect it as a duplicate and assign the same BPN / BPN-RequestID and Gate should eventually get the updated BP. If not persisting anything in the Orchestrator is indeed feasible, this would mitigate security issues and also simplify implementation.
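As a sketch of that recovery idea (with a hypothetical sharing-state model): if the orchestrator loses a queued request, the Gate's regular state poll notices the unknown request and simply re-submits it.

import java.time.Instant

enum class SharingState { PENDING, FINISHED, ERROR }

data class SharedPartner(
    val externalId: String,
    var state: SharingState,
    var lastStateCheck: Instant
)

// Gate-side recovery: if the orchestrator lost a pending request (e.g. a
// restart with purely in-memory queues), the periodic state poll no longer
// finds it and the Gate simply re-requests cleaning. A duplicate submission
// is harmless, since the cleaning service detects the duplicate and assigns
// the same BPN / BPN request ID again.
fun recoverLostRequests(
    partners: List<SharedPartner>,
    stateKnownToOrchestrator: (externalId: String) -> Boolean,
    requestCleaning: (externalId: String) -> Unit
) {
    partners
        .filter { it.state == SharingState.PENDING }
        .filterNot { stateKnownToOrchestrator(it.externalId) }
        .forEach {
            requestCleaning(it.externalId)
            it.lastStateCheck = Instant.now()
        }
}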

martinfkaeser commented 1 year ago

For V1 the mentioned disadvantages regarding polling don't seem critical. My main issue: is there a legal problem with having this central Orchestrator component that (temporarily) stores all the different Gates' data?

About V3: At first sight this looks quite convoluted. As Tim mentioned, doesn't the Gate need to poll the CleaningRequestProvider to get the cleaning result? So there is no real-time pushing of results. And if the CRP needs to persist the cleaning results, this component gets quite complicated instead of the made-redundant Orchestrator. Maybe I'm overlooking something, but I don't see any advantage of V3 over V1.

About the event-based approaches (V2 and V3): I think these are not optimal to standardize, as they need a different, additional protocol, and there is no formal way that I know of to standardize which topics are allowed and which content schema is allowed for each topic.

General idea about persistence: If we really have to persist anything (see my previous comment) in some kind of Orchestrator component, maybe it makes sense to not use a relational DB, as this component doesn't care for the content but just acts as a generic state machine. Maybe just use a file or document-style approach.

About restarting the cleaning process in the end (all variants): I think it still makes sense to flag the new cleaning request as "validate" to prevent another Pool update, as considered in previous versions. If the cleaning service doesn't act completely reliably, this might introduce a never-ending cleaning loop. Or maybe use a changed-by-SharingMember timestamp.

gerlitmcoding commented 1 year ago

@martinfkaeser how to store or not store the processed data is another discussion. In all variants you can configure whether intermediate data should be deleted or not. In this discussion we should really focus on the concept we want to use for orchestrating our process handling. And here I'm also on your side: I do not really see many benefits that a message bus has over Variant 1. Instead, I see more downsides in Variants 2 and 3.

But I am open to discussing and hearing arguments for Variant 2 or 3 which we did not have in mind so far 😃

pd0856 commented 1 year ago

We can create multiple queues in RabbitMQ, one for each Gate component. Each Gate will subscribe to its queue. The Cleaning_Service will handle all messages and track the BPN together with the queue identity, so when we need to publish a message to a sharing member, we publish it onto RabbitMQ.

@startuml
participant Sharing_Member as Sharing_Member
control GATE as GATE
database Gate_Database as Gate_Database
queue RabbitMQ as RabbitMQ
control Cleaning_Service as Cleaning_Service
database Cleaning_Database as Cleaning_Database
control Service_Provider as Service_Provider
control Pool as Pool
database Pool_Database as Pool_Database

RabbitMQ <- Cleaning_Service: Subscribe to the topic
RabbitMQ --> Cleaning_Service: Subscription has been done (Now when a message is published on the topic \nit will be consumed by the Cleaning_Service)

note over Sharing_Member, Pool_Database
Create a topic for each Gate component.
end note

GATE -> RabbitMQ: Subscribe to the topic that was created for Gate (Gate topic)
GATE <-- RabbitMQ: Subscription has been done (Now when a message is published on the topic \nit will be consumed by the Gate component)

Sharing_Member -> GATE:  Share Partners data

GATE -> GATE: Validate the requested data
GATE --> Sharing_Member: Return with a validation error message
GATE -> Gate_Database: save the partner's details with the input stage
GATE -> RabbitMQ: Publish data that need to clean on the topic (GR topic)

RabbitMQ -> Cleaning_Service: Push the data to the GR service
Cleaning_Service -> Cleaning_Service: Validation of the received data from the service bus
Cleaning_Service --> RabbitMQ: Publish the request with the validation failed
Cleaning_Service-> Cleaning_Database: If data is valid then save it in the database
Cleaning_Service -> Cleaning_Service: Prepare the request for Service_Provider
Cleaning_Service -> Cleaning_Database: Save the data to the table that was listened to by Service_Provider

note over Cleaning_Database
Need to create a User who has access to Service_Provider tables.
Share the user details with Service_Provider.
end note

Cleaning_Database <-- Service_Provider: Cron will be scheduled by Service_Provider to look up the input table
Service_Provider -> Service_Provider: Process the validation 
Service_Provider -> Service_Provider: The cleanup process has been initiated
Service_Provider -> Cleaning_Database: Save the data in the output table

Cleaning_Service-> Cleaning_Database: Cron will be scheduled to look up the output table \nwhere the cleaned data is stored
Cleaning_Service -> Pool: Initiate the BPN generation process
Pool-> Pool: Validate the BPN issuing request
Pool --> Cleaning_Service: Return with an error if validation fails
Cleaning_Service --> RabbitMQ: Return the requested data with the validation failed message
Pool -> Pool: Issue the BPN for partner
Pool -> Pool_Database: Save the partner's details to the database
Pool -> Cleaning_Service: Return with the data which contains the BPNL/S/A
Cleaning_Service -> Cleaning_Database: Save the data with BPNL/S/A

Cleaning_Service -> Cleaning_Database: Fetch the topic from the database used for publishing \nThe topic that will indicate the Gate component 
Cleaning_Service -> RabbitMQ: Publish the cleaned data on the multiple topics \nMultiple topic comes when multiple Gate has the same BP
GATE <- RabbitMQ: Push the data to the relevant gate 
GATE -> Gate_Database: Save the cleaned data on the output stage
GATE <- Sharing_Member: Now sharing member can access the cleaned data with BPNL/S/A
@enduml

martinfkaeser commented 1 year ago

@gerlitmcoding I think the question of persistence is relevant because our first concept was basically V1. But then, during the PI planning, concerns about storing all the different Gates' data in one central location led us to consider a decentralized approach doing the work in the Gates. So this might be a knock-out criterion, or am I mistaken?

gerlitmcoding commented 1 year ago

@martinfkaeser No, it's no knock-out criterion. Also with Variant 1 you can "abstract" the storing mechanism via interfaces. So a concrete implementation can use in-memory storage, a Redis cache or PostgreSQL. And after processing the records, all solutions would be able to delete the records if this is a requirement. With message queueing you would likewise have the issue that records get stored in the queue for some time.
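To illustrate the interface abstraction (names are illustrative only): the orchestrator would depend on a small persistence contract with interchangeable backends.

import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

// Minimal persistence contract the orchestrator would depend on.
interface CleaningRequestStore {
    fun save(requestId: UUID, payload: String)
    fun load(requestId: UUID): String?
    fun delete(requestId: UUID)   // allows wiping records after processing
}

// In-memory backend; a Redis- or PostgreSQL-backed implementation
// would fulfil the same contract without touching orchestrator logic.
class InMemoryStore : CleaningRequestStore {
    private val data = ConcurrentHashMap<UUID, String>()
    override fun save(requestId: UUID, payload: String) { data[requestId] = payload }
    override fun load(requestId: UUID): String? = data[requestId]
    override fun delete(requestId: UUID) { data.remove(requestId) }
}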

gerlitmcoding commented 1 year ago

@pd0856 can you please explain in a little more detail what the advantages of your solution over the outlined Variant 1 would be?

Also, this seems a little bit unclear to me:

We can create multiple queues in RabbitMQ, one for each Gate component. Each Gate will subscribe to its queue. The Cleaning_Service will handle all messages and track the BPN together with the queue identity, so when we need to publish a message to a sharing member, we publish it onto RabbitMQ.

To publish messages back to a Gate you would need an extra topic to which the Gate subscribes. So basically you would need two topics per Gate, one for publishing, one for subscribing. And in the CleaningService you have to handle/store for each record where to publish the cleaned results? And how do you distribute the resulting updates to other Gates while preventing them from seeing the whole uploaded data from the "Uploader Gate"?

According to your PlantUML, you also still have polling inside the end-to-end flow? So still no "real-time" communication?

And can you please also explain the goal/purpose and functions of the CleaningService in more detail?

rainer-exxcellent commented 1 year ago

Variant 1: Why is the Orchestrator polled by the CleaningServiceDummy? In Variant 2, RabbitMQ sends to and polls from the CleaningServiceDummy. I would prefer this for Variant 1.

How long is the cleaning process expected to last (Seconds, Minutes, Hours)? Are asynchronous REST services an option instead of polling?

pd0856 commented 1 year ago

@pd0856 can you please explain in a little more detail what the advantages of your solution over the outlined Variant 1 would be?

Also, this seems a little bit unclear to me:

We can create multiple queues in RabbitMQ, one for each Gate component. Each Gate will subscribe to its queue. The Cleaning_Service will handle all messages and track the BPN together with the queue identity, so when we need to publish a message to a sharing member, we publish it onto RabbitMQ.

To publish messages back to a Gate you would need an extra topic to which the Gate subscribes. So basically you would need two topics per Gate, one for publishing, one for subscribing. And in the CleaningService you have to handle/store for each record where to publish the cleaned results? And how do you distribute the resulting updates to other Gates while preventing them from seeing the whole uploaded data from the "Uploader Gate"?

According to your PlantUML, you also still have polling inside the end-to-end flow? So still no "real-time" communication?

And can you please also explain the goal/purpose and functions of the CleaningService in more detail?

Yeah sure, let me discuss the overall sequence diagram. Regarding the Cleaning_Service: this service will have the central database where we manage each Gate with its relevant topic. Based on the request received via RabbitMQ, the Cleaning_Service will identify where to publish the message. I also assume here that the service provider will read the requests from the database, but that we can change. I further assume that the service provider does not support a webhook mechanism, so we need to look up the database table which contains the processed data. If a BPN is relevant to multiple Gates, the Cleaning_Service will identify that as well and publish the same business partner on multiple queues. Additionally, no other Gate will see a partner that is not part of its component.
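A sketch of the routing idea described above (hypothetical structures): the Cleaning_Service keeps a BPN-to-Gate mapping and fans the cleaned record out only to the queues of Gates that actually hold that partner.

// Hypothetical fan-out routing for cleaned records.
class CleaningResultRouter(
    // Which Gates hold which BPN; in the proposal this mapping
    // lives in the Cleaning_Service's central database.
    private val gatesByBpn: Map<String, Set<String>>,
    // Which RabbitMQ topic belongs to which Gate.
    private val topicByGate: Map<String, String>,
    private val publish: (topic: String, payload: String) -> Unit
) {
    fun route(bpn: String, cleanedPayload: String) {
        val gates = gatesByBpn[bpn] ?: emptySet()
        // Only Gates that reference the BPN receive the record,
        // so no Gate sees partners outside its own component.
        gates.mapNotNull { topicByGate[it] }
            .forEach { topic -> publish(topic, cleanedPayload) }
    }
}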

nicoprow commented 1 year ago

Variant 1: Why is the Orchestrator polled by the CleaningServiceDummy? In Variant 2, RabbitMQ sends to and polls from the CleaningServiceDummy. I would prefer this for Variant 1.

The CleaningServiceDummy is an exchangeable service which in production can be replaced by a more elaborate cleaning service. Keeping the orchestrator passive allows us to keep the orchestrator the same while exchanging the cleaning service and letting it integrate against the orchestrator. The other way around is of course possible as well, but would require a different strategy: the orchestrator would need exchangeable components that integrate against the respective cleaning service. In my opinion the first option is simpler and allows exchanging the cleaning service by configuration of the deployed services.

nicoprow commented 1 year ago

How long is the cleaning process expected to last (Seconds, Minutes, Hours)? Are asynchronous REST services an option instead of polling?

I think nobody can answer that at the moment, as it depends on the actual cleaning service provider who performs the cleaning. In what way would asynchronous REST be different from polling? Polling is part of asynchronous calls, unless you want to use callbacks.

Rossisep commented 1 year ago

When I review all 3 variants I'm missing the following scenarios:

  1. At the beginning, all incoming upserted data records from each CX Member are screened by NPS to ensure that no NPS record gets passed into the cleaning service.
  2. We have to extend the process flow beyond the initial upload process. More important is the running mode, where the external service provider pushes updates in a volume of thousands a day. The Orchestrator has to know for which CX Member persistence those updates are relevant and has to push them into the appropriate outbound persistence of the CX Member. The CX Member system polls the logfile via the Gate API to identify changes.
  3. The CX outbound persistence logfile has to contain all changes per attribute the Gate API provides, to enable the CX Member system to identify each change.
  4. The Orchestration layer has to contain a memory function to store historical data matched to the architecture of the service provider. Those data are also important for the DQ Dashboard solution.
  5. The Orchestration layer has to be sensitive to start and end date content in defined attributes and support the predecessor and successor function.

cofinity-x-stephan commented 1 year ago

Thanks Nico for documenting the alternatives! Here are a few thoughts / comments on these:

gerlitmcoding commented 1 year ago

While v2 attracts with its simplicity, there are potential security concerns listed. What would be the attack vectors here, and would a similar attack in the other architecture proposals need higher access levels to succeed?

v2 just seems "simple" at first view. When it comes to implementation, you are introducing a completely new system (a message bus) and need new expertise in terminology, operating and managing, which takes a lot of time. v1 instead would build on known technology components by "just" implementing a new service component. In addition, in v1 the shared business partners stay anonymous via a "Ticket-ID", so no one except the Sharing Member itself knows about its business partner. In a message bus approach, you would have at least a 1:1 relation between sharing member and topic, through which sharing-member-specific business partners could be tracked.

Does the current architecture already cover the concept of "Hardening Grade" for records? This requires a backend service provider to know about record ownership, which partially breaks through the requirement for complete anonymity of records.

Having an identifier for ownership in the data model is planned for the next release. Here too, with v1 you can decide via your API endpoints in the Orchestrator who should have access to this field and who should not. Via the topic approach, you would just push the complete business partner to the topic, and a cleaning provider or someone else would see the complete message.

Sharing of a changed address requires projections of Golden Records to customer-specific versions, since customers should only receive the fields of an address that they are actually also inserting into the process. How can we cover this in the current designs?

All Gates periodically pull the Pool changelog for changed BPNs. If there are changed BPNs that match BPNs stored within the Gate, the Gate triggers a new cleaning request, which starts the golden record process for this record. In the end it can fetch the result via the generated Ticket-ID.

How do we want to process records returned from a provider? Does the bridge need to iterate through all gate instances to check to which gate to send a message via the bus? Or do we put changed records more generically as a broadcast onto the bus, and each gate instance decides for itself what to do with them, i.e. filtering if the BPN is not relevant for the gate instance, projecting to relevant fields in the gate, etc.? The latter seems much simpler and provides better decoupling, but it brings us back to the security concern from above?!

See my previous comment. If we did it like this via "broadcast", all other Gates would also receive sharing-member-specific data, which might cause a security issue. For example, all members might use the nameParts field for different internal purposes, and no other member should be able to recognize this.

To simplify debugging/reporting a logging mechanism would be helpful. Is this already part of the current designs? If we would extensively use a bus, this infrastructure could also be used as a basis for logging with a logger component simply listening on the bus?

For sure logging is an important topic. But logging can be realized with and without a message bus. So, logging and messaging are separate topics for me.

gerlitmcoding commented 11 months ago

I have created a pull request proposing an ADR on how to implement the orchestrator component. It focuses on the KISS principle and simplicity. https://github.com/eclipse-tractusx/bpdm/pull/418

cofinity-x-stephan commented 11 months ago

When it comes to alternative one, i.e. no message bus, how do we share a changed record? I understand the gates are polling for updates in this approach and I assume that maybe a timestamp could be a parameter here, i.e. "give me back updates since timestamp". But since it was argued that only relevant records should be passed to a gate instance, how do we know in the orchestrator which updated records to return to a polling gate?

Rossisep commented 11 months ago

Isn't it true that we have to differentiate between different change situations:

  1. The GR engine fetches a record out of a CX Member inbound persistence and the GR process fails. The CX Member gate polls and identifies this record, which failed to receive a BPN ID. Meaningful are the timestamp when the record was fetched and the timestamp when the GR engine updates the logfile of the CX Member outbound persistence. This happens only to the outbound persistence of the owner of this record.
  2. A change occurs to a BP data record in the pool which already contains a valid BPN ID. A significant change can mean that the record loses its Golden Record status at the date of this change; that is, the data record can only keep its Golden Record status and BPN ID until the date the change occurred. Also in this case we can use a timestamp to document this, but we need a GR trigger field which has to be set to false at that date. Furthermore, the GR process has to rerun to generate a new golden record with a new BPN ID. I estimate that in a full-swing model around 7,000 changes will happen daily, of which a fair number of data records will lose their Golden Record status.

gerlitmcoding commented 11 months ago

When it comes to alternative one, i.e. no message bus, how do we share a changed record? I understand the gates are polling for updates in this approach and I assume that maybe a timestamp could be a parameter here, i.e. "give me back updates since timestamp". But since it was argued that only relevant records should be passed to a gate instance, how do we know in the orchestrator which updated records to return to a polling gate?

@cofinity-x-stephan based on the changelog in the Pool, which the Gates poll, they check whether they hold a BPN that was updated in the Pool. If so, the respective Gate triggers a new cleaning request for its business partner with that BPN, and the cleaning process starts for that Gate. So the whole business partner object does not get passed through all Gates.
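Sketched Gate-side changelog check (hypothetical types): only BPNs the Gate itself stores trigger a new cleaning request, so full business partner objects never pass through foreign Gates.

import java.time.Instant

data class ChangelogEntry(val bpn: String, val timestamp: Instant)

// Poll loop body: compare Pool changelog entries against locally stored BPNs.
fun checkPoolChangelog(
    fetchChangelog: (since: Instant) -> List<ChangelogEntry>,
    lastPolledAt: Instant,
    locallyStoredBpns: Set<String>,
    triggerCleaningRequest: (bpn: String) -> Unit
): Instant {
    val entries = fetchChangelog(lastPolledAt)
    entries
        .filter { it.bpn in locallyStoredBpns }
        .forEach { triggerCleaningRequest(it.bpn) }
    // Remember the newest timestamp for the next polling round.
    return entries.maxOfOrNull { it.timestamp } ?: lastPolledAt
}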

cofinity-x-stephan commented 11 months ago

Thanks Tim! But this means that for an address that is used in 20 Gates we also have 20 runs of the cleaning process, even though only a single run is actually needed? Or would the first run in your concept be different from the consecutive runs? We need to be cautious here, since the backend service may also involve human data curators that have to be paid; thus we need to minimize calls to that service to provide a cost-effective solution.

gerlitmcoding commented 11 months ago

Since we do not know the concrete cleaning service (provider), we are open to discussion on how you would propose the end-to-end flow. Generally speaking, we have to take a decision on the overall architecture and target goal, which in my opinion should be the API-based service orchestrator approach because of its simplicity and flexibility. If we find at some point that a messaging system is really mandatory, of course we can introduce one. But for now, focusing on what is really needed and how we can deliver fast, the API-based approach matches best!

In regard to your concrete question: yes, it was intended as you describe, because it is a solution that works out of the box for a first minimum viable product. For sure we can also think about introducing an additional state in the orchestrator for business partner records that should not be "cleaned", because they already are, but should just be updated with the latest changes from the Golden Record Pool. The problem here lies in the details, because the Golden Record Pool data model differs from the Gate model, and somehow a mapping between both has to be done when we say that the business partner record should not run through the whole cleaning process again. Especially considering that every sharing member has different semantics for gate-specific data model fields, e.g. the nameParts field is used differently across sharing members.

nicoprow commented 11 months ago

I have prepared a cleaned and slimmed-down Variant 1 overview which is intended to act as a base for more issues detailing the work packages needed to achieve what we need. This diagram still expects that each Gate will start its own cleaning process after it identifies a BPN update in the Pool, but optimizations are easy to make in this regard. With a few tweaks, Gates needing only an update could trigger a slimmed-down version of the cleaning process, for example.


sequenceDiagram
    SharingMember->>Gate: (1) Upsert business partner A input
    Gate-->>Gate: (2) Persist business partner A input
    Gate->>Orchestrator: (3) Request cleaning of business partner A input
    Orchestrator-->>Orchestrator: (4) Create timestamped Cleaning Request B for business partner A input
    Orchestrator-->>Orchestrator: (4) Update Cleaning Request B State: QUEUED_GENERIC_CLEANING
    Orchestrator-->>Gate: Created cleaning request B
    Gate-->>Gate: (5) Associate cleaning request ID with business partner A
    Gate-->>Gate: (5) Upsert sharing state of business partner A
    Gate-->>SharingMember: Upserted business partner A input
    loop Poll Generic Cleaning
        CleaningServiceDummy->>Orchestrator: (6) Fetch and reserve next cleaning requests in 'Generic Cleaning' queue
        Orchestrator-->>Orchestrator: (4) Update state of cleaning request B: RESERVED_GENERIC_CLEANING
        Orchestrator-->>CleaningServiceDummy: Cleaning request B with business partner input
        CleaningServiceDummy->>CleaningServiceDummy: (7) Create dummy cleaning results
        opt BPN does not exist
            CleaningServiceDummy->>CleaningServiceDummy: (7) Add BPN request IDs into results
        end
        CleaningServiceDummy->>Orchestrator: (8) Send cleaning result for request B
        Orchestrator-->>Orchestrator: (4) Update Cleaning Result of request B
        Orchestrator-->>Orchestrator: (4) Update state of cleaning request B: QUEUED_BPN_PROCESSING
        Orchestrator-->>CleaningServiceDummy: Accept
    end
    loop Poll BPN Processing
        Pool->>Orchestrator: (9) Fetch and reserve next cleaning requests in 'BPN Processing' queue
        Orchestrator-->>Orchestrator: (4) Update state of cleaning request B: RESERVED_BPN_PROCESSING
        Orchestrator-->>Pool: Cleaning request B cleaning result
        opt L/S/A result has changed
            Pool-->>Pool: (10) Upsert L/S/A business partners with result
        end
        Pool-->>Pool: (11) Add BPNs from BPN request IDs to cleaning result
        Pool->>Orchestrator: (12) Send BPN cleaning result for request B
        Orchestrator-->>Pool: Accept
        Orchestrator-->>Orchestrator: Update state of cleaning request B: FINISHED
    end
    loop Poll for finished Cleaning Requests
        Gate->>Orchestrator: (13) Fetch cleaning request cleaning result by request ID
        Orchestrator-->Gate: Cleaning request B cleaning result
        Gate-->>Gate: (5) Resolve cleaning request B
        Gate-->>Gate: (5) Update sharing state of business partner A
    end
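The request states appearing in this diagram could be captured roughly as follows (a sketch; the actual state names and transitions are still subject to the linked work packages):

// Hypothetical state machine for a cleaning request, following the diagram.
enum class CleaningRequestState {
    QUEUED_GENERIC_CLEANING,    // (4) created by a Gate cleaning request
    RESERVED_GENERIC_CLEANING,  // (6) reserved by the cleaning service
    QUEUED_BPN_PROCESSING,      // (8) cleaning result received
    RESERVED_BPN_PROCESSING,    // (9) reserved by the Pool
    FINISHED                    // (12) BPNs added, result fetchable by the Gate
}

// Allowed transitions; anything else indicates a protocol violation.
val allowedTransitions = mapOf(
    CleaningRequestState.QUEUED_GENERIC_CLEANING to CleaningRequestState.RESERVED_GENERIC_CLEANING,
    CleaningRequestState.RESERVED_GENERIC_CLEANING to CleaningRequestState.QUEUED_BPN_PROCESSING,
    CleaningRequestState.QUEUED_BPN_PROCESSING to CleaningRequestState.RESERVED_BPN_PROCESSING,
    CleaningRequestState.RESERVED_BPN_PROCESSING to CleaningRequestState.FINISHED
)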

I also created a simple dependency chart showing which work packages are dependent on each other in what way:

flowchart LR
    model[Generic Business Partner DTO Interface]

    subgraph GateLSA[Gate LSA]
        lsaMapLegalEntity["(15) Logic: Map legal Entities to generic model"]
        lsaConnectLegalEntity["(16) Logic: Connect legal entity endpoints to generic service logic"]
        lsaMigrateLegalEntity["(17) Db-Script: Migrate existing legal entities to generic business partners"]
        lsaMapSite["(18) Logic: Map sites to generic model"]
        lsaConnectSite["(19) Logic: Connect site endpoints to generic service logic"]
        lsaMigrateSite["(20) Db-Script: Migrate existing sites to generic business partners"]
        lsaMapAddress["(21) Logic: Map addresses to generic model"]
        lsaConnectAddress["(22) Logic: Connect address endpoints to generic service logic"]
        lsaMigrateAddress["(23) Db-Script: Migrate existing addresses to generic business partners"]
    end

    lsaMapLegalEntity-->lsaConnectLegalEntity
    lsaConnectLegalEntity-->lsaMigrateLegalEntity
    lsaMapSite-->lsaConnectSite
    lsaConnectSite-->lsaMigrateSite
    lsaMapAddress-->lsaConnectAddress
    lsaConnectAddress-->lsaMigrateAddress

    subgraph GateOrch[Gate Orchestration]
         gateGenericInputEndpoints["(1) Input Endpoints for Generic Business Partner"]
         gateGenericPersistence["(2) Persist Generic Business Partners"]
         gateRequestCleaning["(5) Request Cleaning for Own Update"]
         gateResolveRequest["(13) Resolve finished Cleaning Results"]
         gateCheckBpnUpdate["(14) Request Cleaning for BPN Update"]
    end

    gateGenericInputEndpoints-->gateGenericPersistence
    gateGenericPersistence-->gateRequestCleaning
    gateRequestCleaning-->gateResolveRequest
    gateRequestCleaning-->gateCheckBpnUpdate

    subgraph Orchestrator[Orchestrator]
        orchBase[Setup module]
        orchRequestCleaningEndpoint["(3) Request Business Partner Cleaning Endpoint"]
        orchCleaningQueueEndpoint["(6) Reserve Cleaning Requests for Cleaning Endpoint"]
        orchCleaningResolveEndpoint["(8) Resolve Cleaning Request for Cleaning Endpoint"]
        orchBpnQueueEndpoints["(9) Reserve Cleaning Requests for BPN Processing Endpoint"]
        orchBpnResolveEndpoint["(12)  Resolve Cleaning Request for BPN Processing Endpoint"]
        orchFinishedEndpoint["(13) Fetch Finished Cleaning Request Endpoint"]
        orchStateMachine["(4) Manage Cleaning Request State and Data"]
    end

    orchBase-->orchRequestCleaningEndpoint
    orchBase-->orchCleaningQueueEndpoint
    orchBase-->orchCleaningResolveEndpoint
    orchBase-->orchBpnQueueEndpoints
    orchBase-->orchBpnResolveEndpoint
    orchBase-->orchFinishedEndpoint
    orchRequestCleaningEndpoint-->orchStateMachine
    orchCleaningQueueEndpoint-->orchStateMachine
    orchCleaningResolveEndpoint-->orchStateMachine
    orchBpnQueueEndpoints-->orchStateMachine
    orchFinishedEndpoint-->orchStateMachine

    subgraph CleaningDummy[Cleaning Service Dummy]
        dummyBase[Setup module]
        dummyServiceLogic["(7) Create Dummy Cleaning Result"]
    end

    dummyBase-->dummyServiceLogic

    subgraph Pool[Pool]
        poolUpsert["(10) Upsert Business Partners from Cleaning Result"]
        poolResult["(11) Send BPN Processing Result"]
    end

    poolUpsert-->poolResult

    orchRequestCleaningEndpoint-->gateRequestCleaning
    orchCleaningQueueEndpoint-->dummyServiceLogic
    orchBpnQueueEndpoints-->Pool
    orchBpnResolveEndpoint-->Pool
    orchFinishedEndpoint-->gateResolveRequest

    gateGenericPersistence-->lsaConnectLegalEntity
    gateGenericPersistence-->lsaConnectSite
    gateGenericPersistence-->lsaConnectAddress

    model-->GateLSA
    model-->GateOrch
    model-->Orchestrator
    model-->CleaningDummy
    model-->Pool

I created also issues for work packages which give more details: (1) #384 (2) #428 (3) #425 #429 (4) #427 (5) #429 (6) #388 (7) #433 (8) #392 (9) #434 (10, 11) #432 (12) #435 (13) #426 #430 (14) #431 (15) #441 (17) (18) #441 (19) (20) (21) #441 (22) (23)

Please note that the linked issues still need refinement and will undergo change in the next days.

EDIT:

Update of the sequence diagram to showcase business partner updates when a BPN has changed. This will trigger a different workflow than the conventional cleaning process.

EDIT 2: Removed the BPN update flow as it's now depicted separately here: https://github.com/eclipse-tractusx/bpdm/issues/377#issuecomment-1715421575

nicoprow commented 11 months ago

This is a sequence diagram of the orchestration flow depicting the use case of updating a Gate business partner entry based on a BPN update from the Pool. In this flow the Gate business partner does not have any new information and just needs the newest state from the Pool incorporated. The service provider is responsible for incorporating the Pool state into the Gate business partner. No extensive cleaning is needed in this case.

sequenceDiagram
    loop Poll Pool BPN Changes
        Gate->>Pool: Fetch BPN changelog entries from last timestamp
        Pool-->>Gate: Entries from given timestamp
        opt Business Partner in Gate associated with BPN C and newer timestamp (14)
            Gate->>Orchestrator: Request update for business partner A from Pool
            Orchestrator-->>Orchestrator: Create timestamped Cleaning Request D for business partner A input
            Orchestrator-->>Orchestrator: Update Cleaning Request D State: QUEUED_UPDATE_LSA
            Orchestrator-->>Gate: Created cleaning request D
            Gate-->>Gate: Associate cleaning request D with business partner A
            Gate-->>Gate: Upsert sharing state of business partner A
        end
    end
    loop Pool L/S/A Update queue
        CleaningServiceDummy->>Orchestrator: Fetch and reserve next cleaning requests in 'Update L/S/A' queue
        Orchestrator-->>Orchestrator: Update state of cleaning request D: RESERVED_UPDATE_LSA
        Orchestrator-->>CleaningServiceDummy: Cleaning request D with business partner input A
        CleaningServiceDummy->>Pool: Request L/S/A business partner with BPN C
        Pool-->>Pool: Fetch L/S/A business partner C from database
        Pool-->>CleaningServiceDummy: Return L/S/A business partner C
        CleaningServiceDummy-->>CleaningServiceDummy: Create dummy update result
        CleaningServiceDummy->>Orchestrator: Send update result for request D
        Orchestrator-->>Orchestrator: Update Update Result of request D
        Orchestrator-->>Orchestrator: Update state of cleaning request D: FINISHED
        Orchestrator-->>CleaningServiceDummy: Accept
    end
    loop Poll for finished Cleaning Requests
        Gate->>Orchestrator: (13) Fetch cleaning request cleaning result by request ID
        Orchestrator-->Gate: Cleaning request D cleaning result
        Gate-->>Gate: (5) Resolve cleaning request D
        Gate-->>Gate: (5) Update sharing state of business partner A
    end

nicoprow commented 11 months ago

Since the concept of the orchestration layer has evolved a lot in our discussion, I have opened a new issue #455 in which the newest state of our concept is condensed. This helps us keep the concept concise and readable. Please post further comments on this topic to the new issue #455.