Closed. Phil91 closed this issue 11 months ago.
How to implement the orchestration in BPDM going forward should be an open discussion. There are several ways to connect the Gates and the Pool together with an exchangeable cleaning service. Below I provide some proposals, concentrating on their advantages and disadvantages.
I'm open to feedback and counter-proposals so we can determine how to go forward with this feature.
Variant 1: Orchestrator Service with Pollable APIs
The heart of this variant is a new orchestration service that offers REST API endpoints for the other BPDM services to consume. The BPDM Gates as well as the Pool connect to this orchestration service. In addition, the cleaning service (a dummy implementation for now) also connects to it. The so-called orchestrator manages the state of the cleaning requests and offers them in its endpoint queues. The orchestrator is designed to be passive: each service accesses the information relevant to it through authorized endpoints via polling. The Gates would also poll the Pool for updates via the changelog.
sequenceDiagram
SharingMember->>Gate: Upsert business partner A input
Gate-->>Gate: Persist business partner A input
Gate-->>SharingMember: Upserted business partner A input
Gate->>Orchestrator: Request cleaning of business partner A input
Orchestrator-->>Orchestrator: Create timestamped Cleaning Request B for business partner A input
Orchestrator-->>Orchestrator: Enqueue in 'Generic Cleaning' Queue
Orchestrator-->>Gate: Created cleaning request B
Gate-->>Gate: Associate cleaning request ID with business partner A
Gate-->>Gate: Upsert sharing state of business partner A
loop Every X Minutes
Gate->>Orchestrator: Ask cleaning request B state
Orchestrator-->>Gate: State of cleaning request B
Gate-->>Gate: Update sharing state
end
loop Every X Minutes
CleaningServiceDummy->>Orchestrator: Fetch and reserve next cleaning requests in 'Generic Cleaning' queue
Orchestrator-->>CleaningServiceDummy: Cleaning request B
CleaningServiceDummy-->>CleaningServiceDummy: Perform dummy cleaning
opt BPN does not exist
CleaningServiceDummy->>CleaningServiceDummy: Add BPN request IDs
end
CleaningServiceDummy-->>Orchestrator: Send cleaning result
Orchestrator-->>Orchestrator: Enqueue in 'BPN Processing' queue
Orchestrator-->>CleaningServiceDummy: Commit
end
loop Every X Minutes
Pool->>Orchestrator: Dequeue next cleaning requests in 'BPN Processing' queue
Orchestrator-->>Pool: Cleaning request B cleaning result
Pool-->>Pool: Upsert business partners with result
Pool-->>Pool: Add BPNs from BPN request IDs
Pool->>Orchestrator: Send cleaning result
Orchestrator-->>Orchestrator: Commit
Orchestrator-->>Orchestrator: Enqueue in 'FINISHED' queue
end
loop Every X Minutes
Gate->>Orchestrator: Dequeue cleaning requests by request ID in 'FINISHED' queue
Orchestrator-->>Gate: Cleaning request B cleaning result
Gate-->>Gate: Resolve cleaning request B
Gate-->>Gate: Update sharing state of business partner A
end
loop Every X Minutes
Gate->>Pool: Fetch BPN changelog entries from last timestamp
Pool-->>Gate: Entries from given timestamp
opt Business Partner in Gate associated with BPN and newer timestamp
Gate->>Gate: Start new cleaning process (see above)
end
end
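The fetch-and-reserve semantics of the pollable queues can be sketched as follows. This is a minimal illustration, not the actual BPDM API: the class names, the queue name and the reservation protocol are all assumptions.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CleaningRequest:
    # Hypothetical shape of a cleaning request held by the orchestrator
    payload: dict = field(default_factory=dict)
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    state: str = "QUEUED"  # QUEUED -> RESERVED -> removed on commit

class Orchestrator:
    """Passive queue holder: consumers poll, reserve a batch, then commit."""
    def __init__(self) -> None:
        self.queues: dict = {"generic-cleaning": []}

    def enqueue(self, queue: str, payload: dict) -> CleaningRequest:
        request = CleaningRequest(payload=payload)
        self.queues[queue].append(request)
        return request

    def fetch_and_reserve(self, queue: str, limit: int = 10) -> list:
        # Hand out at most `limit` queued requests and mark them reserved
        # so a concurrent poller does not pick up the same work.
        batch = [r for r in self.queues[queue] if r.state == "QUEUED"][:limit]
        for request in batch:
            request.state = "RESERVED"
        return batch

    def commit(self, queue: str, request_id: str) -> None:
        # Consumer acknowledges the processed request; drop it from the queue.
        self.queues[queue] = [
            r for r in self.queues[queue] if r.request_id != request_id
        ]
```

The point of the explicit reserve/commit split is that the orchestrator never calls out to anyone: the cleaning service and the Pool pull work and acknowledge it on their own schedule.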
Advantages:
Disadvantages:
Variant 2 - Orchestration via Event Queue
Here RabbitMQ replaces a separate orchestration service. Cleaning request processing is realised by chaining events that are generated and consumed by the Gates, the Pool and the cleaning service(s). The Gates are responsible for finding a unique request ID; a UUIDv4 together with a timestamp should mitigate any possible conflicts between Gates. Requests are pushed into the respective topics of the queue and are thus passed along from Gate to cleaning service to Pool and back to the Gate. The Gates can stay informed about the sharing state by subscribing to the cleaning process step topics and updating their sharing state accordingly.
sequenceDiagram
SharingMember->>Gate: Upsert Business partner A
Gate-->>Gate: Persist business partner input A
Gate-->>Gate: Create timestamped Cleaning Request B for business partner A input
Gate-->>SharingMember: Created business partner A input
Gate->>RabbitMq: Send cleaning request B to topic 'New Cleaning Request'
RabbitMq-->>Gate: Commit
RabbitMq->>CleaningServiceDummy: Send cleaning request B
CleaningServiceDummy-->>RabbitMq: Commit
opt Input has no BPNs
CleaningServiceDummy-->>CleaningServiceDummy: Create unique BPN request IDs
end
CleaningServiceDummy-->>CleaningServiceDummy: Create Cleaning Result for cleaning request B
CleaningServiceDummy->>RabbitMq: Send cleaning result for B to topic 'New Process Step Generic Cleaning Result'
RabbitMq-->>CleaningServiceDummy: Commit
RabbitMq->>Pool: Send cleaning result B
Pool-->>RabbitMq: Commit
Pool-->>Pool: Upsert LSA business partner based on BPNs/BPN-Request-IDs
Pool->>RabbitMq: Send update result for B to topic 'New Pool Update Result'
RabbitMq-->>Pool: Commit
RabbitMq->>Gate: Send update result B
Gate-->>RabbitMq: Commit
Gate-->>Gate: Resolve cleaning request B
Gate-->>Gate: Persist business partner A output
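The unique request ID scheme mentioned above (a UUIDv4 combined with a timestamp) could look roughly like this; the gate-name prefix and the separator are illustrative assumptions, not part of the proposal:

```python
import uuid
from datetime import datetime, timezone

def new_cleaning_request_id(gate_name: str) -> str:
    """Build a cleaning request ID from a UUIDv4 plus a UTC timestamp.

    The UUIDv4 alone already makes collisions between independent Gates
    practically impossible; the timestamp additionally orders requests and
    the gate name (an assumption here) makes the origin visible.
    """
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    return f"{gate_name}--{ts}--{uuid.uuid4()}"
```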
Advantages:
Disadvantages:
EDIT:
Variant 3: Event Queue with Lightweight RequestProvider
Just like in Variant 2, the heart of the orchestration is RabbitMQ, but in order to solve the security issues resulting from visible business partner data, we introduce a request provider service. This request provider service is responsible for generating request IDs and caching the results. It gives Gates a way to receive cleaning results via a private request ID that is only known to the Gate which originally created the cleaning request. In this scenario the CleaningRequestProvider creates a private and a public request ID and shares both with the requesting Gate. The Gate then publishes a cleaning request similar to Variant 2. At the end of the process, however, only the CleaningRequestProvider receives the actual cleaning result and broadcasts a notification of the finished cleaning request to the Gates. Only the Gate with the private request ID is then allowed to access the actual request result.
sequenceDiagram
SharingMember->>Gate: Upsert Business partner A
Gate-->>Gate: Persist business partner input A
Gate-->>SharingMember: Created business partner A input
Gate->>CleaningRequestProvider: Request new cleaning request ID
CleaningRequestProvider->>CleaningRequestProvider: Create timestamped Cleaning Request with public ID B and private ID C
CleaningRequestProvider-->>Gate: Cleaning request public ID B and private ID C
Gate-->>Gate: Associate business partner input A with cleaning request IDs
Gate->>RabbitMq: Send cleaning request B with input A to topic 'New Cleaning Request'
RabbitMq-->>Gate: Commit
RabbitMq->>CleaningServiceDummy: Send cleaning request B
CleaningServiceDummy-->>RabbitMq: Commit
opt Input has no BPNs
CleaningServiceDummy-->>CleaningServiceDummy: Create unique BPN request IDs
end
CleaningServiceDummy-->>CleaningServiceDummy: Create Cleaning Result for cleaning request B
CleaningServiceDummy->>RabbitMq: Send cleaning result for B to topic 'New Process Step Generic Cleaning Result'
RabbitMq-->>CleaningServiceDummy: Commit
RabbitMq->>Pool: Send cleaning result B
Pool-->>RabbitMq: Commit
Pool-->>Pool: Upsert LSA business partner based on BPNs/BPN-Request-IDs
Pool->>RabbitMq: Send update result for B to topic 'New Pool Update Result'
Pool->>RabbitMq: Send updated BPNs to topic 'New BPN Update'
RabbitMq-->>Pool: Commit
RabbitMq->>CleaningRequestProvider: Send update result B
CleaningRequestProvider-->>RabbitMq: Commit
CleaningRequestProvider-->>CleaningRequestProvider: Persist update result
CleaningRequestProvider->>RabbitMq: Send notification for cleaning request B 'Cleaning Request FINISHED'
RabbitMq-->>CleaningRequestProvider: Commit
RabbitMq->>Gate: Send notification cleaning request B finished
Gate-->>RabbitMq: Commit
Gate-->>CleaningRequestProvider: Request result for B via private key C
CleaningRequestProvider->>Gate: Result for cleaning request B
Gate-->>Gate: Resolve cleaning request B
Gate-->>Gate: Persist business partner A output
RabbitMq->>Gate: Send updated BPNs
Gate-->>RabbitMq: Commit
opt Business Partner in Gate associated with BPN and newer timestamp
Gate->>Gate: Start new cleaning process (see above)
end
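A minimal sketch of the public/private request-ID pairing this variant relies on. The ID formats, method names and in-memory storage are assumptions made for illustration only:

```python
import secrets
import uuid

class CleaningRequestProvider:
    """Sketch: pair a broadcastable public ID with a secret private ID."""

    def __init__(self) -> None:
        self._private_to_public: dict = {}
        self._results: dict = {}  # cleaning results keyed by public ID

    def create_request(self):
        # Public ID travels through the queue topics; private ID stays
        # with the requesting Gate only.
        public_id = str(uuid.uuid4())
        private_id = secrets.token_urlsafe(32)
        self._private_to_public[private_id] = public_id
        return public_id, private_id

    def store_result(self, public_id: str, result: dict) -> None:
        # Called when the Pool's update result arrives for this request.
        self._results[public_id] = result

    def fetch_result(self, private_id: str) -> dict:
        # Only a caller holding the private ID can resolve the result;
        # an unknown key raises KeyError.
        public_id = self._private_to_public[private_id]
        return self._results[public_id]
```

The broadcast notification can safely carry only the public ID, since knowing it is not enough to fetch the result.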
Advantages:
Disadvantages:
EDIT:
In my opinion, as far as I can tell from the description, Variant 3 combines the disadvantages of the other two variants, because here you would also need a polling mechanism again for some parts. Or am I wrong?
In my first impression, I would recommend variant 1. Here you have the greatest security and sovereignty over the data and data exchange formats. You also have a good abstraction for the different cleaning states and can easily add new ones by just adding a new API Endpoint in the orchestrator service.
If polling every minute is not enough, you can think about adapting the APIs and the control flow from pull to push. But as far as I know, at the moment, there are no concrete requirements in that regard.
In V2, does it need both a CleaningServiceDummy and a DummyBridge, or is this redundant? And in V3, is the DummyBridge the same as the CleaningServiceDummy?
One thing about the duplicate check in the cleaning service: if it finds out the BP is a duplicate and already exists in the Pool, it could and should directly set its BPN, not just a BPN-RequestID, right?
Thinking about V1, couldn't the Orchestrator work without persisting any data and just keep the queue data in RAM? As the Gates poll for the state of their cleaning requests, they could detect if some BP goes missing and just add it again for cleaning. What is the worst that could happen? Maybe BPs from the FINISHED queue go missing, meaning they already got a BPN assigned and were added to the Pool. But even then, after adding the BP again, the CleaningService should detect it as a duplicate and assign the same BPN / BPN-RequestID, and the Gate should eventually get the updated BP. If not persisting anything in the Orchestrator is indeed feasible, this would mitigate security issues and also simplify implementation.
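The self-healing behaviour suggested here (Gates noticing that an in-memory Orchestrator lost their request after e.g. a restart, and simply re-queueing it) could be sketched like this; all names and the flat state representation are illustrative:

```python
def partners_to_requeue(orchestrator_states: dict, tracked: dict) -> list:
    """Gate-side check during a polling cycle.

    `tracked` maps the request IDs this Gate created to its business
    partner records; `orchestrator_states` is what the (in-memory)
    Orchestrator currently knows. A request ID the Orchestrator no
    longer knows means the data was lost and the record must be sent
    in for cleaning again.
    """
    lost = []
    for request_id, partner in tracked.items():
        if request_id not in orchestrator_states:
            lost.append(partner)  # request vanished -> clean again
    return lost
```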
For V1 the mentioned disadvantages regarding polling don't seem critical. My main concern: is there a legal problem with having this central Orchestrator component that (temporarily) stores all the different Gates' data?
About V3: On first sight this looks quite convoluted. As Tim mentioned, doesn't the Gate need to poll the CleaningRequestProvider to get the cleaning result? So there is no real-time pushing of results. And if the CRP needs to persist the cleaning results, this component gets quite complicated instead of the Orchestrator it was meant to make redundant. Maybe I'm overlooking something, but I don't see any advantage of V3 vs V1.
About the event-based approaches (V2 and V3): I think they are not optimal to standardize, as a different, additional protocol is needed, and there is no formal way that I know of to standardize which topics are allowed and which content schema is allowed for each topic.
General idea about persistence: If we really have to persist anything (see my previous comment) in some kind of Orchestrator component, maybe it makes sense not to use a relational DB, as this component doesn't care about the content but just acts as a generic state machine. Maybe just use a file- or document-style approach.
About restarting the cleaning process in the end (all variants): I think it still makes sense to flag the new cleaning request as "validate" to prevent another Pool update, as considered in previous versions. If the cleaning service doesn't act completely reliably, this might introduce a never-ending cleaning loop. Or maybe use a changed-by-SharingMember timestamp.
@martinfkaeser How to store or not store the processed data is another discussion. In all variants you can configure whether intermediate data should be deleted or not. In this discussion we should really focus on the concept we want to use for orchestrating our process handling. And here I'm also on your side: I do not really see many benefits that a message bus has over Variant 1. Instead, I see more downsides in Variants 2 and 3.
But I am open to discussing and hearing arguments for Variant 2 or 3 which we did not have in mind right now 😃
We can create multiple queues in RabbitMQ, one for each Gate component. Each Gate then subscribes to its queue. The Cleaning_Service handles all messages and stores the BPN together with the queue identity, so when we need to publish a message to the sharing members, we publish it onto RabbitMQ.
@startuml
participant Sharing_Member as Sharing_Member
control GATE as GATE
database Gate_Database as Gate_Database
queue RabbitMQ as RabbitMQ
control Cleaning_Service as Cleaning_Service
database Cleaning_Database as Cleaning_Database
control Service_Provider as Service_Provider
control Pool as Pool
database Pool_Database as Pool_Database
RabbitMQ <- Cleaning_Service: Subscribe to the topic
RabbitMQ --> Cleaning_Service: Subscription has been done (Now when a message is published on the topic \nit will be consumed by the Cleaning_Service)
note over Sharing_Member, Pool_Database
Create a topic for each Gate component.
end note
GATE -> RabbitMQ: Subscribe to the topic that was created for Gate (Gate topic)
GATE <-- RabbitMQ: Subscription has been done (Now when a message is published on the topic \nit will be consumed by the Gate component)
Sharing_Member -> GATE: Share Partners data
GATE -> GATE: Validate the requested data
GATE --> Sharing_Member: Return with a validation error message
GATE -> Gate_Database: save the partner's details with the input stage
GATE -> RabbitMQ: Publish data that need to clean on the topic (GR topic)
RabbitMQ -> Cleaning_Service: Push the data to the GR service
Cleaning_Service -> Cleaning_Service: Validation of the received data from the service bus
Cleaning_Service --> RabbitMQ: Publish the request with the validation failed
Cleaning_Service-> Cleaning_Database: If data is valid then save it in the database
Cleaning_Service -> Cleaning_Service: Prepare the request for Service_Provider
Cleaning_Service -> Cleaning_Database: Save the data to the table that was listened to by Service_Provider
note over Cleaning_Database
Need to create a User who has access to Service_Provider tables.
Share the user details with Service_Provider.
end note
Cleaning_Database <-- Service_Provider: A cron job is scheduled by Service_Provider to look up the input table
Service_Provider -> Service_Provider: Process the validation
Service_Provider -> Service_Provider: The cleanup process has been initiated
Service_Provider -> Cleaning_Database: Save the data in the output table
Cleaning_Service-> Cleaning_Database: A cron job is scheduled to look up the output table \nwhere the cleaned data is stored
Cleaning_Service -> Pool: Initiate the BPN generation process
Pool-> Pool: Validate the BPN issuing request
Pool --> Cleaning_Service: Return with the error if validation will be failed
Cleaning_Service --> RabbitMQ: Return the requested data with the validation failed message
Pool -> Pool: Issue the BPN for partner
Pool -> Pool_Database: Save the partner's details to the database
Pool -> Cleaning_Service: Return with the data which contains the BPNL/S/A
Cleaning_Service -> Cleaning_Database: Save the data with BPNL/S/A
Cleaning_Service -> Cleaning_Database: Fetch the topic from the database used for publishing \nThe topic that will indicate the Gate component
Cleaning_Service -> RabbitMQ: Publish the cleaned data on the multiple topics \nMultiple topic comes when multiple Gate has the same BP
GATE <- RabbitMQ: Push the data to the relevant gate
GATE -> Gate_Database: Save the cleaned data on the output stage
GATE <- Sharing_Member: Now sharing member can access the cleaned data with BPNL/S/A
@enduml
@gerlitmcoding I think the question of persistence is relevant because our first concept was basically V1. But then during the PI planning, concerns about storing all the different Gates' data in one central location led us to consider a decentralized approach doing the work in the Gates. So this might be a knock-out criterion, or am I mistaken?
@martinfkaeser No, it's no knock-out criterion. Also with Variant 1 you can "abstract" the storage mechanism via interfaces. So basically, a concrete implementation can use in-memory storage, a Redis cache or PostgreSQL. And after processing the records, all solutions would allow deleting the records if this is a requirement. Also with message queueing you would have the issue that records get stored in the queue for some time.
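The storage abstraction via interfaces mentioned here could look roughly like the following sketch: one interface with swappable backends (an in-memory one is shown; a Redis or PostgreSQL implementation would satisfy the same contract). All method names are illustrative:

```python
from abc import ABC, abstractmethod

class RequestStore(ABC):
    """Storage contract for the orchestrator's cleaning request records."""

    @abstractmethod
    def save(self, request_id: str, record: dict) -> None: ...

    @abstractmethod
    def load(self, request_id: str):
        """Return the record, or None if unknown (or already deleted)."""

    @abstractmethod
    def delete(self, request_id: str) -> None: ...

class InMemoryRequestStore(RequestStore):
    """Non-persisting backend: records vanish on restart, which the
    Gates can compensate for by re-queueing (see discussion above)."""

    def __init__(self) -> None:
        self._data: dict = {}

    def save(self, request_id: str, record: dict) -> None:
        self._data[request_id] = record

    def load(self, request_id: str):
        return self._data.get(request_id)

    def delete(self, request_id: str) -> None:
        self._data.pop(request_id, None)
```

Deleting records after processing then works identically no matter which backend is configured.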
@pd0856 can you please explain in a little bit more detail what the advantages of your solution vs. the outlined Variant 1 would be?
Also this seems for me a little bit unclear:
> Now Gate will subscribe the queue. Cleaning_Service will handle all messages and it contains the BPN with the queue identity so when we need to publish the message to sharing member's then we will publish it onto the rabbitmq.
To publish messages back to a Gate you would need an extra topic which the Gate subscribes to. So basically you would need two topics per Gate: one for publishing, one for subscribing. And in the CleaningService you have to handle/store for each record where the cleaned results have to be published? And how do you distribute the resulting updates to other Gates while preventing them from seeing the whole uploaded data from the "Uploader-Gate"?
According to your PlantUML, you also still have polling inside the end-to-end flow? So still no "real-time" communication?
And can you please also explain the goal/purpose and functions of the CleaningService in more detail?
Variant 1: Why is the Orchestrator polled by the CleaningServiceDummy? In Variant 2 the RabbitMQ sends to and polls from the CleaningServiceDummy. I would prefer this for Variant 1.
How long is the cleaning process expected to last (seconds, minutes, hours)? Are asynchronous REST services an option instead of polling?
> @pd0856 can you please explain in a little bit more detail what the advantages of your solution vs. the outlined Variant 1 would be?
> Also this seems for me a little bit unclear:
> > Now Gate will subscribe the queue. Cleaning_Service will handle all messages and it contains the BPN with the queue identity so when we need to publish the message to sharing member's then we will publish it onto the rabbitmq.
> To publish messages back to a Gate you would need an extra topic which the Gate subscribes to. So basically you would need two topics per Gate: one for publishing, one for subscribing. And in the CleaningService you have to handle/store for each record where the cleaned results have to be published? And how do you distribute the resulting updates to other Gates while preventing them from seeing the whole uploaded data from the "Uploader-Gate"?
> According to your PlantUML, you also still have polling inside the end-to-end flow? So still no "real-time" communication?
> And can you please also explain the goal/purpose and functions of the CleaningService in more detail?
Yeah sure, let me discuss the overall sequence diagram. Regarding the Cleaning_Service: this service will have a central database where we manage each Gate with its relevant topic. Based on the request received from RabbitMQ, the Cleaning_Service identifies where to publish the message. I also assume that the service provider will read the requests from the database, but that we can change. I further assume that the service provider does not support a webhook mechanism, so we need to look up the database which contains the processed data. Also, if a BPN is relevant to multiple Gates, the Cleaning_Service identifies that as well and publishes the same business partner on multiple queues. Additionally, no Gate will see a partner that is not part of its component.
> Variant 1: Why is the Orchestrator polled by the CleaningServiceDummy? In Variant 2 the RabbitMQ sends to and polls from the CleaningServiceDummy. I would prefer this for Variant 1.
The CleaningServiceDummy is an exchangeable service which in production can be replaced by a more elaborate cleaning service. Keeping the orchestrator passive allows us to keep the orchestrator the same while exchanging the cleaning service and letting it integrate with the orchestrator. The other way around is possible as well, of course, but would require a different strategy: the orchestrator would need exchangeable components that integrate against the respective cleaning service. In my opinion the first option is simpler and allows exchanging the cleaning service through the configuration of deployed services.
> How long is the cleaning process expected to last (seconds, minutes, hours)? Are asynchronous REST services an option instead of polling?
I think nobody can answer that at the moment as this is dependent on the actual cleaning service provider who performs the cleaning. In what way would asynchronous REST be different from polling? Polling is part of asynchronous calls, except if you want to use callbacks.
When I review all 3 variants I'm missing the following scenarios:
Thanks Nico for documenting the alternatives! Here are a few thoughts / comments on these:
While v2 is attractive for its simplicity, there are potential security concerns listed. What would be the attack vectors here, and would a similar attack on the other architecture proposals need higher access levels to succeed?
At first view, v2 seems "simple". When it comes to implementation, you are introducing a completely new system (a message bus) and need new expertise in terminology, operation and management, which takes a lot of time. v1 instead builds on known technology components by "just" implementing a new service component. In addition, in v1 the shared business partners stay anonymous via a "Ticket-ID", so no one except the sharing member itself knows about its business partner. In a message bus approach, you would have at least a 1:1 relation between sharing member and topic, through which sharing-member-specific business partners could be tracked.
Does the current architecture already cover the concept of a "Hardening Grade" for records? This requires a backend service provider to know about record ownership, which partially breaks the requirement for complete anonymity of records.
Having an identifier for ownership in the data model is planned for the next release. Also here, with v1 you can decide via your API endpoints in the Orchestrator who should have access to this field and who should not. With the topic approach, you would just push the complete business partner to the topic, and a cleaning provider or someone else would see the complete message.
Sharing a changed address requires projections of Golden Records to customer-specific versions, since customers should only receive the fields of an address that they actually also insert into the process. How can we cover this in the current designs?
All Gates periodically pull the Pool changelog for changed BPNs. If there are changed BPNs that match BPNs stored within the Gate, the Gate triggers a new cleaning request, which starts the golden record process for this record. In the end it can fetch the result via the generated Ticket-ID.
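A rough sketch of this Gate-side changelog check; the field names, the flat dict records and the timestamp comparison are assumptions made for illustration:

```python
def find_partners_to_refresh(changelog_entries, gate_partners, last_sync):
    """Match Pool changelog entries against the Gate's own records.

    A changelog entry counts if it is newer than the last sync point and
    its BPN belongs to a business partner this Gate tracks; each match
    should trigger a new cleaning request for that partner.
    """
    # Index the Gate's partners by BPN (partners without a BPN cannot match)
    known = {p["bpn"]: p for p in gate_partners if p.get("bpn")}
    refresh = []
    for entry in changelog_entries:
        if entry["timestamp"] > last_sync and entry["bpn"] in known:
            refresh.append(known[entry["bpn"]])
    return refresh
```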
How do we want to process records returned from a provider? Does the bridge need to iterate through all Gate instances to check which Gate to send a message to via the bus? Or do we put changed records more generically as a broadcast onto the bus, and each Gate instance decides for itself what to do with it, i.e. filtering if the BPN is not relevant for the Gate instance, projection to relevant fields in the Gate etc.? The latter seems much simpler and provides better decoupling, but it brings us back to the security concern from above?!
See my previous comment. If we did it like this via "broadcast", all other Gates would also receive sharing-member-specific data, which might cause a security issue. For example, all members might use the nameParts field for different internal purposes, and no other member should be able to recognize this.
To simplify debugging/reporting, a logging mechanism would be helpful. Is this already part of the current designs? If we extensively used a bus, this infrastructure could also be used as a basis for logging, with a logger component simply listening on the bus?
For sure logging is an important topic. But logging can be realized with and without a message bus. So, logging and messaging are separate topics for me.
I have created a pull request proposing an ADR on how to implement the orchestrator component. It should focus on KISS principle and simplicity. https://github.com/eclipse-tractusx/bpdm/pull/418
When it comes to alternative one, i.e. no message bus, how do we share a changed record? I understand the gates are polling for updates in this approach and I assume that maybe a timestamp could be a parameter here, i.e. "give me back updates since timestamp". But since it was argued that only relevant records should be passed to a gate instance, how do we know in the orchestrator which updated records to return to a polling gate?
Isn't it true that we have to differentiate between different change situations:
> When it comes to alternative one, i.e. no message bus, how do we share a changed record? I understand the gates are polling for updates in this approach and I assume that maybe a timestamp could be a parameter here, i.e. "give me back updates since timestamp". But since it was argued that only relevant records should be passed to a gate instance, how do we know in the orchestrator which updated records to return to a polling gate?
@cofinity-x-stephan Based on the changelog in the Pool which the Gates are polling, they check if they have a BPN that was updated in the Pool. If so, the respective Gate triggers a new cleaning request for its business partner with that BPN and the cleaning process starts for the Gate. So the whole Business Partner object does not get passed through all Gates.
Thanks Tim! But this means that for an address that is used in 20 Gates we also have 20 runs of the cleaning process, even though only a single run is actually needed? Or would the first run in your concept be different from the consecutive runs? We need to be cautious here, since the backend service may also include human data curators that have to be paid, so we need to minimize calls to that service to provide a cost-effective solution.
Since we do not know the concrete cleaning service (provider), we are open to discussing how you would propose the end-to-end flow. Generally speaking, we have to take a decision for the overall architecture and target goal, which in my opinion should be the API-based service orchestrator approach because of its simplicity and flexibility. If we conclude at some point that a messaging system is really mandatory, of course we can introduce one. But for now, focusing on what is really needed and how we can deliver fast, the API-based approach matches best!
In regard to your concrete question: yes, it was intended as you describe, because that is a solution that works out of the box for a first minimal viable product. For sure we can also think about introducing an additional state in the orchestrator for business partner records that should not be "cleaned", because they already are, but should just be updated with the latest changes from the Golden Record Pool. The problem here comes with the details, because the Golden Record Pool data model differs from the Gate model, and somehow a mapping between both has to be done if the business partner record should not run through the whole cleaning process again. Especially considering that every sharing member has different semantics for Gate-specific data model fields, i.e. the name parts field is used differently across all sharing members.
I have prepared a cleaned and slimmed-down Variant 1 overview which is intended to act as a base for more issues detailing the work packages needed to achieve what we need. This diagram still expects that each Gate starts its own cleaning process after it identifies a BPN update in the Pool. But optimizations are easy to make in this regard: with a few tweaks, Gates needing only an update could trigger a slimmed-down version of the cleaning process, for example.
sequenceDiagram
SharingMember->>Gate: (1) Upsert business partner A input
Gate-->>Gate: (2) Persist business partner A input
Gate->>Orchestrator: (3) Request cleaning of business partner A input
Orchestrator-->>Orchestrator: (4) Create timestamped Cleaning Request B for business partner A input
Orchestrator-->>Orchestrator: (4) Update Cleaning Request B State: QUEUED_GENERIC_CLEANING
Orchestrator-->>Gate: Created cleaning request B
Gate-->>Gate: (5) Associate cleaning request ID with business partner A
Gate-->>Gate: (5) Upsert sharing state of business partner A
Gate-->>SharingMember: Upserted business partner A input
loop Poll Generic Cleaning
CleaningServiceDummy->>Orchestrator: (6) Fetch and reserve next cleaning requests in 'Generic Cleaning' queue
Orchestrator-->>Orchestrator: (4) Update state of cleaning request B: RESERVED_GENERIC_CLEANING
Orchestrator-->>CleaningServiceDummy: Cleaning request B with business partner input
CleaningServiceDummy->>CleaningServiceDummy: (7) Create dummy cleaning results
opt BPN does not exist
CleaningServiceDummy->>CleaningServiceDummy: (7) Add BPN request IDs into results
end
CleaningServiceDummy->>Orchestrator: (8) Send cleaning result for request B
Orchestrator-->>Orchestrator: (4) Update Cleaning Result of request B
Orchestrator-->>Orchestrator: (4) Update state of cleaning request B: QUEUED_BPN_PROCESSING
Orchestrator-->>CleaningServiceDummy: Accept
end
loop Poll BPN Processing
Pool->>Orchestrator: (9) Fetch and reserve next cleaning requests in 'BPN Processing' queue
Orchestrator-->>Orchestrator: (4) Update state of cleaning request B: RESERVED_BPN_PROCESSING
Orchestrator-->>Pool: Cleaning request B cleaning result
opt L/S/A result has changed
Pool-->>Pool: (10) Upsert L/S/A business partners with result
end
Pool-->>Pool: (11) Add BPNs from BPN request IDs to cleaning result
Pool->>Orchestrator: (12) Send BPN cleaning result for request B
Orchestrator-->>Pool: Accept
Orchestrator-->>Orchestrator: Update state of cleaning request B: FINISHED
end
loop Poll for finished Cleaning Requests
Gate->>Orchestrator: (13) Fetch cleaning request cleaning result by request ID
Orchestrator-->>Gate: Cleaning request B cleaning result
Gate-->>Gate: (5) Resolve cleaning request B
Gate-->>Gate: (5) Update sharing state of business partner A
end
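The request states appearing in the diagram above can be captured as a small table-driven state machine. The initial CREATED state and the strict transition enforcement are assumptions on top of the diagram, not part of it:

```python
# Allowed transitions of a cleaning request in the slimmed-down Variant 1;
# state names are taken from the sequence diagram above.
TRANSITIONS = {
    "CREATED": {"QUEUED_GENERIC_CLEANING"},
    "QUEUED_GENERIC_CLEANING": {"RESERVED_GENERIC_CLEANING"},
    "RESERVED_GENERIC_CLEANING": {"QUEUED_BPN_PROCESSING"},
    "QUEUED_BPN_PROCESSING": {"RESERVED_BPN_PROCESSING"},
    "RESERVED_BPN_PROCESSING": {"FINISHED"},
    "FINISHED": set(),  # terminal: Gates fetch the result from here
}

def advance(state: str, next_state: str) -> str:
    """Move a cleaning request forward, rejecting illegal jumps
    (e.g. skipping the BPN processing step entirely)."""
    if next_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {next_state}")
    return next_state
```

Keeping the transitions in one table makes it cheap to add further process steps later (e.g. the "validate"-only path discussed earlier) by extending the table rather than the control flow.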
I also created a simple dependency chart showing which work packages are dependent on each other in what way:
flowchart LR
model[Generic Business Partner DTO Interface]
subgraph GateLSA[Gate LSA]
lsaMapLegalEntity["(15) Logic: Map legal Entities to generic model"]
lsaConnectLegalEntity["(16) Logic: Connect legal entity endpoints to generic service logic"]
lsaMigrateLegalEntity["(17) Db-Script: Migrate existing legal entities to generic business partners"]
lsaMapSite["(18) Logic: Map sites to generic model"]
lsaConnectSite["(19) Logic: Connect site endpoints to generic service logic"]
lsaMigrateSite["(20) Db-Script: Migrate existing sites to generic business partners"]
lsaMapAddress["(21) Logic: Map addresses to generic model"]
lsaConnectAddress["(22) Logic: Connect address endpoints to generic service logic"]
lsaMigrateAddress["(23) Db-Script: Migrate existing addresses to generic business partners"]
end
lsaMapLegalEntity-->lsaConnectLegalEntity
lsaConnectLegalEntity-->lsaMigrateLegalEntity
lsaMapSite-->lsaConnectSite
lsaConnectSite-->lsaMigrateSite
lsaMapAddress-->lsaConnectAddress
lsaConnectAddress-->lsaMigrateAddress
subgraph GateOrch[Gate Orchestration]
gateGenericInputEndpoints["(1) Input Endpoints for Generic Business Partner"]
gateGenericPersistence["(2) Persist Generic Business Partners"]
gateRequestCleaning["(5) Request Cleaning for Own Update"]
gateResolveRequest["(13) Resolve finished Cleaning Results"]
gateCheckBpnUpdate["(14) Request Cleaning for BPN Update"]
end
gateGenericInputEndpoints-->gateGenericPersistence
gateGenericPersistence-->gateRequestCleaning
gateRequestCleaning-->gateResolveRequest
gateRequestCleaning-->gateCheckBpnUpdate
subgraph Orchestrator[Orchestrator]
orchBase[Setup module]
orchRequestCleaningEndpoint["(3) Request Business Partner Cleaning Endpoint"]
orchCleaningQueueEndpoint["(6) Reserve Cleaning Requests for Cleaning Endpoint"]
orchCleaningResolveEndpoint["(8) Resolve Cleaning Request for Cleaning Endpoint"]
orchBpnQueueEndpoints["(9) Reserve Cleaning Requests for BPN Processing Endpoint"]
orchBpnResolveEndpoint["(12) Resolve Cleaning Request for BPN Processing Endpoint"]
orchFinishedEndpoint["(13) Fetch Finished Cleaning Request Endpoint"]
orchStateMachine["(4) Manage Cleaning Request State and Data"]
end
orchBase-->orchRequestCleaningEndpoint
orchBase-->orchCleaningQueueEndpoint
orchBase-->orchCleaningResolveEndpoint
orchBase-->orchBpnQueueEndpoints
orchBase-->orchBpnResolveEndpoint
orchBase-->orchFinishedEndpoint
orchRequestCleaningEndpoint-->orchStateMachine
orchCleaningQueueEndpoint-->orchStateMachine
orchCleaningResolveEndpoint-->orchStateMachine
orchBpnQueueEndpoints-->orchStateMachine
orchFinishedEndpoint-->orchStateMachine
subgraph CleaningDummy[Cleaning Service Dummy]
dummyBase[Setup module]
dummyServiceLogic["(7) Create Dummy Cleaning Result"]
end
dummyBase-->dummyServiceLogic
subgraph Pool[Pool]
poolUpsert["(10) Upsert Business Partners from Cleaning Result"]
poolResult["(11) Send BPN Processing Result"]
end
poolUpsert-->poolResult
orchRequestCleaningEndpoint-->gateRequestCleaning
orchCleaningQueueEndpoint-->dummyServiceLogic
orchBpnQueueEndpoints-->Pool
orchBpnResolveEndpoint-->Pool
orchFinishedEndpoint-->gateResolveRequest
gateGenericPersistence-->lsaConnectLegalEntity
gateGenericPersistence-->lsaConnectSite
gateGenericPersistence-->lsaConnectAddress
model-->GateLSA
model-->GateOrch
model-->Orchestrator
model-->CleaningDummy
model-->Pool
I created also issues for work packages which give more details: (1) #384 (2) #428 (3) #425 #429 (4) #427 (5) #429 (6) #388 (7) #433 (8) #392 (9) #434 (10, 11) #432 (12) #435 (13) #426 #430 (14) #431 (15) #441 (17) (18) #441 (19) (20) (21) #441 (22) (23)
Please note that the linked issues still need refinement and will change over the next days.
EDIT:
Updated the sequence diagram to showcase business partner updates when a BPN has changed. These updates trigger a different workflow than the conventional cleaning process.
EDIT 2: Removed the BPN update flow as it is now depicted separately here: https://github.com/eclipse-tractusx/bpdm/issues/377#issuecomment-1715421575
This is a sequence diagram of the orchestration flow depicting the use case of updating a Gate business partner entry based on a BPN update from the Pool. In this flow the Gate business partner has no new information of its own and just needs the newest state from the Pool incorporated. The service provider is responsible for incorporating the Pool state into the Gate business partner. No extensive cleaning is needed in this case.
sequenceDiagram
loop Poll Pool BPN Changes
Gate->>Pool: Fetch BPN changelog entries from last timestamp
Pool-->>Gate: Entries from given timestamp
opt Business Partner in Gate associated with BPN C and newer timestamp (14)
Gate->>Orchestrator: Request update for business partner A from Pool
Orchestrator-->>Orchestrator: Create timestamped Cleaning Request D for business partner A input
Orchestrator-->>Orchestrator: Update Cleaning Request D State: QUEUED_UPDATE_LSA
Orchestrator-->>Gate: Created cleaning request D
Gate-->>Gate: Associate cleaning request D with business partner A
Gate-->>Gate: Upsert sharing state of business partner A
end
end
loop Pool L/S/A Update queue
CleaningServiceDummy->>Orchestrator: Fetch and reserve next cleaning requests in 'Update L/S/A' queue
Orchestrator-->>Orchestrator: Update state of cleaning request D: RESERVED_UPDATE_LSA
Orchestrator-->>CleaningServiceDummy: Cleaning request D with business partner input A
CleaningServiceDummy->>Pool: Request L/S/A business partner with BPN C
Pool-->>Pool: Fetch L/S/A business partner C from database
Pool-->>CleaningServiceDummy: Return L/S/A business partner C
CleaningServiceDummy-->>CleaningServiceDummy: Create dummy update result
CleaningServiceDummy->>Orchestrator: Send update result for request D
Orchestrator-->>Orchestrator: Store update result of request D
Orchestrator-->>Orchestrator: Update state of cleaning request D: FINISHED
Orchestrator-->>CleaningServiceDummy: Accept
end
loop Poll for finished Cleaning Requests
Gate->>Orchestrator: (13) Fetch cleaning request cleaning result by request ID
Orchestrator-->>Gate: Cleaning request D cleaning result
Gate-->>Gate: (5) Resolve cleaning request D
Gate-->>Gate: (5) Update sharing state of business partner A
end
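The changelog polling at the start of this flow (step 14) boils down to a timestamp comparison. Here is a minimal Python sketch of that filter; the function name `find_bpn_updates` and the field names (`bpn`, `timestamp`, `externalId`) are hypothetical stand-ins for the Pool changelog response and the Gate's stored business partner data.

```python
from datetime import datetime

def find_bpn_updates(pool_changelog, gate_partners, last_poll):
    """Return Gate business partners whose associated BPN changed in the Pool
    after the Gate's last known timestamp for that partner (step 14)."""
    to_update = []
    for entry in pool_changelog:
        if entry["timestamp"] <= last_poll:
            continue  # already handled in an earlier poll
        for partner in gate_partners:
            # Only request an update if the Pool's change is newer than
            # the state the Gate already holds for this partner.
            if partner["bpn"] == entry["bpn"] and entry["timestamp"] > partner["timestamp"]:
                to_update.append(partner)
    return to_update
```

Each partner returned here would then be sent to the orchestrator as a new 'Update L/S/A' request, as in the diagram above.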
Since the concept of the orchestration layer has evolved a lot in our discussion, I have opened a new issue #455 in which the newest state of our concept is condensed. This helps keep the concept concise and readable. Please post further comments to the new issue #455.
+Benefit Hypothesis & Problem Statement+
As an operator, I need a way to process Business Partner data records based on different states and to be independent of the service provider used, so that I can replace the provider or restructure the process steps in a modular way.
The proposed solution is an Orchestrator Layer that acts as a passive component, offers individual endpoints for each processing step, and sets the data record's current state accordingly for further processing. The data will be processed in chunks.
(Chunks are necessary when thinking about checking for duplicates.)
As an Architect, I require that the orchestrator component is designed flexibly, so that new "states" for the data processing can easily be added.
The following diagram illustrates the processing flow and the interaction between the components.
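That extensibility requirement, together with the chunked processing, can be illustrated with a small Python sketch. Everything here is hypothetical: the step names, the state names, and the `chunk_size` are illustrations of the idea that each processing step only declares which state it consumes and which it produces, so a new step (e.g. a sanction-list screening) is just another registry entry.

```python
# Hypothetical step registry: adding a new state/step means adding one entry,
# without touching the orchestrator's dispatch logic.
PROCESSING_STEPS = [
    {"name": "duplicate_check", "consumes": "new",          "produces": "deduplicated"},
    {"name": "curation",        "consumes": "deduplicated", "produces": "curated"},
    {"name": "bpn_assignment",  "consumes": "curated",      "produces": "CLEANED"},
]

def next_chunk(records, state, chunk_size=100):
    """Hand out at most `chunk_size` records that are currently in `state`."""
    return [r for r in records if r["state"] == state][:chunk_size]

def run_step(step, records, chunk_size=100):
    """Process one chunk for a step by advancing its records to the next state."""
    for record in next_chunk(records, step["consumes"], chunk_size):
        record["state"] = step["produces"]
```

Chunking matters for duplicate checks because a step can only compare records it sees together; the orchestrator hands out bounded batches rather than single records.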
sequenceDiagram
Gate->>Orchestrator: Upload new or changed BP
Orchestrator->>Orchestrator: Set State: new
Orchestrator->>Orchestrator: Set External ID / Request ID (Linkage)
Orchestrator-->>Gate: Send Req ID
loop Every Hour
Simulator->>Orchestrator: Fetch for "new"
Orchestrator-->>Simulator: BP with "new" State
end
Simulator->>Simulator: Duplicate Check
Simulator->>Simulator: Natural Person Screening
Simulator->>Simulator: Curation
Simulator->>Simulator: Enrichment
opt BPN does not exist
Simulator->>Pool: Create new BP
Pool->>Pool: Create BPN
Pool-->>Simulator: BPN
end
Simulator-->>Orchestrator: Cleaned BP with BPN
Orchestrator->>Orchestrator: Set State: CLEANED
loop Every Hour
Pool->>Orchestrator: Fetch "CLEANED" BP
Orchestrator-->>Pool: BP with "CLEANED" State
end
Pool->>Pool: Update BP
Pool->>Orchestrator: Commit
Orchestrator->>Orchestrator: Set State: "FINISH"
loop Every Hour
Gate->>Orchestrator: Fetch for "FINISH" && "ReqID"
Orchestrator-->>Gate: BP that match "FINISH" && "ReqID"
Gate-->>Gate: Update BP
end
loop Every Hour
Gate->>Pool: Fetch Changelog
Pool-->>Gate: Send Changelog with BPNs
Gate->>Gate: Filter for relevant BPNs where timestamp is newer than existing
Gate->>Pool: Fetch changed BP
Pool-->>Gate: Send changed BP
Gate->>Gate: Update changelog for BPNs
end
Components: BPDM_GR Sprints: N/A Fix Versions: N/A StoryPoints: N/A Attachments: CXAR-1020-mermaid-diagram-2023-07-19-144421.svg