Missing GSIM class – specification of Exchange Channel

FlavioRizzolo commented 2 years ago

GSIM lacks an information object to describe the specification of an Exchange Channel in general (cf. Questionnaire Specification and Output Specification, which exist for Questionnaire and Product, the sub-types of Exchange Channel).

FlavioRizzolo commented 2 years ago

I propose creating this channel specification pattern, which Exchange Channel sub-classes can use if necessary. The diagram shows the pattern applied to Questionnaire. It can easily be applied to Product in the same way (in fact, the pattern is inspired by Product).

FlavioRizzolo commented 2 years ago

Example for Product

We should probably rename Presentation to ProductPresentation and OutputSpecification to ProductSpecification.

FlavioRizzolo commented 2 years ago

To address one of the issues raised last year, that a presentation shouldn't require an InformationSet, I think that creating a super-class InformationStructure would help. The new InformationStructure would parallel what InformationSet does for ReferentialMetadataSet and DataSet, so that we can talk about the structure of either with one class.

Proposed changes are outlined in red.

FlavioRizzolo commented 2 years ago

Extended pattern:

FlavioRizzolo commented 2 years ago

Proposed changes:

The new ChannelPresentation requires the new InformationStructure instead of InformationSet, which is now optional.
Associations between sub-classes are deleted because they are inherited, e.g. between Product, Presentation and OutputSpecification.
Protocol has confused a lot of people. One reason is that it seems to be defined as "The mechanism for exchanging information." In that case, perhaps renaming it "ExchangeMechanism" could help. Another reason for the confusion could be that, judging by its explanatory text, it actually seems to be a design-time artifact: "A Protocol specifies the mechanism [...] of exchanging information". If this is the case, then perhaps we can rename it ProtocolSpecification. We need a decision either way.
The new ChannelSpecification is a design-time artifact, which means that ExchangeChannel is a runtime artifact. That would address an issue raised last year: it was unclear which class represented a running channel.
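A minimal Python sketch of these proposed changes may help, assuming plain dataclasses as stand-ins for the GSIM classes. The attribute names (`measures`, `attributes`, `running`) and the `start()` method are hypothetical, chosen only for illustration. The sketch shows InformationStructure as a super-class of DataStructure and ReferentialMetadataStructure, a ChannelPresentation that requires an InformationStructure while the InformationSet is optional, and a design-time ChannelSpecification from which a runtime ExchangeChannel is created.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InformationStructure:
    """Proposed super-class: structure of an organized collection of information."""
    name: str


@dataclass
class DataStructure(InformationStructure):
    # Hypothetical data-specific components, kept on the sub-class.
    measures: tuple = ()


@dataclass
class ReferentialMetadataStructure(InformationStructure):
    # Hypothetical metadata-specific components, kept on the sub-class.
    attributes: tuple = ()


@dataclass
class InformationSet:
    name: str
    structure: InformationStructure


@dataclass
class ChannelPresentation:
    # Required: a presentation can be designed against a structure alone.
    structure: InformationStructure
    # Optional: no concrete InformationSet is needed to define the presentation.
    information_set: Optional[InformationSet] = None


@dataclass
class ChannelSpecification:
    """Design-time artifact describing a channel."""
    name: str
    presentation: ChannelPresentation


@dataclass
class ExchangeChannel:
    """Runtime artifact: a running channel created from a specification."""
    specification: ChannelSpecification
    running: bool = False

    def start(self) -> None:
        self.running = True
```

For example, `ChannelPresentation(structure=DataStructure("LFS"))` is valid on its own, and an InformationSet can be attached later when one exists.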

We could think of this pattern as related to the classic Model-View-Controller (MVC) pattern (see MVC reference):

The model is responsible for managing the data of the application. It receives user input from the controller.
The view renders presentation of the model in a particular format.
The controller responds to the user input and performs interactions on the data model objects. The controller receives the input, optionally validates it and then passes the input to the model.

In our case, the InformationSet/Structure would be the model, the ChannelPresentation the view, and the Channel itself the controller. This is not a formal mapping, just another way of looking at these classes so that they make more sense.
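As a rough illustration of the analogy (not a formal mapping), the toy Python sketch below assigns the three roles. The method names `receive`, `add`, `render` and `show`, and the CSV-like output format, are hypothetical: the channel receives input, optionally validates it and passes it to the model, and the presentation renders the model in one particular format.

```python
class InformationSet:
    """Model: manages the data."""

    def __init__(self) -> None:
        self.records: list = []

    def add(self, record: dict) -> None:
        self.records.append(record)


class ChannelPresentation:
    """View: renders the model in a particular format (here, CSV-like lines)."""

    def render(self, model: InformationSet) -> str:
        return "\n".join(f"{r['unit']},{r['value']}" for r in model.records)


class ExchangeChannel:
    """Controller: receives input, optionally validates it, passes it to the model."""

    def __init__(self, model: InformationSet, view: ChannelPresentation) -> None:
        self.model = model
        self.view = view

    def receive(self, record: dict) -> bool:
        # Optional validation step before the model is updated.
        if "unit" not in record or not isinstance(record.get("value"), (int, float)):
            return False
        self.model.add(record)
        return True

    def show(self) -> str:
        # Delegate rendering of the current model state to the view.
        return self.view.render(self.model)
```

For example, `channel.receive({"unit": "A01", "value": 3})` is accepted, a record without a value is rejected, and `channel.show()` then returns `"A01,3"`.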

FlavioRizzolo commented 2 years ago

According to the definition of ExchangeChannel, "The Exchange Channel is used for external and internal purposes", which means the collection and dissemination are just examples. However, that is not clear at all from other definitions and explanatory texts, or even the chosen extensions, e.g. Questionnaire, AdministrativeRegister, etc. We need to review the definitions and examples to make sure they clearly show the internal use case. We could add examples of data repositories for governance and harmonization, e.g. Data Hubs, Data Marts, which can be considered ExchangeChannels (much in the same way as AdministrativeRegisters).

andreapetres commented 2 years ago

An example of using the classes mentioned above in a process model: GSIM + process 20220309.pdf

FrancineK commented 2 years ago

To address one of the issues raised last year, that a presentation shouldn't require an InformationSet, I think that creating a super-class InformationStructure would help. The new InformationStructure would parallel what InformationSet does for ReferentialMetadataSet and DataSet, so that we can talk about the structure of either with one class.

Proposed changes are outlined in red.

Data Structure is required and should not be deleted, because it has components that are specific to Data Structure and not relevant to Information Structure. This would probably also be the case for Referential Metadata Structure:

For Data Resource and Referential Metadata Resource, there is no harm in leaving them as is. But if they are to be removed, the explanatory text should state that Information Resource can be specialized into Data Resource and Referential Metadata Resource.

krishnan-ambady-ds commented 2 years ago

The proposed changes are shown below as a high-level domain model:

The model complies with most existing definitions of protocol, protocol specification, product/product container, and producer to consumer. A new destination type (to store or present) needs to be added. The main difference is that the Exchange Channel is defined as a transport entity (even in the hub-and-spoke context). It does not produce or consume, but is used by a producer or consumer to transport/exchange the Product (information set).

FlavioRizzolo commented 2 years ago

To address one of the issues raised last year, that a presentation shouldn't require an InformationSet, I think that creating a super-class InformationStructure would help. The new InformationStructure would parallel what InformationSet does for ReferentialMetadataSet and DataSet, so that we can talk about the structure of either with one class. Proposed changes are outlined in red.

Data Structure is required and should not be deleted, because it has components that are specific to Data Structure and not relevant to Information Structure. This would probably also be the case for Referential Metadata Structure:

For Data Resource and Referential Metadata Resource, there is no harm in leaving them as is. But if they are to be removed, the explanatory text should state that Information Resource can be specialized into Data Resource and Referential Metadata Resource.

Of course. The proposal was to delete only the associations that are now inherited from the super-classes, not the classes themselves. At the class level there is no deletion, only the proposed addition of InformationStructure.

andreapetres commented 2 years ago

Exchange channel:

In the software we use for process modelling there is a similar object, called Distribution Channel. Definition: "Describes the path a product follows to reach the customer." It is used in models that describe different aspects of the interaction with users; here is an example: third image - Customer journey map.

FlavioRizzolo commented 2 years ago

The model complies with most existing definitions of protocol, protocol specification, product/product container, and producer to consumer. A new destination type (to store or present) needs to be added. The main difference is that the Exchange Channel is defined as a transport entity (even in the hub-and-spoke context). It does not produce or consume, but is used by a producer or consumer to transport/exchange the Product (information set).

I'm trying to see how this can work with the examples we have in a way that is not too system-oriented.

For instance, let's take the registers. A register doesn't seem to be a product that is transmitted or exchanged; it's the actual means. What's transmitted is an information set extracted from the register. If the register is maintained in a relational DB and we connect to it via ODBC to run SQL queries, isn't ODBC the mechanism, hence the protocol? What's the channel, then, between the information and the consumer, if not the register itself?

FlavioRizzolo commented 2 years ago

I've put together a tentative proposal integrating most of Krishnan's and Andrea's ideas above, as best as I understand them.

The main change is the view of channel in a more traditional way as a transport/interface mechanism, which includes a Protocol, e.g. web service, FTP, face-to-face interview. A new class, tentatively named ExchangeHub, captures the container for the content exchange, e.g. Questionnaire, Register, Product, etc. Presentations for all channels/hubs are captured by a new ExchangePresentation class, where questionnaire modes, pdfs, webpages, etc. are represented.

Cardinalities need to be discussed in more detail once the overall picture is more or less in place.

InKyungChoi commented 2 years ago

Regarding the proposal above:

How about just "Presentation" as the name of the class (currently named as) "Exchange Presentation"? I think "Exchange Presentation" sounds rather puzzling. If we are going to rename "Presentation" as "ProductPresentation", we can use the name "Presentation" for the superclass.
I wonder whether, with the new class "Exchange Specification" ("specifies all the components that might be necessary for an exchange to work"), "Provision Agreement" is still needed.
I think the name "Exchange Hub" is quite confusing. Is it a synonym of "Manager, Container, Organizer"?

FlavioRizzolo commented 2 years ago

Regarding the proposal above:

How about just "Presentation" as the name of the class (currently named as) "Exchange Presentation"? I think "Exchange Presentation" sounds rather puzzling. If we are going to rename "Presentation" as "ProductPresentation", we can use the name "Presentation" for the superclass.

I agree.

I wonder whether, with the new class "Exchange Specification" ("specifies all the components that might be necessary for an exchange to work"), "Provision Agreement" is still needed.

Protocol and Provision Agreement are part of the specification, I think.

I think the name "Exchange Hub" is quite confusing. Is it a synonym of "Manager, Container, Organizer"?

It's a bad name, I just couldn't find a better one.

Manager, container and organizer are other options that came to my mind. It's where the content to be exchanged is maintained and organized, the capture or sharing tool. Essentially, it's the former ExchangeChannel minus the transport piece (send/receive). For instance, an electronic questionnaire would be the new ExchangeHub, the web page would be the new ExchangeChannel, HTTP+HTML would be the Protocol, the ProvisionAgreement would be the usual, and the ExchangeSpecification would be the design that puts all that together.
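The electronic questionnaire example can be sketched as plain Python objects, assuming frozen dataclasses stand in for the classes. The field names, the agreement terms and the `describe()` method are hypothetical; the point is only which role each piece plays: the hub holds the content, the channel does the transport over a Protocol, and the ExchangeSpecification ties them together with the ProvisionAgreement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    name: str  # e.g. "HTTP+HTML"


@dataclass(frozen=True)
class ProvisionAgreement:
    parties: tuple
    terms: str  # hypothetical free-text terms (retention, sharing, ...)


@dataclass(frozen=True)
class ExchangeHub:
    """Where the content to be exchanged is maintained and organized."""
    name: str


@dataclass(frozen=True)
class ExchangeChannel:
    """The transport piece (send/receive), bound to a Protocol."""
    name: str
    protocol: Protocol


@dataclass(frozen=True)
class ExchangeSpecification:
    """The design that puts hub, channel, protocol and agreement together."""
    hub: ExchangeHub
    channel: ExchangeChannel
    agreement: ProvisionAgreement

    def describe(self) -> str:
        return (f"{self.hub.name} via {self.channel.name} "
                f"over {self.channel.protocol.name}")


spec = ExchangeSpecification(
    hub=ExchangeHub("electronic questionnaire"),
    channel=ExchangeChannel("web page", Protocol("HTTP+HTML")),
    agreement=ProvisionAgreement(("agency", "respondent"), "hypothetical terms"),
)
```

Here `spec.describe()` returns `"electronic questionnaire via web page over HTTP+HTML"`. Keeping the hub and the channel as separate objects mirrors the proposal that the hub is the former ExchangeChannel minus the transport piece.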

FrancineK commented 2 years ago

Provision Agreement informs the specification; it's not quite part of it. I think it is still needed as its own entity to store the negotiated/agreed basis for exchange: retention, sharing agreement, etc. For ExchangeHub, what about ExchangeInformationContainer? It's long, I know. They are all containers for information being exchanged, whereas Manager works better for Registers than for Product or Questionnaire, in my opinion. Also, DataHarvest now becomes a real ExchangeChannel. DataHarvest: a concrete and usable tool to pass information between two sources, usually by a machine-to-machine mechanism. It is not an "ExchangeHub", a container which acts as a source or target to hold information.

FlavioRizzolo commented 2 years ago

Provision Agreement informs the specification; it's not quite part of it. I think it is still needed as its own entity to store the negotiated/agreed basis for exchange: retention, sharing agreement, etc.

Yes, we definitely need the class, same as Protocol. I just meant that the specification as a design document might have the agreement as a part, but it might just be a supporting document informing it.

For ExchangeHub, what about ExchangeInformationContainer? It's long, I know. They are all containers for information being exchanged, whereas Manager works better for Registers than for Product or Questionnaire, in my opinion.

Perhaps InformationContainer?

Also, DataHarvest now becomes a real ExchangeChannel. DataHarvest: a concrete and usable tool to pass information between two sources, usually by a machine-to-machine mechanism. It is not an "ExchangeHub", a container which acts as a source or target to hold information.

I like the idea of DataHarvest being an ExchangeChannel with the new definition.

FlavioRizzolo commented 2 years ago

Ok, here is my last model:

We really don't need the InformationHub, but I think it's a nice way of representing where Registers and Data Hubs in general fit in the information exchange story.

I think this covers everything we discussed. Granted, the notion of Product is still different from what Andrea is using in her process model, but that can be addressed by just creating a wrapper class in the implementation model for InformationSet, InformationStructure and Presentation. Some impedance mismatch between GSIM and implementation models is expected; we just need to minimize it and ensure that a mapping, e.g. via a wrapper class, is straightforward. In the end, we do need to define Product as an InformationExchange, given the nature of Product, which includes dynamic content and online query tools.

Barring some minor changes (e.g. renaming, cardinalities, etc.), this model should be it.

FrancineK commented 2 years ago

This looks OK to me. But I am still not sure the Administrative aspect of Information Exchange is being captured.

FlavioRizzolo commented 2 years ago

This looks OK to me. But I am still not sure the Administrative aspect of Information Exchange is being captured.

Yes, that connection is weaker now. There is a composition between InformationExchange and InformationHub, though, which is as strong as you can get between two classes.

In the end, I think that trying to capture registers (a type of repository) as a channel/exchange entity is what got us into this mess. The idea has merit, but we need to stretch these notions too far to make them fit, so keeping information repositories (hubs) connected but separated seems more precise and clearer for most people.

We didn't discuss micro-data dissemination hubs, like PUMFs repositories, which are similar to registers from the exchange point of view, and many others, like dissemination databases, e.g. CANSIM, that function as a backend that can be accessed from a multitude of products. This model covers that case too.

FrancineK commented 2 years ago

Yes, I was actually thinking that we should have another subtype of InformationExchange for the administrative nature/type of exchange, in addition to having the registers as InformationHub(s). We will have Questionnaire, DataHarvest, Administrative??? and Product.

FlavioRizzolo commented 2 years ago

Yes, I was actually thinking that we should have another subtype of InformationExchange for the administrative nature/type of exchange, in addition to having the registers as InformationHub(s). We will have Questionnaire, DataHarvest, Administrative??? and Product.

If I understand correctly, you are proposing to change the composition between InformationExchange and InformationHub to an isA relationship.

The problem with that, I think, is that it would still make hubs a type of InformationExchange, which is what created the original problem. We'd be putting transport and content management together, wouldn't we?
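The difference between the two options can be sketched in Python, assuming hypothetical method names (`store`, `send`) for content management and transport. With composition, an InformationExchange holds a reference to an InformationHub and the hub itself has no transport behaviour; with the isA alternative (the hypothetical `IsAInformationHub`), every hub inherits `send()`, which is the mixing of transport and content management described above.

```python
class InformationHub:
    """Content management: where information is maintained and organized."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.content: dict = {}

    def store(self, key: str, value: str) -> None:
        self.content[key] = value


class InformationExchange:
    """Transport: sends information held by a hub (composition, not isA)."""

    def __init__(self, hub) -> None:
        self.hub = hub

    def send(self, key: str) -> str:
        return self.hub.content[key]


class IsAInformationHub(InformationExchange):
    """Hypothetical isA alternative: the hub *is* an exchange and inherits send()."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.content: dict = {}
        super().__init__(hub=self)  # the hub transports its own content
```

With composition, a register modelled as `InformationHub` exposes only content operations, and any number of exchanges can be attached to it without changing the hub.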

FrancineK commented 2 years ago

Not at all... in addition to AdministrativeRegister for managing the content, there is an "Administrative Data Collection Tool" for bringing the information inside the organization.

InKyungChoi commented 2 years ago

Hi, here is another proposal:

1. Separate (Statistical and Administrative) Register and link it to Statistical Support: as mentioned during our discussions on Statistical Program vs. Statistical Support (Program), maintenance of registers is part of Statistical Support. I think linking Register with Statistical Support can indicate the level of management needed for Register. Note also that Register is linked with Information Set, which is also linked to Product (see 3 below). I guess there are a lot of things that could be added around Register, but for now I have kept it simple...
2. Separate Product: I think Product needs to be separated for the same reasons as Register.
3. Remove the link between Product and Presentation, and add a link with Information Set: as discussed before, Presentation becomes independent from content. Also, to make it reusable and independent from any specific product, the description of the association between “Output Specification” and “Presentation” changes from “defines” to “uses”.
4. Change the name “Exchange Channel” to “Exchange Tool” (or Instrument or Mechanism?): now that we are left with only “Questionnaire” and “Data Harvest” in “Exchange Channel”, we might need to update its definition and name accordingly. The current definitions of Questionnaire (“A concrete and usable tool to elicit information from observation Units”) and Data Harvest (“A concrete and usable tool to pass information between two sources, usually by a machine to machine mechanism”) indicate that they are concrete tools rather than an abstract notion of exchanging information. The Linking GSBPM-GSIM task team also raised the issue that there is no GSIM class that could represent a concrete application that we build in GSBPM Phase 3 (Build) and use in GSBPM Phase 4 (Collect) (Issue #4). Maybe we should push for "concrete tool" rather than "abstract notion".
5. Include “Exchange Specification” and remove “Protocol”: we need a specification for Exchange Tool, which can be an output of GSBPM Phase 2 (Design) and be used to build the Exchange Tool in Phase 3 (the original purpose of this GitHub issue :D). I removed “Protocol” because it now seems to overlap with “Exchange Specification” and “Exchange Tool” itself (but I'm not sure).
6. (Question) Something for a dissemination tool? Questionnaire and Data Harvest are collection tools, and we do not have any concrete dissemination tool. Perhaps mini websites? But this sounds too concrete.

FrancineK commented 2 years ago

Change the name “Exchange Channel” to “Exchange Tool” (or Instrument or Mechanism?): now that we are left with only “Questionnaire” and “Data Harvest” in “Exchange Channel”, we might need to update its definition and name accordingly. The current definitions of Questionnaire (“A concrete and usable tool to elicit information from observation Units”) and Data Harvest (“A concrete and usable tool to pass information between two sources, usually by a machine to machine mechanism”) indicate that they are concrete tools rather than an abstract notion of exchanging information. The Linking GSBPM-GSIM task team also raised the issue that there is no GSIM class that could represent a concrete application that we build in GSBPM Phase 3 (Build) and use in GSBPM Phase 4 (Collect) (Issue Missing GSIM class – concrete representation of Exchange Channel and Business Process #4). Maybe we should push for "concrete tool" rather than "abstract notion".

I agree with this, but I think we are still missing a third option for administrative data: not a register, but a type of information brought into the organization from an external organization, apart from Data Harvest channels, which include web scraper, API, scanner, sensor, satellite, etc.

InKyungChoi commented 2 years ago

Updated version:

InKyungChoi commented 2 years ago

Suggestion for definitions of the added/changed classes:

Exchange Instrument

Definition: concrete and usable tool to exchange information

-> Now I am wondering whether it would be easier to merge Protocol and Exchange Instrument (if the latter really is a "concrete/usable tool, not an abstract notion")

Data Harvest

Definition: tool to pass information between two sources, usually by a machine-to-machine mechanism.
Explanatory text: Data Harvest can be used to collect data from administrative sources and scrape data from the web.

Questionnaire

Definition: tool to elicit information from observation Units
Explanatory text: This is an example of a way statistical organizations collect information. Each collection mode (e.g. in-person, CAPI, online Questionnaire) should be interpreted as a new Questionnaire. The Questionnaire is a tool through which data is obtained.

Exchange Specification

Definition: outline or description specifying the design of the Exchange Instrument

Information Structure

Definition: structure of an organized collection of information [using definitions from Data Structure and Referential Metadata Structure]

Dissemination Component / Instrument ?

Definition: tools to disseminate information
Explanatory text: Examples include (if we combine Protocol and Exchange Instrument) API or web services for data dissemination.
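The proposed definitions could map onto a small class hierarchy, sketched here in Python under the assumption that Exchange Instrument is an abstract class and that each instrument keeps a reference to the Exchange Specification it was built from. The `exchange()` method, the `mode` and `kind` attributes, and the returned strings are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ExchangeSpecification:
    """Outline or description specifying the design of an Exchange Instrument."""
    description: str


class ExchangeInstrument(ABC):
    """Concrete and usable tool to exchange information."""

    def __init__(self, specification: ExchangeSpecification) -> None:
        # Built (GSBPM Phase 3) from a specification produced in Phase 2.
        self.specification = specification

    @abstractmethod
    def exchange(self, source: str) -> str:
        ...


class Questionnaire(ExchangeInstrument):
    """Tool to elicit information from observation Units."""

    def __init__(self, specification: ExchangeSpecification, mode: str) -> None:
        super().__init__(specification)
        self.mode = mode  # each collection mode is a separate Questionnaire

    def exchange(self, source: str) -> str:
        return f"responses from {source} ({self.mode})"


class DataHarvest(ExchangeInstrument):
    """Tool to pass information between two sources, usually machine to machine."""

    def __init__(self, specification: ExchangeSpecification, kind: str) -> None:
        super().__init__(specification)
        self.kind = kind  # e.g. web scraper, API, scanner, sensor

    def exchange(self, source: str) -> str:
        return f"records harvested from {source} via {self.kind}"
```

Following the Questionnaire explanatory text, a CAPI version and an online version built from the same specification are two distinct Questionnaire objects that share one ExchangeSpecification.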

JALinnerud commented 2 years ago

Updated version of the figure: the text on the association from Output Specification to Presentation was "defines" in GSIM v1.2. Now it is "uses". Is that incorrect?

InKyungChoi commented 2 years ago

@JALinnerud it was an intentional change. I thought that if we make Presentation independent of Product, able to exist on its own without a Product, then Output Specification no longer "defines" Presentation but rather "uses" (existing) Presentations.
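The modelling difference between "defines" and "uses" can be sketched in Python, assuming that "defines" means the specification creates and owns its Presentations (composition) and "uses" means it references Presentations that exist independently (aggregation). The class names with the `Defines`/`Uses` suffixes and the `layouts` parameter are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Presentation:
    name: str


@dataclass
class OutputSpecificationDefines:
    """'defines': the specification creates and owns its Presentations."""
    layouts: list
    presentations: list = field(init=False)

    def __post_init__(self) -> None:
        # New Presentation objects exist only as part of this specification.
        self.presentations = [Presentation(n) for n in self.layouts]


@dataclass
class OutputSpecificationUses:
    """'uses': the specification references Presentations that exist on their own."""
    presentations: list
```

With "uses", two specifications can share one Presentation object; with "defines", each specification gets its own copy, even when the layouts are identical.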

JALinnerud commented 2 years ago

The explanatory text for Data Harvest in GSIM v1.2 was "Examples of Data Harvest channels include web scraper, API, scanner, sensor, satellite, etc." I think these were useful examples and hope we can keep them.

InKyungChoi commented 2 years ago

@JALinnerud I will add "Examples of Data Harvest channels include web scraper, API, scanner, sensor" to the explanatory text! I would exclude "satellite", though, as a satellite is more than a "data harvest" tool, and it is essentially the sensors on the satellite that capture the signal data.

InKyungChoi commented 2 years ago

Here is the updated version

FlavioRizzolo commented 2 years ago

I think Register is kind of lonely up there. Even though registers are not channels, they are still means of sharing information and need to be linked to Provision Agreement and some sort of "interface" specification, similarly to Product. Perhaps Exchange Specification doesn't need to be linked only to Exchange Channel... it seems to me we are missing some generalization here.

Other than that, I think it works.

InKyungChoi commented 2 years ago

Updated version based on meeting notes #30 (relationship between Register and Statistical Support removed)

FlavioRizzolo commented 1 year ago

Ready to be modelled in EA

UNECE / GSIMRevision

Missing GSIM class – specification of Exchange Channel #5