edi3 / edi3-json-ld-ndr

GNU General Public License v3.0

Merging duplicate entities from Buy-Ship-Pay Reference Data Model #9

Closed Fak3 closed 3 years ago

Fak3 commented 3 years ago

A suggestion for how we could handle duplicate entities from the Buy-Ship-Pay (BSP) CEFACT Reference Data Model when representing them in the edi3 vocabulary. In short, we link edi3 entities to the old CEFACT entities using an edi3:cefactID property.

Each BSP entity has a Unique UN Assigned ID (the second column in the xls), which we can use to link a json-ld vocabulary term unambiguously to a BSP term.

For example, for ReferencedConsignment and Consignment, remove these two classes:

{
  "@id": "edi3:Consignment",
  "edi3:cefactID": "UN01004159"
}

{
  "@id": "edi3:ReferencedConsignment",
  "edi3:cefactID": "UN01004040"
}

Add a new one with both UN ids linked to it:

{
  "@id": "edi3:Consignment",
  "edi3:cefactID": ["UN01004159", "UN01004040"]
}
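
The merge step could be automated along these lines. This is only a minimal sketch: `merge_classes` is a hypothetical helper, not part of any edi3 tooling, and it assumes each class entry is a plain dict with an `edi3:cefactID` that is either a string or a list of strings.

```python
def merge_classes(entries, merged_id):
    """Merge vocabulary class entries into one entry that keeps every cefactID."""
    ids = []
    for entry in entries:
        value = entry["edi3:cefactID"]
        # A cefactID may be a single string or already a list of strings.
        values = [value] if isinstance(value, str) else list(value)
        for cefact_id in values:
            if cefact_id not in ids:
                ids.append(cefact_id)
    return {"@id": merged_id, "edi3:cefactID": ids}


consignment = {"@id": "edi3:Consignment", "edi3:cefactID": "UN01004159"}
referenced = {"@id": "edi3:ReferencedConsignment", "edi3:cefactID": "UN01004040"}
merged = merge_classes([consignment, referenced], "edi3:Consignment")
print(merged)
```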

And put a verbose description in the rdfs:comment:

{
  "rdfs:comment": "
    # Consignment

    ## Definition
    A separately identifiable collection of goods items to be transported or available
    to be transported from one consignor to one consignee in a supply chain via one or 
    more modes of transport where each consignment is the subject of one single transport 
    contract. 

    ## Mapping of Legacy CEFACT terms
    This Class should be used in place of the following Buy-Ship-Pay Reference
    Data Model entities:
    * UN01004159    ABIE    Supply Chain_ Consignment. Details
    * UN01004040    ABIE    Referenced_ Supply Chain_ Consignment. Details

  "
}

To aid software that translates legacy CEFACT messages, we would also maintain a mapping from the old terms to the edi3 properties and classes, published as json, csv and an html page:

{
  "UN01004159": "edi3:Consignment",
  "UN01004040": "edi3:Consignment",
  "UN01004533": "edi3:PostalAddress",
  "UN01003173": "edi3:PostalAddress",
  // ...
}
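
Such a mapping would not have to be maintained by hand; it could be derived from the `edi3:cefactID` values in the vocabulary. A rough sketch, assuming the vocabulary is available as a list of dicts (the function names `legacy_mapping` and `to_csv` and the CSV column headers are made up for illustration):

```python
import csv
import io
import json


def legacy_mapping(vocabulary):
    """Return {cefactID: edi3 term} for every entry carrying edi3:cefactID."""
    mapping = {}
    for entry in vocabulary:
        ids = entry.get("edi3:cefactID", [])
        if isinstance(ids, str):
            ids = [ids]
        for cefact_id in ids:
            # A legacy ID must point at exactly one edi3 term.
            if mapping.get(cefact_id, entry["@id"]) != entry["@id"]:
                raise ValueError(f"{cefact_id} is linked to more than one term")
            mapping[cefact_id] = entry["@id"]
    return mapping


def to_csv(mapping):
    """Render the mapping as CSV text, sorted by legacy ID."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["cefactID", "edi3Term"])  # hypothetical column names
    for cefact_id, term in sorted(mapping.items()):
        writer.writerow([cefact_id, term])
    return buffer.getvalue()


vocabulary = [
    {"@id": "edi3:Consignment", "edi3:cefactID": ["UN01004159", "UN01004040"]},
    {"@id": "edi3:PostalAddress", "edi3:cefactID": ["UN01004533", "UN01003173"]},
]
mapping = legacy_mapping(vocabulary)
print(json.dumps(mapping, indent=2))
print(to_csv(mapping))
```

Raising an error when one legacy ID is linked to two different terms keeps the translation unambiguous, which is the point of the proposal.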

Fak3 commented 3 years ago

The CCL definitions published at https://www.unece.org/uncefact/codelistrecs.html also include some entities that duplicate the BSP RDM:

UN01002528 | ABIE | Cross-Border_ Consignment. Details | A separately identifiable collection of goods items to be transported cross-border from one consignor to one consignee via one or more modes of transport where each consignment is the subject of one single transport contract.

These ids could be mapped onto edi3 concepts in a similar way.

nissimsan commented 3 years ago

@Fak3, interesting idea. Your translation list ("UN01004159": "edi3:Consignment" etc.) would function as part of the NDR documentation, describing how the transformation is to be done, and we can add to this list all the specific decisions we need to make along the way. I think I like it! :) Also, with this I can see the point of https://github.com/edi3/edi3-json-ld-ndr/issues/8.

onthebreeze commented 3 years ago

Let's revisit the reason why these duplicates exist. It boils down to the CCTS methodology, which says that any use of a core class (i.e. an ACC) in a business context must always be a qualified subset of that core class (i.e. an ABIE). So there are often two or more usages of one ACC, such as the supplychain_consignment and referenced_consignment examples. The properties of these two classes will overlap because both MUST be subsets of the referenced ACC. In the JSON-LD / semantic web world, it is disastrous and counter-intuitive to have the same property defined twice.

But I think we will run afoul of our colleagues back in UN/CEFACT if we just delete one class and aggregate all properties into the other. For whatever reason, it is important to some folks to know that this context uses a specific subset of properties whilst that context uses a different subset.

I think the best way to de-duplicate the properties but STILL keep the context-specific sub-setting information is as described in https://github.com/edi3/edi3-json-ld-ndr/issues/4#issuecomment-672394708, i.e. like this:

{
  "@id": "edi3:ConsignmentConsignmentItemQuantity",
  "@type": "rdfs:Property",
  "rdfs:range": "xsd:decimal",
  "rdfs:comment": "The number of consignment items separately defined for transport or customs purposes within this supply chain consignment.",
  "rdfs:domain": ["edi3:Consignment", "edi3:ReferencedConsignment"],
  "rdfs:label": "Supply Chain_ Consignment. Consignment Item. Quantity",
  "edi3:cefactID": "UN01004196"
}
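
To illustrate how a consumer might recover the per-class subset from such multi-domain properties, here is a small sketch. It assumes the two `rdfs:domain` values are stored as one JSON array (a JSON object cannot reliably repeat a key); `properties_for` and `edi3:exampleOnlyProperty` are hypothetical names used only for this example.

```python
def properties_for(vocabulary, class_id):
    """Return the @id of every property whose rdfs:domain includes class_id."""
    result = []
    for entry in vocabulary:
        domain = entry.get("rdfs:domain", [])
        if isinstance(domain, str):
            domain = [domain]
        if class_id in domain:
            result.append(entry["@id"])
    return result


vocabulary = [
    {
        "@id": "edi3:ConsignmentConsignmentItemQuantity",
        "rdfs:domain": ["edi3:Consignment", "edi3:ReferencedConsignment"],
        "edi3:cefactID": "UN01004196",
    },
    {
        # Hypothetical property, present only to show a single-class domain.
        "@id": "edi3:exampleOnlyProperty",
        "rdfs:domain": "edi3:Consignment",
    },
]

print(properties_for(vocabulary, "edi3:Consignment"))
print(properties_for(vocabulary, "edi3:ReferencedConsignment"))
```

The sketch only reads the list of classes. It does not apply RDFS inference, under which several rdfs:domain values on one property are interpreted together rather than as alternatives.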

thoughts?

Fak3 commented 3 years ago

For whatever reason, it is important to some folks to know that this context uses a specific subset of properties whilst that context uses a different subset.

What do you mean by context? I'm not sure I understood the requirement here. Could you please provide an example?

Context has a specific meaning in the UN/CEFACT CCTS world. It basically means "context of use". So when I use the "Address" ACC in the "context" of describing a financial institution address (an ABIE), it has different properties from the "context" of a trade address. That's the theory. The specific example of these two addresses shows precisely why the system is a bit broken. The address of a bank and the address of a consignee are really no different. The only reason there are two is Conway's law, namely that two different UN business domains are doing the same thing without communicating with each other.

But, anyhow, these restrictions (i.e. which properties of the core class are used in this or that context-specific class) are at the heart of the UN data modelling method, and if we ignore them we will put too many stakeholders offside. So the best thing we can do for the base JSON-LD vocabulary is to find a way to keep the subsetting information without duplicating properties. Having multiple domain references for the same property seems the least damaging way of doing that?

Fak3 commented 3 years ago

{
  "@id": "edi3:ConsignmentConsignmentItemQuantity",
  "@type": "rdfs:Property",
  "rdfs:range": "xsd:decimal",
  "rdfs:comment": "The number of consignment items separately defined for transport or customs purposes within this supply chain consignment.",
  "rdfs:domain": ["edi3:Consignment", "edi3:ReferencedConsignment"],
  "rdfs:label": "Supply Chain_ Consignment. Consignment Item. Quantity",
  "edi3:cefactID": "UN01004196"
}

If we keep both classes, "edi3:Consignment" and "edi3:ReferencedConsignment", in the edi3 vocab, then which one should be used in the GET response to /consignment/{id}, and which one should be embedded into the Certificate Of Origin?

Whichever one the designer of the certificate of origin schema deems most appropriate.

nissimsan commented 3 years ago

What I liked about @Fak3's suggestion here is that it would be an explicit way to get the ambiguities cleaned up. With your counter-suggestion, @onthebreeze, I would still be in doubt whether my consignment is an edi3:Consignment or an edi3:ReferencedConsignment.

I would vote for @Fak3's suggestion here.

We could make it even clearer by referencing not only the IDs but also the ACCs. Perhaps this could be a compromise:

{
  "@id": "edi3:Consignment",
  "edi3:cefactID": ["UN01004159", "UN01004040"],
  "edi3:cefactACC": ["Consignment", "ReferencedConsignment"]
}

onthebreeze commented 3 years ago

Ok but how will the consumer of the published vocabulary know which properties of consignment are used in referenced.consignment vs supplychain.consignment?

Fak3 commented 3 years ago

Some properties were not deduplicated properly. For example, Consignment has these two properties, which should really be one:

IncludedReferencedSupplyChainConsignmentItem | edi3:ConsignmentItem
IncludedSupplyChainConsignmentItem | edi3:ConsignmentItem

@kshychko Can you please have a look at this? Is it possible to deduplicate them automatically?
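
One possible heuristic for finding such pairs automatically is sketched below. It assumes that duplicates on a class differ only by the "Referenced" qualifier in the property name and share the same range. That assumption will not catch every case, and `duplicate_candidates` is a hypothetical helper rather than existing tooling.

```python
from collections import defaultdict


def normalise(name):
    """Drop the 'Referenced' qualifier so both variants share one key."""
    return name.replace("Referenced", "")


def duplicate_candidates(properties):
    """Group (name, range) pairs whose names differ only by 'Referenced'."""
    groups = defaultdict(list)
    for name, range_ in properties:
        groups[(normalise(name), range_)].append(name)
    # Keep only the groups that actually contain more than one property.
    return {key: names for key, names in groups.items() if len(names) > 1}


consignment_properties = [
    ("IncludedReferencedSupplyChainConsignmentItem", "edi3:ConsignmentItem"),
    ("IncludedSupplyChainConsignmentItem", "edi3:ConsignmentItem"),
]
candidates = duplicate_candidates(consignment_properties)
print(candidates)
```

The output is a list of candidates for review, not a final merge decision.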

Fak3 commented 3 years ago

This task is done. Duplicate entities are merged in the vocabulary.