edi3 / edi3-json-ld-ndr

Physical (postal) address json(+ld) representation #7

Open Fak3 opened 4 years ago

Fak3 commented 4 years ago

There are many ways a postal address or country can be encoded in json(+ld). This issue seeks to collect a set of requirements for such representations to be included in the edi3 vocabulary and json requests/responses.

Interoperability concerns about country identifiers are perhaps worth splitting into an issue of their own, but I have combined them here with postal address representations because the two are related.

General recommendations and best practices for data on the web

For Entities and Properties used in the edi3 requests/responses we have a choice:

Reuse concept from already existing vocabulary.

In the mappings defined by the json-ld context, keep the full entity url namespaced under the reused vocabulary (e.g. http://www.w3.org/2006/vcard/ns#Address). In the edi3 documentation, leave a reference to the documentation of the reused vocabulary.

Create new concept.

In the mappings defined by the json-ld context, define the full entity url under our own controlled namespace (https://edi3/...).

A new concept could extend or specialize concepts from other already existing vocabularies, i.e. be a subClassOf or subPropertyOf something.

Reusing or extending concepts could allow data analysis by applications that are already aware of the semantics of those generalized vocabularies.
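The two options above can be sketched with a toy context. This is only an illustration: the `edi3` namespace URL and the `TradeAddress` term are placeholders invented for the example, and `expand_term` is a simplified stand-in for what a real json-ld processor does when it expands terms.

```python
# Flat, JSON-LD-style context. Only the vCard namespace comes from the issue text;
# the edi3 namespace and the TradeAddress term are hypothetical placeholders.
CONTEXT = {
    "vcard": "http://www.w3.org/2006/vcard/ns#",
    "edi3": "https://edi3.example/vocab#",
    # Option 1: reuse a concept, so the short term keeps the vCard namespace.
    "Address": "vcard:Address",
    # Option 2: create a new concept under our own namespace.
    "TradeAddress": "edi3:TradeAddress",
}


def expand_term(term, context):
    """Resolve a short term or a prefix:suffix name to a full url."""
    value = context.get(term, term)
    prefix, sep, suffix = value.partition(":")
    # Expand only known prefixes; leave absolute urls such as http://... alone.
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return value


reused = expand_term("Address", CONTEXT)
created = expand_term("TradeAddress", CONTEXT)
```

A consumer that already understands vCard would recognise `reused` without any edi3-specific knowledge, while `created` only means something to applications that have read the edi3 documentation.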

Prior art vocabularies

vCard:Address

{
  streetAddress: "1600 Amphitheatre Pkwy",
  locality: "Mountain View",
  region: "California",
  countryName: "United States",
  postalCode: "94043"
}

schema:PostalAddress

{
  addressCountry: "US",
  addressLocality: "Mountain View",
  addressRegion: "California",
  postOfficeBoxNumber: "123",
  postalCode: "94043",
  streetAddress: "1600 Amphitheatre Pkwy"
}

gs1:PostalAddress

Very similar to schema.org, but includes more fields.

{
  addressCountry: {
    countryCode: "US",
    countrySubdivisionCode: "USNYC"
  },
  addressLocality: "New York",
  addressRegion: "Name or ALA",
  countyCode: "County 789",  // County is not Country
  crossStreet: "ZXC street",
  organizationName: "Example Org",
  postOfficeBoxNumber: "123",
  postalCode: "456",
  streetAddress: "ABC Alley"
}

Interoperability

Country identifiers

Will country codes (ISO 3166 or other) suit the role of identifiers? Are there any UN recommendations on the interoperability of decentralized digital systems regarding the identification of countries?

It would be nice to capture use cases of possible interop disruptions due to changing country codes/names.

Example ISO 3166 country code changes

Could the following request/response disruption use cases arise?

Should we take extra care with the interpretation of country codes in archival data, given that codes may change over time?
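To make the archival concern concrete, here is a rough sketch of interpreting a country code relative to the date of the document that carries it. The table is illustrative only: the entries and years are my own recollection of reassigned and withdrawn codes and should be checked against ISO 3166-3 before anyone relies on them. `CODE_HISTORY` and `resolve_country` are hypothetical names.

```python
from datetime import date

# Illustrative history, NOT authoritative: (code, name, first_year, last_year or None).
# "CS" is the interesting case, because the same code was assigned to two countries.
CODE_HISTORY = [
    ("CS", "Czechoslovakia", 1974, 1993),
    ("CS", "Serbia and Montenegro", 2003, 2006),
    ("ZR", "Zaire", 1974, 1997),
    ("CD", "Democratic Republic of the Congo", 1997, None),
]


def resolve_country(code, issued):
    """Return the country a code referred to on the document's issue date, or None."""
    year = issued.year
    for entry_code, name, first_year, last_year in CODE_HISTORY:
        if entry_code != code:
            continue
        if first_year <= year and (last_year is None or year <= last_year):
            return name
    return None


old_invoice = resolve_country("CS", date(1990, 5, 1))
newer_invoice = resolve_country("CS", date(2005, 5, 1))
gap = resolve_country("CS", date(2000, 5, 1))
```

In this sketch the same code resolves to different countries depending on the date, and to nothing at all in the gap between assignments, which is the kind of disruption the questions above are asking about.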

Mapping of legacy UN/CEFACT Buy-Ship-Pay Data Model

In the BSP vocabulary there are a few places where Physical Address is defined: Trade Address and Financial Institution Address. Both have over a dozen sub-fields, most of which duplicate each other. For example, UN01006206 Department Name and UN01004891 Department Name. See the complete set of fields below.

The concern with directly reusing the BSP vocabulary Address in edi3 is that its granularity and duplication seem to violate best practices - it offers a level of detail that is superfluous for the data.

Granularity of sub-fields

Is it important to have such a granular and rich set of fields for our use cases?

Line One, Two, Three, Four, Five - is it correct that those fields were directly transferred from some other physical or legacy standard/recommendation?

I would imagine that we could reduce the set of sub-fields to the very essential ones, like vCard:Address above. We could probably also come up with recommendations on how to automatically map BSP fields into the reduced set (concatenation rules, separators). I would love to hear the arguments against.
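One possible shape for such a mapping, as a sketch only. The input keys (`lineOne` ... `lineFive`, `cityName`, `countrySubDivisionName`, `countryName`, `postcode`) are hypothetical json names for BSP fields, and the comma separator is an arbitrary choice; the real concatenation rules would need to be agreed on.

```python
LINE_KEYS = ("lineOne", "lineTwo", "lineThree", "lineFour", "lineFive")


def bsp_to_reduced(bsp, separator=", "):
    """Collapse a BSP-style address dict into a vCard-like reduced field set."""
    # Concatenation rule: join the non-empty free-text lines in order.
    lines = [bsp.get(key, "").strip() for key in LINE_KEYS]
    street = separator.join(line for line in lines if line)
    reduced = {
        "streetAddress": street,
        "locality": bsp.get("cityName", "").strip(),
        "region": bsp.get("countrySubDivisionName", "").strip(),
        "countryName": bsp.get("countryName", "").strip(),
        "postalCode": bsp.get("postcode", "").strip(),
    }
    # Drop fields that ended up empty so the output stays minimal.
    return {key: value for key, value in reduced.items() if value}


example = bsp_to_reduced({
    "lineOne": "1600 Amphitheatre Pkwy",
    "lineTwo": " ",
    "lineThree": "Building 40",
    "cityName": "Mountain View",
    "countrySubDivisionName": "California",
    "countryName": "United States",
    "postcode": "94043",
})
```

The mapping is lossy in one direction: once the lines are joined, the original line boundaries can only be recovered if the separator never occurs inside a line, which may itself be an argument worth weighing.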

Duplication of sub-fields

Is there any use case which would require both a distinct Trade Address and a distinct Financial Institution Address in the vocabulary?

My guess is that we don't have the limitations of BSP which led to the split of the two.

BSP Trade Address (UN01004533)

The location at which a particular trade related organization or person may be found or reached.

Coordinates:

BSP Financial Institution Address (UN01003173)

The location at which a financial institution may be found or reached.

nissimsan commented 4 years ago

@Fak3, absolutely excellent layout of the decision to be made on how we approach address. This has been discussed in UN Forums earlier, but only in the abstract. Your breaking down of pros and cons here is super helpful.

I agree with the best practice of reusing existing, standardized vocabularies. This means I think the correct decision is to go with one of the existing address definitions you list rather than introducing a CEFACT-specific address. Arguments for this include:

  1. I don't see anything special about a CEFACT address which justifies a new vocabulary.
  2. It will be much harder to consume a "proprietary" type of address. It goes against the whole idea of standardization.
  3. Lines 1, 2, 3, 4, 5 for sure do not carry special, individual meaning. They are clearly meant as some kind of catch-all, but do not belong in a schema.
  4. The fact that there's not even a common address in CEFACT (but two) in itself is a disqualifying fact IMO.

Now, whether to go with schema.org, vcard or gs1 I find is a harder decision. I just don't know what to base that decision on. gs1 seems to go a bit too far, especially with organizationName - that doesn't belong in an address IMO. vcard and schema.org are basically identical in granularity (postBox is quite irrelevant). I think I like the attribute names of vcard a bit better. But again: this is entirely subjective (and likely uninformed on my part).

Your thoughts on ISO3166 for sure are relevant in a very small fraction of cases. I don't think we should make a problem out of this, going with ISO3166 is a safe choice IMO.

onthebreeze commented 4 years ago

Wow - a lot of questions here.

Fak3 commented 4 years ago

the fact that we must publish a CEFACT address does not preclude us from identifying a standard way to reference / include other vocabularies in any implementations. So this project should allow @nissimsan to choose whether, in his specific implementations, he wants to use the vcard address or the cefact address.

We can indeed leave edi3:address property with an unspecified rdfs:range. But to aid interoperability, we could say that use of vcard:Address is recommended.

"edi3:address property with an unspecified rdfs:range" - why does the range need to be unspecified? address has properties like country, postcode, etc.. no?

Fak3 commented 4 years ago

why does the range need to be unspecified? address has properties like country, postcode, etc.. no?

If the range is not specified, then the value of this property could be anything. This allows an instance of vcard:Address to be associated with an organization:

{
  "@id": "example.org",
  "edi3:address": {
    "@type": "vcard:Address",
    "vcard:streetAddress": "1600 Amphitheatre Pkwy",
    "vcard:locality": "Mountain View",
    "vcard:region": "California",
    "vcard:countryName": "United States",
    "vcard:postalCode": "94043"
  }
}

Or to use any custom class from another vocabulary "myvocab:Address", or even be a simple string: edi3:address: "United States, California, 1600 Amphitheatre Pkwy"

But as this freedom hurts interop, I suggested recommending vcard:Address in the human-readable documentation.
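A sketch of what that freedom costs on the consuming side, assuming the three value shapes mentioned above (a vcard:Address object, some other class, or a plain string). `describe_address` is a hypothetical helper, and the compact `vcard:` keys simply mirror the example above rather than a processed json-ld document.

```python
def describe_address(value):
    """Classify an edi3:address value when the property has no fixed range."""
    if isinstance(value, str):
        # Plain string: nothing structured can be recovered without parsing.
        return {"kind": "text", "text": value}
    if isinstance(value, dict):
        types = value.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if "vcard:Address" in types:
            return {"kind": "vcard", "postalCode": value.get("vcard:postalCode")}
        # Some other vocabulary: the consumer cannot interpret the fields.
        return {"kind": "unknown", "types": types}
    raise TypeError("unsupported edi3:address value: %r" % (value,))


recommended = describe_address({
    "@type": "vcard:Address",
    "vcard:postalCode": "94043",
})
custom = describe_address({"@type": "myvocab:Address"})
free_text = describe_address("United States, California, 1600 Amphitheatre Pkwy")
```

Only the recommended shape yields a usable postal code here; every other branch is work each consumer has to repeat, which is why a documented recommendation helps even when the range stays open.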

nissimsan commented 4 years ago

Was just looking at https://schema.org/version/latest/schemaorg-current-http.jsonld and I think we can perhaps draw some inspiration from there. Here's their Person definition:

    {
      "@id": "http://schema.org/Person",
      "@type": "rdfs:Class",
      "http://schema.org/source": {
        "@id": "http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_rNews"
      },
      "http://www.w3.org/2002/07/owl#equivalentClass": {
        "@id": "http://xmlns.com/foaf/0.1/Person"
      },
      "rdfs:comment": "A person (alive, dead, undead, or fictional).",
      "rdfs:label": "Person",
      "rdfs:subClassOf": {
        "@id": "http://schema.org/Thing"
      }
    },

Particularly I'm referring to equivalentClass. We could do this:

    {
      "@id": "http://edi3.org/Address",
      "@type": "rdfs:Class",
      "http://www.w3.org/2002/07/owl#equivalentClass": [
        { "@id": "http://www.w3.org/2006/vcard/ns#Address" },
        { "@id": "https://schema.org/PostalAddress" }
      ]
    }

I don't know how reasoners would make use of this. But the fact that schema.org does this I think is a strong indicator that there's support for this approach.

Fak3 commented 4 years ago

schema.org is doing it wrong. They should have used rdfs:subClassOf instead of owl:equivalentClass, because they added more properties to the class.
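The practical difference between the two relations can be sketched with a toy type-inference function. This is not how a real OWL reasoner is implemented; `inferred_types` is a hypothetical helper that only follows subClassOf in one direction and equivalentClass in both.

```python
def inferred_types(asserted, sub_class_of=(), equivalent=()):
    """Return every class an instance of `asserted` would be inferred to belong to."""
    edges = {}
    for sub, sup in sub_class_of:
        # subClassOf: instances of sub are also instances of sup, not the reverse.
        edges.setdefault(sub, set()).add(sup)
    for left, right in equivalent:
        # equivalentClass: membership carries over in both directions.
        edges.setdefault(left, set()).add(right)
        edges.setdefault(right, set()).add(left)
    seen, todo = set(), [asserted]
    while todo:
        cls = todo.pop()
        if cls not in seen:
            seen.add(cls)
            todo.extend(edges.get(cls, ()))
    return seen


pair = [("schema:Person", "foaf:Person")]
with_subclass = inferred_types("foaf:Person", sub_class_of=pair)
with_equivalence = inferred_types("foaf:Person", equivalent=pair)
```

With subClassOf, a foaf:Person is not inferred to be a schema:Person; with equivalentClass it is, so anything said about the schema.org class is also taken to apply to every foaf:Person.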

nissimsan commented 4 years ago

Good, then, @Fak3. I'm more than happy not to get into any OWL'ing. Like I've said on other occasions, its theoretical completeness seems to come at the expense of practical usefulness - you just choke on the complexity! Just ignore my suggestion.

Back to the original topic of this: an informal human-readable recommendation is pragmatic. I just really feel like we should have guts enough to take a step further. With your human-readable recommendation suggestion, we would need to expose the full cefact-alternative - which we recommend not using. That just makes no sense!

We seem to be down to one alternative now: vcard. I would vote for that rather than a cefact-specific address.

VladimirAlexiev commented 2 years ago

Two more for consideration: W3C LOCN and our EBG: https://github.com/w3c-ccg/traceability-vocab/issues/273