LinkedPasts / lp-network

Workspace for the emerging Linked Pasts Network
http://linkedpasts.org

Reification time #2

Closed kgeographer closed 6 years ago

kgeographer commented 6 years ago

An "edge property" issue

Over the past few months I have been trying to coordinate development of an update to the Pelagios Gazetteer Interconnection Format, calling it PGIFv2, on behalf of the World-Historical Gazetteer project (WHG). The original PGIF specifies - in RDF format - the required and optional properties of place data contributed to Pelagios for inclusion in its Peripleo index and search interface.

WHG is building a very similar index, but needs to account for time in a more granular way than PGIF allows -- hence the move towards PGIFv2. The plan is that Peripleo and WHG will both require contributions in PGIFv2 form, so the same data "dump" file of place records from a research group can be ingested by either or both.

Separately (i.e. prior to WHG's inception), I have been developing a temporal extension to GeoJSON, called GeoJSON-T. It proposes an optional "when" element added as what the GeoJSON spec calls a "foreign member" in a few places within a GeoJSON Feature.

The initial idea for PGIFv2 contributions was that they take the form of GeoJSON-T FeatureCollections. These would be valid GeoJSON, and as such readily mappable in web applications and automatically rendered as maps in GitHub.

But Linked Open Data is by definition RDF! Ah, no problem, we'll make it GeoJSON-LDT, and it will adhere to the JSON-LD spec! As Rob Sanderson has effectively argued in developing IIIF and Linked Art specifications, JSON-LD is more usable than other RDF syntaxes: aiming to be "as easy to use as possible without the need for a full RDF development suite." As a developer accustomed to parsing JSON data in web applications, I agree!

But things have gotten murky from there...

Firstly, JSON-LD 1.0 does not support lists of lists, so the "coordinates" elements of GeoJSON geometries that are more complex than single points (linestrings, polygons, etc.) will not validate as JSON-LD.
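To make the limitation concrete, here is a minimal Python sketch that flags which GeoJSON geometries carry nested coordinate arrays, i.e. the ones that cannot be expressed as a JSON-LD 1.0 list. The function name and the sample geometries are illustrative assumptions, not part of any PGIF tooling.

```python
def has_list_of_lists(coordinates):
    """Return True if a GeoJSON coordinates value contains a list nested inside a list."""
    return isinstance(coordinates, list) and any(
        isinstance(item, list) for item in coordinates
    )


# Illustrative geometries (assumed sample data).
point = {"type": "Point", "coordinates": [23.7275, 37.9838]}
line = {"type": "LineString", "coordinates": [[23.7, 37.9], [23.8, 38.0]]}

for geom in (point, line):
    # A Point is a flat list; anything more complex nests lists.
    print(geom["type"], has_list_of_lists(geom["coordinates"]))
```

Only the Point passes as a flat list; LineString, Polygon and the Multi* types all nest.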

Secondly, the requirement to add temporal attributes of properties introduces the spectre of "edge properties," or reification. In PGIFv2 we need to be able to state a valid timespan or period for 1) a place name or geometry; and/or 2) a relation between two places (e.g. "partOf"). RDF is essentially triples, but we require quads. As far as I can tell, neither IIIF nor Linked Art make explicit recommendations for meeting that challenge with JSON-LD. I think the event-centeredness of Linked Art obviates the need in that case.

So, I backed off the idea of PGIFv2 as GeoJSON-LDT, but I am reminded (again) that LOD is about RDF, so circling back...

Toward answers

Lists of lists. I don't consider this a problem for PGIFv2 unless and until JSON-LD parsers are involved. My understanding is that the W3C Working Group now meeting to develop an updated JSON-LD spec (1.1) will design a workaround.

Edge properties. The JSON-LD spec recommends Named Graphs (@graph) as the way to "make statements about a graph itself, rather than just a single node." The example case is of a graph consisting of multiple statements; in our case @graph would almost always consist of a single statement.

An example describing partOf relations for Oxford and Abingdon is shown below. Is it too much to ask WHG and Peripleo contributors to organize their data this way, i.e. minting @graph IDs? Arguably it isn't tougher than generating PGIFv1 data dumps. The answer probably depends on the size of the contributing project; smaller groups and individuals will struggle to get from CSV to any RDF or JSON representation. But that problem extends to things like the (optionally) complex "when" element.

This suggests a CSV-to-LOD service - subject of a follow-up blog post very soon. The WHG team has come to see this as an essential deliverable for us in any case.

Ex 01 - Some partOf relations using @graph

{
  "@context": "http://linkedpasts.org/assets/place-v4-context.jsonld",
  "related": [
    { "@id": "http://mygaz.org/graphs/01",
      "@graph": {"@id": "myplace:Abingdon","partOf": "myplace:Berkshire"},
      "when": {
        "timespan":
          { "start": {
              "label": "17c.",
              "in": {"earliestYear": "1600","latestYear": "1700" }},
            "stop": {
              "label": "until 1974",
              "in": {"year": "1974"}}
          }
      }
    },
    { "@id": "http://mygaz.org/graphs/02",
      "@graph": {"@id": "myplace:Abingdon","partOf": "myplace:Oxfordshire"},
      "when": {
        "timespan":
          { "start": {
              "label": "from 1974.",
              "in": {"year": "1974"}},
            "stop": {
              "label": "until present",
              "in": {"year": "2018"}}
          }
      }
    },
    { "@id": "http://mygaz.org/graphs/03",
      "@graph": {"@id": "myplace:Oxford","partOf": "myplace:Oxfordshire"},
      "when": {
        "timespan":
          { "start": {
              "label": "from 1000",
              "in": {"year": "1000"}},
            "stop": {
              "label": "until present",
              "in": {"year": "2018"}}
          }
      }
    }
  ]
}
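To show why each single-statement @graph amounts to a quad, here is a rough Python sketch that flattens records shaped like those in Ex 01 into (subject, predicate, object, graph) tuples. It is plain dictionary traversal, not a JSON-LD processor, and the function name is an assumption made for illustration.

```python
def to_quads(related):
    """Turn each {"@id": graph, "@graph": {...}} record into (s, p, o, g) tuples."""
    quads = []
    for record in related:
        graph_id = record["@id"]
        statement = record["@graph"]
        subject = statement["@id"]
        for predicate, obj in statement.items():
            if predicate != "@id":
                quads.append((subject, predicate, obj, graph_id))
    return quads


# Two of the records from Ex 01, with the "when" blocks omitted for brevity.
related = [
    {"@id": "http://mygaz.org/graphs/01",
     "@graph": {"@id": "myplace:Abingdon", "partOf": "myplace:Berkshire"}},
    {"@id": "http://mygaz.org/graphs/02",
     "@graph": {"@id": "myplace:Abingdon", "partOf": "myplace:Oxfordshire"}},
]

for quad in to_quads(related):
    print(quad)
```

The "when" block then hangs off the fourth element (the graph ID), which is what gives the statement its temporal scope.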

Furthermore...

The PGIFv2 format (if made LD-compatible) might also be called the Linked Places Model, corresponding to the Linked Art Model being developed at Getty, some future Linked People Model, and so on...a Linked Pasts suite of models one might say. These models/formats would be used by contributors to the growing number of LOD aggregator projects (like Pelagios and WHG for place, SNAP:DRGN and SNAC for people, etc.).

Please comment...

There are alternatives to the Named Graph approach to asserting edge properties in RDF, including "standard reification" of rdf:Statements and n-ary relations. Are these better? Something else? Please make the case -- the contribution pipeline phase of WHG development can't proceed until a (preferably consensus) decision is taken.

vajlex commented 6 years ago

Aloha Karl, notwithstanding that the use case of @graph looks fine to me, I am just chipping in to ask whether the CSV-to-LOD service can include options for the output, so that serialization can be done into JSON-LDT or GeoJSON-T. Why not? Further, in addition to the use cases shown in your example (where the ability to cast a range for a single begin or end date is appreciated), is there another way to define spans using a taxonomy of named time periods (for example, Chinese reign periods), or a canonical time period URI found in perio.do? Just wondering.
I'm sure the teeming throngs of gazetteer modelers will quickly make this comment moot.

richardofsussex commented 6 years ago

Resorting to the use of @graph seems like an admission of failure to me: you can only use it for one thing, and I would prefer to hang on to it for more mainstream uses in the triple/quad stores we hope to build.

Instead I would prefer to model a subclass of the CIDOC CRM class E4 Period, which correlates a period and place, to represent a geopolitical entity with defined boundaries and temporal scope.

vajlex commented 6 years ago

So you mean this? crm:E2_Temporal_Entity crm:P82_at_some_time_within "1634"^^xsd:gYear . Example?

kgeographer commented 6 years ago

lol – I certainly feel like a failure on this so far :^)

The problem extends beyond temporal scope of a particular place or geometry. The requirements at this point include an optional temporal scope for three relations:

name (P87 is_identified_by), parthood (P106i forms_part_of), and spatial extent (P53 has_former_or_current_location).

This is readily achievable with GeoJSON-T, but there’s no RDF semantics in that case, and LOD is all about sharing RDF representations.

In RDF world, it calls for some method of reification, and in JSON-LD syntax the spec says that’s an @graph structure.

Apart from JSON-LD, this could be “standard reification” of an rdf:Statement (spo), n-ary relations (reifying the relation), or named graphs.
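For comparison, here is a minimal Python sketch of what "standard reification" produces for one temporally scoped statement. The rdf:Statement, rdf:subject, rdf:predicate and rdf:object terms are the standard RDF vocabulary; the `ex:when` property, the statement ID and the helper name are hypothetical, chosen only for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_id, subject, predicate, obj, extra=None):
    """Describe one (s, p, o) statement as an rdf:Statement node, plus extra properties."""
    triples = [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]
    # Anything said *about* the statement (e.g. its temporal scope) attaches to the node.
    for prop, value in (extra or {}).items():
        triples.append((statement_id, prop, value))
    return triples


triples = reify(
    "ex:stmt01",                      # hypothetical statement identifier
    "myplace:Abingdon", "partOf", "myplace:Berkshire",
    extra={"ex:when": "1600/1974"},   # hypothetical temporal property
)
for triple in triples:
    print(triple)
```

One asserted relation becomes four bookkeeping triples plus whatever is said about it, which is the verbosity people usually object to.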

I see no way around choosing one of the reification methods. Richard, I take it you mean something like sub-classing Period for Naming, Parthood, and Extent events/periods or similar? n-ary reification by another name! Seems there must ultimately be a LinkedPasts Ontology, eh?

NB: not all places (geographic features) are geopolitical entities; also, triple/quad stores is not a particular goal of WHG or Pelagios (could happen for WHG, jury’s out)

richardofsussex commented 6 years ago

Karl,

The problem we are facing is the old 'property of a property' issue, which has been a problem for the CIDOC CRM from the day when it tried to move into the RDF space. I suppose that the logical consequence of what I am suggesting is that we could potentially define a place as 'Abingdon-from-17c-to-1974', and then have a simple partOf relation between that entity and 'Oxfordshire-from-X-to-Y'.

That finesses the need for more powerful relationships between geographical/geopolitical entities. It gives us a handy hook on which to hang the boundary of this particular Abingdon. It allows us to make statements about what geopolities (?) preceded and followed that particular entity.

The thing that strikes me about this case is how little support we get from the CIDOC CRM for the types of obvious statement we would want to make about this geographical entity, beyond asserting its name. I'm going to Lyon for the full CIDOC CRM meeting the week after next, and would be delighted to be an ambassador for any messages which the Linked Pasts community wants to send in their direction.

Richard

richardofsussex commented 6 years ago

Further to this, I have just discovered CRMgeo [1], and will attempt to get my brain around what it offers us and report back.

[1] http://www.cidoc-crm.org/crmgeo/home-5

richardofsussex commented 6 years ago

CRMgeo is an extension of the CIDOC CRM designed to support spatiotemporal data. It provides a set of classes and properties which link the CRM to the OGC standard GeoSPARQL: http://www.opengeospatial.org/standards/geosparql This is the full framework as a PDF: http://www.cidoc-crm.org/crmgeo/sites/default/files/CRMgeo1_2.pdf CRMgeo has the class SP6_Declarative_Place, which is a subclass of Geometry, E89_Propositional_Object and E53_Place. Essentially, this allows you to declare an SP6 and specify its name, geographical and temporal boundaries, etc. So your temporal aspect is 'baked into' the SP6 object: my 'Abingdon-from-17c-to-1974' example above. The CRM has simple containment properties which can express hierarchical relationships between instances of SP6. I don't see a way of declaring that a relationship lasted for a specified period (one of your requirements above); instead you would have to bake the periods into the SP6's and then make a simple relationship between them. You could see this as a sort of 'normalization' of the data. Would this work?

Richard

richardofsussex commented 6 years ago

As regards the actual coordinate data, the CIDOC CRM maintains its usual lofty distance from boring implementation details, but suggests that WKT and GML literals will be used. These look to me like strings with a specified format representing a number of points. How does that tie in with your thinking, Karl?
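As a small illustration of the "string with a specified format" point, here is a Python sketch that turns a GeoJSON Point into a WKT string. It handles Points only, and the function name is an assumption; a real pipeline would more likely use an established geometry library.

```python
def point_to_wkt(geometry):
    """Serialize a GeoJSON Point geometry as a WKT string (longitude first)."""
    if geometry["type"] != "Point":
        raise ValueError("only Point geometries are handled in this sketch")
    lon, lat = geometry["coordinates"][:2]
    return f"POINT({lon} {lat})"


# Illustrative coordinates (assumed sample data).
athens = {"type": "Point", "coordinates": [23.7275, 37.9838]}
print(point_to_wkt(athens))
```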

kgeographer commented 6 years ago

Richard,

Thanks for weighing in. I/we could readily create a Place ontology in RDFS/OWL, but I haven’t quite given up on a format that is both GeoJSON-compatible and valid RDF. I have one more tack to try before abandoning the GeoJSON compatibility. It would be ideal if contributions to Pelagios and WHG can be directly rendered to a map.

I’m finding CIDOC clouds the issue for me at this moment, and CRMgeo is a thunderhead! That is, incredibly complex. Not too complex for me to understand, but far too complex for me to ask WHG contributors to deal with.

One core issue for me has always been CIDOC’s scope note for Place – it does not align with the definition and conceptual model of that term now common to Pleiades, Pelagios, WHG. Rather it is what we consider a Location. I’m prone to finding uses for CIDOC (I like the Linked Art model Rob Sanderson is developing at Getty), but for me it’s “horses for courses.”

richardofsussex commented 6 years ago

OK, what I will do is to take your example, and try to express the assertions it makes in a more CIDOC CRM-compatible manner. I will attempt to use CRMgeo. If I succeed in doing this, you will have a concrete suggestion for an RDF syntax we could use, to compare with other approaches. My own view of the CRM is that it looks complex because of all the possibilities it offers, but that specific instances aren't too bad. One should also note that RDF, properly done, is always more complex than the simplistic s - p - o assertions you get in (e.g.) dbpedia.

Interesting point about Place vs. Location: I would agree that the CRM Place (as a space-time volume) is very much about the geometry, and not at all about the human occupation/social construct aspect of a Place. That suggests to me that maybe a new CRM concept is required. It's a thought I would be happy to take to the CRM SIG meeting next week.

richardofsussex commented 6 years ago

I don't see Location in the LAWD ontology: can you give me a link? Also, the definition of Place ("Any conceivable place, such as a town, a mountain, or the site of a building.") isn't that specific.

kgeographer commented 6 years ago

The LAWD ontology is here: https://github.com/lawdi/LAWD, and it has where, foundAt, and origin as predicates with a range of Place. Yes, their Place scope note is a tautology. In CRM land, I think E27_Site corresponds most closely to our working conception, so in a Linked Places ontology (were there to ever be such a thing), I would say lp:Place skos:closeMatch crm:E27_Site.

I am not beholden to the LAWD ontology; in fact I am deeply skeptical of using a hodgepodge of ontologies. It seems to be common practice, but it relies on the fact that no one will ever actually load this data with all referenced ontologies into a triple store and try to reason with it. The alternative is to create a wholly intact application ontology and use owl:equivalentClass or skos:exactMatch here and there, if you really mean it.

kgeographer commented 6 years ago

@richardofsussex I don't know what example you are looking at. The following is my latest stab at a JSON-LD rendering of PGIF for Athens, followed by the referenced @context. I just added them to the repo.

{
  "type": "FeatureCollection",
  "@context": "http://linkedpasts.org/assets/pgif-context_v5.jsonld",
  "features": [
    {
      "type": "Feature",
      "id": "http://www.mygaz.org/places/pl_12345",
      "identifier": "pl_12345",
      "properties": {
        "title": "Athens",
        "ccode": "GR"
      },
      "geometry": {
        "type": "GeometryCollection",
        "geometries": [
          {
            "@id": "http://mygaz.org/graphs/geo/01",
            "@graph": {
              "@id": "http://www.mygaz.org/places/pl_12345",
              "geom": "POINT(23.7275 37.9838)^^<geo:wktLiteral>"
            },
            "when": {"timespans": {"start": {"in": "-1200"}}},
            "certainty": "certain",
            "type": "Point",
            "coordinates": [23.7275,37.9838]
          }
        ]
      },
      "descriptions": [
        {
          "value": "A major Greek city-state",
          "lang": "en"
        }
      ],
      "when": {
        "timespans": [
          {
            "start": {
              "label": "about 750 BCE",
              "in": {
                "earliest": "-775",
                "latest": "-725"
              }
            },
            "end": {
              "label": "640 CE",
              "in": {
                "year": "640"
              }
            }
          }
        ],
        "periods": [
          {
            "label": "Classical",
            "@id": "periodo:p03wskd389m"
          }
        ]
      },
      "namings": [
        {
          "@id": "http://mygaz.org/graphs/names/01",
          "when": {},
          "lang": "el",
          "attestation": {
            "publisher": "http://www.mygaz.org",
            "evidence": "http://www.mygaz.org/documents/01234"
          },
          "@graph": {
            "@id": "http://www.mygaz.org/places/pl_12345",
            "label": "Αθήνα"
          }
        }
      ],
      "place_types": [
        {
          "@id": "aat:300008347",
          "label": "inhabited place"
        }
      ],
      "parthood": [
        {
          "@id": "http://mygaz.org/graphs/part/01",
          "when": {
            "timespans": [
              {
                "start": {
                  "label": "about 750 BCE",
                  "in": {"earliest": "-775", "latest": "-725"}
                },
                "end": {
                  "label": "640 CE",
                  "in": {"year": "640"}
                }
              }
            ]
          },
          "@graph": {
            "@id": "http://www.mygaz.org/places/pl_12345",
            "part_of": "http://www.mygaz.org/places/pl_012"
          }
        }
      ],
      "depictions": [
        {
          "@id": "http://www.ex.com/images/parthenon_001.jpg",
          "title": "The Parthenon",
          "license": "cc:by-sa/3.0/"
        }
      ],
      "relations": [
        {"exact_match": "http://pleiades.stoa.org/places/579885" },
        {"close_match": "http://sws.geonames.org/264371/" },
        {"subject_of": "https://en.wikipedia.org/wiki/Athens" },
        {"see_also": "https://en.wikipedia.org/wiki/Ancient_Greece" },
        {"primary_topic_of": "https://en.wikipedia.org/wiki/Athens" }
      ]
    }
  ]
}

{"@context": {
  "lpo": "http://linkedpasts.org/ontology#",
  "aat": "http://vocab.getty.edu/aat/",
  "cc": "http://creativecommons.org/licenses/",
  "characters": "http://www.w3.org/2011/content#chars",
  "cito": "http://purl.org/spar/cito#",
  "crm": "http://erlangen-crm.org/current/",
  "dct": "http://purl.org/dc/terms/",
  "foaf": "http://xmlns.com/foaf/0.1/",
  "gn": "http://www.geonames.org/ontology#",
  "gvp": "http://vocab.getty.edu/ontology#",
  "lawd": "http://lawd.info/ontology/",
  "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
  "skos": "http://www.w3.org/2004/02/skos/core#",
  "w3cgeo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
  "geo":"http://www.opengis.net/ont/geosparql#",
  "time": "http://www.w3.org/2006/time#",
  "periodo": "http://n2t.net/ark:/99152/#",

  "geojson": "https://purl.org/geojson/vocab#",
  "Feature": "geojson:Feature",
  "FeatureCollection": "geojson:FeatureCollection",
  "GeometryCollection": "geojson:GeometryCollection",
  "LineString": "geojson:LineString",
  "MultiLineString": "geojson:MultiLineString",
  "MultiPoint": "geojson:MultiPoint",
  "MultiPolygon": "geojson:MultiPolygon",
  "Point": "geojson:Point",
  "Polygon": "geojson:Polygon",
  "bbox": { "@container": "@list", "@id": "geojson:bbox" },
  "coordinates": { "@container": "@list", "@id": "geojson:coordinates" },
  "features": { "@container": "@set",  "@id": "geojson:features" },
  "geometry": "geojson:geometry",
  "properties": "geojson:properties",
  "when": {"@id": "lpo:when"},
  "timespans": { "@container": "@set",  "@id": "lpo:timespans" },
  "periods": { "@container": "@set",  "@id": "lpo:periods" },
  "geometries": { "@container": "@set", "@id":"lpo:geometries"},

  "start": "time:intervalStartedBy",
  "end": "time:intervalFinishedBy",
  "in": {
    "@type": "time:generalYear",
    "@id": "lpo:in"
  },
  "earliest": {
    "@type": "time:DateTimeDescription",
    "@id": "lpo:earliest"
  },
  "latest": {
    "@type": "time:DateTimeDescription",
    "@id": "lpo:latest"
  },

  "id:": "@id",
  "type": "@type",
  "value": {"@id": "rdf:value"},
  "identifier": "dct:identifier",
  "label": {"@id": "rdfs:label"},
  "title": {"@id": "dct:title"},
  "ccode": {"@id": "gn:countryCode"},
  "license": "dct:license",

  "attestation": {"@id": "lawd:hasAttestation" },
  "publisher": {"@id": "dct:publisher"},
  "evidence": {"@id": "cito:citesAsEvidence"},

  "descriptions": {
    "@id": "dct:description",
    "@type": "@id",
    "@container": "@set"
  },

  "depictions": {
    "@id": "foaf:depiction",
    "@type": "@id",
    "@container": "@set"
  },

  "place_types": {
    "@id": "gvp:placeTypePreferred",
    "@type": "@id",
    "@container": "@set"
  },

  "namings": {
    "@id": "lpo:NameAttest",
    "@container": "@set"
  },
  "toponym": "lpo:Toponym",
  "lang": "dct:language",

  "parthood": {
    "@id": "lpo:PartAttest",
    "@container": "@set"
  },
  "part_of": {
    "@id": "gvp:broaderPartitive", 
    "@type": "@id" },

  "related": {
    "@id": "dct:relation",
    "@type": "@id",
    "@container": "@set"
  },
  "primary_topic_of": {"@id": "foaf:isPrimaryTopicOf", "@type":"@id" },
  "subject_of": {"@id": "crm:P129i_is_subject_of", "@type": "@id"},
  "close_match": {"@id":"skos:closeMatch", "@type":"@id"},
  "exact_match": {"@id":"skos:exactMatch", "@type":"@id"},
  "see_also": {"@id":"rdfs:seeAlso", "@type":"@id"}
}}

richardofsussex commented 6 years ago

Do you see any difference between physical 'happens to fall within' and administrative 'administered by' relationships between places? I've just fired off an email to the CIDOC CRM SIG, pointing out that it doesn't cover administrative places (i.e. 'the place run by XX Town Council') at all. Is this an issue for you, or not? If you invent 'an administration' as a thing, you can assign date and place information to it, and record its relationships to other administrative units (both containment and succession).

Your examples suggest that you want to be able to do this.

kgeographer commented 6 years ago

Yes. The partOf relation, in PGIFv1 and going forward to v2, is not spatial. Although the semantics are not spelled out well in either (yet), partOf is intended to encompass what you might call "administrative parent," allowing for computing such temporally scoped hierarchies; cf. the Abingdon/Berkshire/Oxfordshire example in my March 12 comment in Issue #1.

vajlex commented 6 years ago

FWIW, I agree that partOf is not spatial but about jurisdiction; as Karl said, "administrative parent." The main issue to watch out for is fragmentation of the hierarchies, since the parents change over time. So there are many turning points (branches) when traveling UP the tree of child-to-parent relations, if you include the temporal changes of admin units. This is counter-intuitive, as you expect branches to extend downward from a national capital. It still ends up as a pyramid of sorts, but there will be nodes that represent mid-level changes where you wouldn't expect them.
[Image: hierarchy_of_places_in_admin_realm]
In most scenarios, it is assumed that the changes in the prefecture (as shown in the image) are just attributes of one entity in the database. But what if the prefecture changes its name? Is it the same prefecture?
In CHGIS we thought of keeping areal units as entities and allowing changing attributes (like names), but the areas change. On the other hand, you can keep placenames as entities, but the names change! So we ended up with the idea of a historical instance of a place, which combines the placename + feature type + spatial object + timestamp. Then those could be related to each other, either hierarchically (vertically), or across time in sequences (horizontally), as needed.
And you can introduce new information or changes as you discover them. The only changes in the database that need to be made are the direct relationships impacted by the new instance. The rest of the database and its hierarchy are not impacted. Any other attempt to make the hierarchical model didn't seem to capture the changes we saw in the data. That said, I think Karl's proposal makes sense, in that you can define a "place entity" and then loosely attach the changing placenames and jurisdictions that were associated with it over time. To define this (as I mentioned to Karl in an email), I would propose: "the Place Entity is an accumulation of historical information / attributes that correspond to one another by overlapping in geographic space, and either preceding, following, or sharing that space over time."
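The "historical instance" idea (placename + feature type + spatial object + timestamp) could be sketched roughly as follows in Python. This is not the CHGIS or TGAZ schema: the field names, the "town" feature type and the half-open date handling are assumptions for illustration, with dates and places borrowed from the Abingdon example earlier in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HistoricalInstance:
    placename: str
    feature_type: str
    geometry_wkt: str
    start_year: int               # inclusive
    end_year: int                 # exclusive, so adjacent instances do not overlap
    parent: Optional[str] = None  # administrative parent during this span


def instance_at(instances, year):
    """Return the instance valid in the given year, or None if there is none."""
    for inst in instances:
        if inst.start_year <= year < inst.end_year:
            return inst
    return None


abingdon = [
    HistoricalInstance("Abingdon", "town", "POINT(-1.2879 51.6708)", 1600, 1974, "Berkshire"),
    HistoricalInstance("Abingdon", "town", "POINT(-1.2879 51.6708)", 1974, 2018, "Oxfordshire"),
]

found = instance_at(abingdon, 1900)
print(found.parent if found else None)
```

Adding a newly discovered change means adding one more instance and touching only the relationships that involve it.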

richardofsussex commented 6 years ago

Lex,

Your description of an administrative place as 'placename + feature type + spatial object + timestamp' is pretty much exactly what I came up with as an idea. It's a type of normalization, which allows you to make unambiguous statements about the place. Obviously it brings the overhead (as any normalization does) of having to create a new record each time one of those parameters changes. OTOH, it is a useful piece of historical data in its own right to know that the entity existed unchanged for a specified length of time.

Also, there is no reason why you couldn't finesse the recording of changes, instead of having a whole new entity every time something changed. For example, you could have a 'hasMember' property which allows you to list all the sub-units within a given unit. Then you could have a 'joined' event which allows you to record, for example, one state joining the Union (and a corresponding 'left' event).
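A rough Python sketch of the 'joined'/'left' event idea: membership of a unit at a given year is derived by replaying its events in date order. The event structure and key names are hypothetical, and the dates come from the Abingdon/Berkshire example earlier in the thread.

```python
def members_at(events, year):
    """Replay 'joined' and 'left' events up to and including the given year."""
    members = set()
    for event in sorted(events, key=lambda e: e["year"]):
        if event["year"] > year:
            break
        if event["kind"] == "joined":
            members.add(event["unit"])
        elif event["kind"] == "left":
            members.discard(event["unit"])
    return members


# Hypothetical event log for the unit "Berkshire".
berkshire_events = [
    {"unit": "Abingdon", "kind": "joined", "year": 1600},
    {"unit": "Abingdon", "kind": "left", "year": 1974},
]

print(members_at(berkshire_events, 1900))
```

This keeps one entity per unit and records only the changes, at the cost of having to replay events to answer "what was it like in year X?".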

vajlex commented 6 years ago

@richardofsussex Indeed, when working with bits and pieces of historical information, and assembling them into a large dynamic expression of the change over time, using the instances as building blocks was an expedient way to handle it. Currently, I think Karl's issue has to do with the uncertainty of dates and locations. So he can't instantiate all the needed facets for the spatio-temporal object. Also, I think it would be rather hard to know how to create the hasMember properties for the types of changes that are found (in the CHGIS, at least), because you have various changes, including placename, as well as splits, merges, abolishments, and re-establishments. So when a place gets abolished, replaced by an unrelated entity, then re-established later, it is hard to define an object that hasMember of those discontinuous bits.
Therefore, I am hoping Karl (and his gang of ninjas) oh, is that us? can figure out a "loose" method to create "place entities" then hang bits from them in a meaningful way. Since most data won't fit the parameters that I mentioned, I just wanted to point out that the hierarchical structure has a slipping point when the middle-tiers morph into n phases or instances, and when relationships then need to be instantiated for each phase. It is a headache to keep track of these. Hoping for an easy out!

richardofsussex commented 6 years ago

I don't think it is necessary to include temporal information where you don't have it. Conversely, I agree that we haven't yet invented a specific structure for recording time-bounded relationships. Interestingly, the notes from the last CIDOC CRM SIG meeting say:

The sig accepted MD’s proposal that the temporality of relationships appears to be a separate topic with a set of distinct ontological patterns, which need to be considered separately. Depending on the pattern, it should be decided into which module an explicit description of a temporal validity of a relationship will belong, regardless of the "time agnostic" CRMbase versions.

I will be attending the CRM SIG meeting next week, and hope to make progress on these ideas.

kgeographer commented 6 years ago

Agreed, temporal scoping of names, locations and partOf relations is optional. Also agreed that PGIF intends to be a relatively "'loose' method to create 'place entities' then hang bits from them in a meaningful way." Both WHG and Pelagios are agnostic as to the data models of contributors to our "union indexes." We have to be - there are too many variations in conceptual and logical models, and in parent projects' purposes.

For example, CHGIS, which gave rise to Lex's TGAZ, is a historical GIS, and because its data includes a lot of jurisdictional changes, the TGAZ data model suits that to a T (so to speak). The Cultures of Knowledge project is also intent on recording temporally scoped partOf relations, but the great majority of WHG and Peripleo contributors will not. All that said, PGIFv2 will allow for representing those changes, but in as simple a way as will be broadly effective. Roughly as detailed in the example above.

I agree that a place entity comprising 'placename + feature type + spatial object + timestamp' can be useful and is probably "correct." And WHG will permit queries to return DISTINCT ON() those attributes. But in point of fact, the temporal attributes across the entire index/dataset will be so sparse, results will not be especially useful.

In TGAZ, there are 8 records for a Sishui Xian, a county at approx (113.19, 34.85). Its changing admin relations are tracked over time. I think WHG should provide a single answer to a query for that place, with a link to TGAZ/CHGIS, where people can explore its administrative history, learn all relevant sources, etc.

The fact that for CIDOC-CRM

"the temporality of relationships appears to be a separate topic with a set of distinct ontological patterns"

after all this time and the introduction several years ago of Spacetime volumes, is instructive of...something. And it indicates it is not crazy that we are struggling over this set of issues right now. For 20+ years the issue of representing time and spatial dynamics has been a major research agenda topic in my field, Geographic Information Science, and there has always been widespread dissatisfaction inside and outside the field with "how GIS handles time." Hence GeoJSON-T, reflected in this draft PGIF spec, and its predecessor Topotime.

richardofsussex commented 6 years ago

When I raised this topic on the CIDOC CRM list, one Franco Niccolucci sent me a paper he has written where he discusses precisely the potential implementation of historical gazetteers for Pelagios using CRMgeo. However, his analysis doesn't give us a magic answer: the main point he makes is that you can quantize/'pixellate' the space-time volume and thereby simplify processing. In order to get a projection (e.g. the boundary of a feature on a given date) he resorts to 'properties of properties', which is where we came in. As you have said, we need to reify this, i.e. promote what is currently a property into a class.

kgeographer commented 6 years ago

Comments from Rob Sanderson (Getty Research; JSON-LD working group):

This isn't really a JSON-LD problem, but a baseline restriction of RDF that relationship instances cannot themselves have properties. In CIDOC-CRM these are the "dot" properties. Given that, there are three possible options each with different pros and cons:

  1. New properties. Cheap when the set of values is enumerable and small, but unable to express most use cases.
  2. Reify to a class instance. More expensive but at least in the data. This is the CRM-PC approach, and now more easily expressed in many cases using AttributeAssignment with a P2 of the predicate.
  3. Named Graphs. More expressive again, but more expensive and requires a quadstore, and using your ace-in-the-sleeve for this purpose (a triple can only be part of one named graph)

We're going with option 2, which requires no changes to JSON-LD to support nor the underlying technology, just a slightly more complicated data model.
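The general shape of option 2 (promoting the relation to a node that can carry its own properties) might look like the Python sketch below. The key names `relation_type`, `from`, `to` and `when`, and the identifier, are hypothetical; this is not the CRM-PC or Linked Art vocabulary, only an illustration of the pattern.

```python
import json


def reify_relation(node_id, subject, predicate, obj, when=None):
    """Represent one relation as its own node so that it can carry a temporal scope."""
    node = {"@id": node_id, "relation_type": predicate, "from": subject, "to": obj}
    if when is not None:
        node["when"] = when
    return node


parthood = reify_relation(
    "http://mygaz.org/relations/01",  # hypothetical identifier
    "myplace:Abingdon", "partOf", "myplace:Berkshire",
    when={"start": "1600", "end": "1974"},
)
print(json.dumps(parthood, indent=2))
```

Unlike the named-graph form, everything here is ordinary triples about the relation node, so no quad store is needed.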

richardofsussex commented 6 years ago

Snap! Looks as though I need to familiarise myself with AttributeAssignment as well as the other bits of the CRM which I have just been discovering ...

kgeographer commented 6 years ago

My own knowledge of CRM is dated; had no idea there is a CRM-PC. Reifying relations as classes is a well-known pattern, typical in relational database schemas as well as the ontologies people have mentioned. The concern I had was that the JSON-LD spec speaks only of Named Graphs (@graph) and I wanted assurance that we weren't running afoul of that standard somehow.

This blog post has been making the rounds: Reification is a Red Herring. I think reification takes many forms, not only the rdf:Statement <s,p,o> and named graph that everyone agrees are troublesome somehow.

I'm going to put together an example for PGIFv2, for approval (hopefully) by this august group! Some of the aliases in its @context will point to an lpo: namespace, so there's a bit of RDFS or OWL to write supporting that.

Important to stress again that Pelagios and WHG are agnostic about individual project models for place data; we're after a valid RDF contribution format that allows us as aggregators to link places and expose a useful subset of those projects' data, linked back to the detail (and further links) they provide.

Oh, and although it saddens me, PGIFv2 will not itself be valid GeoJSON (JSON-LD does not support the lists of lists it uses for geometry), but will follow the form sufficiently that making it GeoJSON will be an option. A separate discussion to follow before long.

kgeographer commented 6 years ago

Slight change of direction...Rainer and I agreed to see what a prov:specializationOf implementation looks like in this context. This reifies a place-at-time as a "Setting," which a place can be partOf. This departs from the reification of a property Rob S. referred to, and like all other approaches has its pluses and minuses.

The below example and its @context can be examined more readily at JSON-LD playground. It has the advantage of accounting for the sort of HGIS implementations of TGAZ and EMPlaces, but those projects are not typical.

{
  "type": "FeatureCollection",
  "@context": "http://linkedpasts.org/assets/pgif-context_v6a.jsonld",
  "features": [
    { "@id": "mygaz:places/p_12345",
      "type": "Feature",
      "namings": [{"toponym":"Abingdon", "lang":"en"}],
      "properties":{"title":"Abingdon (UK)"},
      "geometry": {
        "type": "GeometryCollection",
        "geometries": [
          { "type": "Point",
            "coordinates": [-1.2879,51.6708],
            "certainty": "certain",
            "when": {
              "timespans":[{"start":"1600"}]
            }
          }
        ]
      },
      "part_of": [
        { "label": "Berkshire (1600-1774)",
          "@id": "mygaz:settings/s_9876.1"
        },
        { "label": "Oxfordshire (1775-)",
          "@id": "mygaz:settings/s_3456.1"
        }
      ]
    },
    { "@id": "mygaz:places/p_9876",
      "type": "Feature",
      "namings": [{"toponym":"Berkshire","lang":"en"}],
      "properties":{"title":"Berkshire (UK)"},
      "geometry": {
        "type": "GeometryCollection",
        "geometries": [
          { "type": "Point",
            "coordinates": [-1,51.4166]
          }
        ]
      }
    },
    { "@id": "mygaz:places/p_3456",
      "type": "Feature",
      "namings": [{"toponym":"Oxfordshire","lang":"en"}],
      "properties":{"title":"Oxfordshire"},
      "geometry": {
        "type": "GeometryCollection",
        "geometries": [
          { "type": "Point",
            "coordinates": [-1.28,51.75]
          }
        ]
      }
    }
  ],
  "settings": [
    {
      "@id": "mygaz:settings/s_9876.1",
      "type": "Setting",
      "properties": {
        "label": "Berkshire (1600-1774)"
      },
      "names": [
        { "label":"Berkshire (1600-1774)", 
          "lang":"en"}],
      "prov:specializationOf": "mygaz:places/p_9876",
      "geo_wkt":"POLYGON((30 10,40 40,20 40,10 20,30 10))",
      "when": {
        "timespans":[{"start":"1600","end":"1774"}]
      }
    },
    {
      "@id": "mygaz:settings/s_3456.1",
      "type": "Setting",
      "properties": {
        "label": "Oxfordshire (1775-)"
      },
      "names": [
        { "label":"Oxfordshire (1775-)", 
          "lang":"en"}],
      "prov:specializationOf": "mygaz:places/p_3456",
      "geo_wkt":"POLYGON((30 10,40 40,20 40,10 20,30 10))",
      "when": {
        "timespans":[{"start":"1775"}]
      }
    }
  ]
}
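A minimal sketch of how a consumer might use this shape: follow each `part_of` reference to its Setting and keep the ones whose timespan covers a given year. It assumes the `@id` values in `part_of` match the Setting `@id` values exactly, that years are plain integer strings, and that a missing `start` or `end` means open-ended. The trimmed `doc` and the function names are illustrative only.

```python
def year_in_timespan(year, span):
    """True if year falls inside the span; a missing bound is open-ended."""
    start = int(span["start"]) if "start" in span else None
    end = int(span["end"]) if "end" in span else None
    return (start is None or year >= start) and (end is None or year <= end)


def settings_at(doc, place_id, year):
    """Labels of the Settings that place_id is part of in the given year."""
    by_id = {s["@id"]: s for s in doc.get("settings", [])}
    place = next(f for f in doc["features"] if f["@id"] == place_id)
    active = []
    for ref in place.get("part_of", []):
        setting = by_id.get(ref["@id"])
        if setting is None:
            continue  # dangling reference: nothing to check
        spans = setting.get("when", {}).get("timespans", [])
        if any(year_in_timespan(year, s) for s in spans):
            active.append(setting["properties"]["label"])
    return active


# Trimmed version of the example above (only the fields the lookup needs).
doc = {
    "features": [
        {"@id": "mygaz:places/p_12345",
         "part_of": [{"@id": "mygaz:settings/s_9876.1"},
                     {"@id": "mygaz:settings/s_3456.1"}]},
    ],
    "settings": [
        {"@id": "mygaz:settings/s_9876.1",
         "properties": {"label": "Berkshire (1600-1774)"},
         "when": {"timespans": [{"start": "1600", "end": "1774"}]}},
        {"@id": "mygaz:settings/s_3456.1",
         "properties": {"label": "Oxfordshire (1775-)"},
         "when": {"timespans": [{"start": "1775"}]}},
    ],
}

print(settings_at(doc, "mygaz:places/p_12345", 1700))
print(settings_at(doc, "mygaz:places/p_12345", 1800))
```
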
kgeographer commented 6 years ago

I think this is ordinary RDF/S for property reification for simple abbreviated records. Next step is getting this in JSON-LD with properly articulated "when" properties.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix lpo: <http://linkedplaces.org/ontology#> .
@prefix dct: <http://purl.org/dc/terms/> .
# The prefix ends at places/ because a Turtle local name cannot contain an unescaped "/"
@prefix mygaz: <http://mygaz.org/places/> .

lpo:Place a rdfs:Class .
lpo:Parthood a rdfs:Class .
lpo:child rdfs:domain lpo:Parthood .
lpo:parent rdfs:domain lpo:Parthood .
lpo:start rdfs:domain lpo:Parthood .
lpo:end rdfs:domain lpo:Parthood .

# similar to lpo:Parthood
lpo:Naming a rdfs:Class .
lpo:Location a rdfs:Class .

# Place types are given as IRIs throughout, not string literals
mygaz:p_12345 a lpo:Place ;
  dct:title "Abingdon (UK)" ;
  lpo:placeType lpo:InhabitedPlace .

mygaz:p_9876 a lpo:Place ;
  dct:title "Berkshire (UK)" ;
  lpo:placeType lpo:County .

mygaz:p_5678 a lpo:Place ;
  dct:title "Oxford (UK)" ;
  lpo:placeType lpo:InhabitedPlace .

mygaz:p_3456 a lpo:Place ;
  dct:title "Oxfordshire (UK)" ;
  lpo:placeType lpo:County .

# Each blank node is one reified part-of relation with its own time bounds
[] a lpo:Parthood ;
  rdfs:label "Abingdon>Berkshire" ;
  lpo:child mygaz:p_12345 ;
  lpo:parent mygaz:p_9876 ;
  lpo:start "1600" ;
  lpo:end "1974" .

[] a lpo:Parthood ;
  rdfs:label "Abingdon>Oxfordshire" ;
  lpo:child mygaz:p_12345 ;
  lpo:parent mygaz:p_3456 ;
  lpo:start "1975" .

[] a lpo:Parthood ;
  rdfs:label "Oxford>Oxfordshire" ;
  lpo:child mygaz:p_5678 ;
  lpo:parent mygaz:p_3456 ;
  lpo:start "1600" ;
  lpo:end "1974" .
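
As a rough sketch of the "next step", the flat Parthood records can be regrouped under their child place, with the dates moved into a "when" object. The records below loosely mirror the three lpo:Parthood nodes above as plain dicts; the `parthood`/`when`/`timespans` key names and the `nest_parthoods` helper are assumptions for illustration, not a settled format.

```python
import json
from collections import defaultdict

# Illustrative records, one per reified Parthood node (identifiers assumed).
parthoods = [
    {"child": "mygaz:places/p_12345", "parent": "mygaz:places/p_9876",
     "start": "1600", "end": "1974"},
    {"child": "mygaz:places/p_12345", "parent": "mygaz:places/p_3456",
     "start": "1975"},
    {"child": "mygaz:places/p_5678", "parent": "mygaz:places/p_3456",
     "start": "1600", "end": "1974"},
]


def nest_parthoods(records):
    """Group Parthood records by child; move start/end into a 'when' object."""
    grouped = defaultdict(list)
    for rec in records:
        # Copy only the bounds that are present, so open-ended spans stay open.
        timespan = {k: rec[k] for k in ("start", "end") if k in rec}
        grouped[rec["child"]].append(
            {"parent": rec["parent"], "when": {"timespans": [timespan]}}
        )
    return [{"@id": child, "parthood": links}
            for child, links in grouped.items()]


print(json.dumps(nest_parthoods(parthoods), indent=2))
```
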
kgeographer commented 6 years ago

At long last, valid JSON-LD (playground), valid RDF by examination, valid GeoJSON (gist), with one proviso so far [1]. Seems intuitive and user-friendly, meets reqs. as I understand them. Hope this meets approval. For brevity, didn't articulate the properties we're not temporally scoping.

{
  "type": "FeatureCollection",
  "@context": "http://linkedpasts.org/assets/pgif-context_v6a.jsonld",
  "features": [
    { "@id": "mygaz:places/p_12345",
      "type": "Feature",
      "properties":{
        "title": "Abingdon (UK)",
        "ccode": "GB"
      },
      "namings": [
        { "toponym":"Abingdon", "lang":"en",
          "attestation": {
            "publisher": "http://pub.org/",
            "evidence": "http://pub.org/pubs/321/"
          },
          "when": {"timespans":[{"start":"1600"}]}
        },
        { "toponym":"Abingdon-on-Thames", "lang":"en",
          "when": {"timespans":[{"start":"1600"}]}
        }
      ],
      "parthood": [
        { "parent": "mygaz:places/p_9876",
          "parentLabel": "Berkshire (UK)",
          "when": {"timespans":[{"start":"1600","end":"1974"}]}
        },
        { "parent": "mygaz:places/p_3456",
          "parentLabel": "Oxfordshire (UK)",
          "when": {"timespans":[{"start":"1974"}]}
        }
      ],
      "geometry": {
        "type": "GeometryCollection",
        "geometries": [
          { "type": "Point",
            "coordinates": [-1.2879,51.6708],
            "geo_wkt": "POINT(-1.2879 51.6708)",
            "when": {"timespans":[{"start":"1600","end":"1699"}]}
          },
          { "type": "Point",
            "coordinates": [-1.30,51.68],
            "geo_wkt": "POINT(-1.30 51.68)",
            "when": {"timespans":[{"start":"1700"}]}
          }
        ]
      },
      "placetypes": [{}],
      "descriptions": [{}],
      "depictions": [{}],
      "related": [{}],
      "when": {}
    }
  ]
}

[1] The sample has only Point geometries; other types will not validate as JSON-LD due to its non-support of lists of lists. So practically, if someone has only Points and wishes their export to be immediately mappable, they can use this template. If they want true JSON-LD compatibility, they can omit the GeoJSON coordinates and include WKT instead.
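
The WKT fallback in [1] can be sketched as a small conversion step. This is a minimal illustration covering only Point, LineString and Polygon with 2D positions; the function names are hypothetical, and the `geo_wkt` key is taken from the sample above.

```python
def _positions(points):
    """Format a list of [x, y] positions as 'x y, x y, ...'."""
    return ", ".join(f"{p[0]} {p[1]}" for p in points)


def geometry_to_wkt(geometry):
    """Convert a 2D GeoJSON Point, LineString or Polygon to a WKT string."""
    gtype = geometry["type"]
    coords = geometry["coordinates"]
    if gtype == "Point":
        return f"POINT({coords[0]} {coords[1]})"
    if gtype == "LineString":
        return f"LINESTRING({_positions(coords)})"
    if gtype == "Polygon":
        rings = ", ".join(f"({_positions(ring)})" for ring in coords)
        return f"POLYGON({rings})"
    raise ValueError(f"unsupported geometry type: {gtype}")


def jsonld_safe(geometry):
    """Copy a geometry, replacing nested 'coordinates' with a 'geo_wkt' string."""
    out = {k: v for k, v in geometry.items() if k != "coordinates"}
    out["geo_wkt"] = geometry_to_wkt(geometry)
    return out


point = {"type": "Point", "coordinates": [-1.2879, 51.6708]}
polygon = {
    "type": "Polygon",
    "coordinates": [[[30, 10], [40, 40], [20, 40], [10, 20], [30, 10]]],
    "when": {"timespans": [{"start": "1600"}]},
}
print(geometry_to_wkt(point))
print(jsonld_safe(polygon))
```

Other members of the geometry, such as "when", pass through untouched, so the temporal scoping survives the switch from coordinates to WKT.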

kgeographer commented 6 years ago

did NOT mean to close this! clicked wrong button

kgeographer commented 6 years ago

The first product of this discussion, a draft LPIF, has been placed in its own repo, 'lpif'. Closing this issue, but further discussion can continue as issues in its new location.