CLARIAH / wp2-GISLOD

Exchange for issues and code related to transposition of gemeentegeschiedenis.nl-data to LOD
MIT License

Encode the definition of a municipality into the dataset #6

Closed wouterbeek closed 7 years ago

wouterbeek commented 7 years ago

Should we model municipalities or should we model spatio-temporal slices?

Let's define a spatio-temporal slice as the combination of a geometry, a begin date and an end date.

Some spatio-temporal slices share the same (1) name, (2) Gemeentegeschiedenis URI, (3) CBS code and/or (4) Amsterdamse code. (1) through (4) seem to be four different attempts at defining what a municipality is. The four approaches share the belief that a municipality consists of one or more spatio-temporal slices (but they disagree on which spatio-temporal slices constitute a municipality).

My question is now: is the notion of a municipality clear enough for us to model it?

I would like to propose that we do not model what a municipality is. Instead, we would model spatio-temporal slices (something that can be defined precisely). Spatio-temporal slices can of course have properties, like a name, a Gemeentegeschiedenis URI, a CBS code, and an Amsterdamse code. (They can also have a French department and a province as properties.)

The user of the dataset can choose for themselves what a municipality is. Examples include:

  1. Give me all spatio-temporal slices with the same name.
  2. Give me all spatio-temporal slices with the same name under minor spelling variations.
  3. Give me all spatio-temporal slices that contain a specific point (specified as a longitude/latitude pair).
  4. Give me all spatio-temporal slices with the same CBS code.
  5. Give me all spatio-temporal slices with the same Amsterdamse code.
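A minimal sketch of the idea in the list above, assuming a slice is stored as a single polygon ring plus dates and a few properties. The class name `Slice`, the helpers `group_by` and `contains`, and all names and codes in the sample data are made up for illustration; they are not taken from the gemeentegeschiedenis.nl data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Slice:
    """A spatio-temporal slice: a geometry plus a begin and an end date."""
    ring: tuple  # ((lon, lat), ...): one polygon ring, a simplification
    begin: date
    end: date
    name: str
    cbs_code: Optional[str] = None
    amsterdamse_code: Optional[str] = None


def group_by(slices, key):
    """Group slices by whatever the user considers 'the same municipality'."""
    groups = defaultdict(list)
    for s in slices:
        groups[key(s)].append(s)
    return dict(groups)


def contains(ring, lon, lat):
    """Ray-casting point-in-polygon test (border points get no special care)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Made-up data: two squares that later merge into one rectangle.
SQUARE_A = ((0, 0), (1, 0), (1, 1), (0, 1))
SQUARE_B = ((1, 0), (2, 0), (2, 1), (1, 1))
RECT_AB = ((0, 0), (2, 0), (2, 1), (0, 1))

a = Slice(SQUARE_A, date(1900, 1, 1), date(1910, 1, 1), "A-dorp", "cbs-1", "amco-1")
b = Slice(SQUARE_B, date(1900, 1, 1), date(1910, 1, 1), "B-dorp", "cbs-2", "amco-2")
c = Slice(RECT_AB, date(1910, 1, 1), date(1920, 1, 1), "A-dorp", "cbs-3", "amco-1")
slices = [a, b, c]

same_name = group_by(slices, lambda s: s.name)          # query 1
same_cbs = group_by(slices, lambda s: s.cbs_code)       # query 4
at_point = [s for s in slices if contains(s.ring, 0.5, 0.5)]  # query 3
```

In this toy data, grouping by name or by Amsterdamse code puts `a` and `c` together, while grouping by CBS code keeps all three slices apart. The dataset itself takes no position on which grouping is "the" municipality.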

The benefit of the approach proposed here is that we do not have to go into the philosophical discussion of what a municipality is, when a municipality stays the same, and when a municipality ceases to exist and/or becomes a new municipality. Specifically, we do not have to decide whether a municipality ceases to exist when its name changes (slightly or significantly) or when its geometry changes (slightly or significantly). Also, we do not have to decide whether or not municipalities A and B cease to exist when they are merged into municipality C (where C can have the same name as the larger municipality from which it originated, say A). All we say is that after spatio-temporal slices A and B cease to exist, a spatio-temporal slice C originates. A and C may have the same value for their name property. They may have different values for their geometry property. They may have the same value for their Amsterdamse code property. They may have different values for their CBS code property. The user decides whether A and C are the same municipality, i.e., we do not make that decision in our dataset.

Edit: Since (1) conceptualizations of a municipality are given (e.g., gemeentegeschiedenis, Amco, CBS), and (2) geometries cannot be used as identifiers -- because two measurements of the same thing may not be the same -- we should model both municipalities and geometries.

mmmenno commented 7 years ago

I think municipalities are considered to be administrative bodies by most users, and their exact geometry is less relevant.

Practically, probably not all minor border changes have found their way into the data, so we might expect new 19th-century geometries, for example. Also, we have different geometries for the exact same borders, coming from different sources that didn't draw these geometries with the same level of precision.

ivozandhuis commented 7 years ago

I don't know which problem we're solving by abstracting. Dutch municipalities are as clear as can be: as of 1815 the municipality is defined by law, everybody in the real world knows what it is, it is used in sources, it defines archives, etc. For that reason I guess we must define what a municipality is. The various standardizations try to incorporate time in various ways, sometimes even with different purposes. Those are different ways of modeling the same concept, instead of different definitions of municipality. Or is this just semantics?

Apparently we did not miss the abstraction you propose, but maybe we've overlooked it. So one municipality can be modeled by different spatio-temporal slices? Or is a municipality a specialization of a spatio-temporal slice?

Extra concern: do we need to distract general users (historians) with parts of this discussion or can it stay in the background? The simplicity of one uri for one municipality in time is one of the big strengths, I think.

wouterbeek commented 7 years ago

Thanks for the feedback. I do not doubt that municipalities are used in law, everyday life, etc. The problem that I want to solve is that the definition of a municipality is unknown, at least to me. It would be great if we could include this definition in the dataset itself. Moreover, if we have the definition we can check whether all instances in the data follow this definition (quality control step).

PS: I've changed the topic of this issue to reflect the adjusted aim.

ivozandhuis commented 7 years ago

It keeps growing in my head. Is your Time/Space-Slice the smallest unit that helps us define "gemeente" and "amco" and "name(uri)"?

[image: modelingtimespace]

wouterbeek commented 7 years ago

@ivozandhuis Yes, your picture shows the situation as I initially understood it! My reasoning was that if the time/space slices are a representation of physical reality (plots of land + timespan), we can build a conceptual layer (based on naming conventions, law, etc.) on top of that.

E.g., assuming we have the following space/time slices:

  1. polygon A from 1900 till 1910
  2. polygon B from 1905 till 1910
  3. multi-polygon A+B from 1910 till 1920
  4. polygon A from 1920 till 1925
  5. polygon B from 1920 till 1930

We can come up with the following conceptualizations:

There may be many more conceptualizations, depending on the use case. Three conceptualizations are already in wide use (gemeentegeschiedenis, Amco and CBS). I'm not sure whether these three conceptualizations should be properties of the same resource or independent resources. IIUC they are not exactly the same, so we cannot model them as owl:sameAs of one another.
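For reference, the five example slices from the list above can be written down as plain data. This is only a sketch: the geometry is reduced to a label, and treating each interval as half-open (begin inclusive, end exclusive) is my assumption, since the example does not say which slice owns the boundary year. The names `Slice`, `SLICES` and `active_in` are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    geometry: str  # label only; a real slice would carry coordinates
    begin: int     # first year the slice exists
    end: int       # first year the slice no longer exists (assumption)


SLICES = [
    Slice("A", 1900, 1910),
    Slice("B", 1905, 1910),
    Slice("A+B", 1910, 1920),
    Slice("A", 1920, 1925),
    Slice("B", 1920, 1930),
]


def active_in(slices, year):
    """Return the slices whose interval [begin, end) covers the given year."""
    return [s for s in slices if s.begin <= year < s.end]


# Which plots of land exist as a slice in a given year?
labels_1907 = [s.geometry for s in active_in(SLICES, 1907)]
labels_1910 = [s.geometry for s in active_in(SLICES, 1910)]
```

Any conceptual layer (names, codes, legal definitions) would then be a selection or grouping over `SLICES`, kept separate from the slices themselves.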

rlzijdeman commented 7 years ago

Do you mean you define the smallest areas we know of (kadaster areas) and then combine these into whatever people could call a municipality? (Which might be time-sensitive?)

ivozandhuis commented 7 years ago

@rlzijdeman No: a T/S-slice is the smallest combination of time and space that was a gemeente. Every moment something changes (a border, a name, an identifying number) a new T/S-slice comes into existence. The good thing about a perceel or sectie (kadaster area) is that those are T/S-slices as well. By introducing the abstract T/S-slice, we can relate kadaster stuff to gemeente stuff? Or relate the various gemeente codes to each other. Until now our gemeentegeschiedenis URIs were the smallest unit, which enabled us to relate the codes to each other and to the names. Various geographical units are related to one URI (hence mostly more than one map on the page).
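A rough sketch of the rule "every moment something changes, a new T/S-slice comes into existence", assuming the history of one gemeente is available as an initial state plus a list of dated changes. The function name `slices_from_changes`, the property names and the sample values are all hypothetical.

```python
from datetime import date


def slices_from_changes(begin, state, changes, end):
    """Cut one timeline into T/S-slices at every real change.

    begin:   date on which the initial state starts
    state:   dict of properties (e.g. name, geometry, code) at `begin`
    changes: list of (date, dict of properties that change on that date)
    end:     date on which the last slice ends
    """
    state = dict(state)
    result = []
    for when, delta in sorted(changes, key=lambda change: change[0]):
        new_state = {**state, **delta}
        if new_state == state:
            continue  # nothing actually changed, so no new slice
        result.append({"begin": begin, "end": when, **state})
        begin, state = when, new_state
    result.append({"begin": begin, "end": end, **state})
    return result


# Made-up history: a border change, a no-op "change", then a rename.
history = slices_from_changes(
    date(1900, 1, 1),
    {"name": "X", "geometry": "A", "code": "c-1"},
    [
        (date(1910, 1, 1), {"geometry": "A+B"}),
        (date(1915, 1, 1), {"name": "X"}),
        (date(1920, 1, 1), {"name": "Y"}),
    ],
    date(1930, 1, 1),
)
```

Here the no-op entry in 1915 does not produce a slice, so the history ends up as three slices: 1900 to 1910, 1910 to 1920 and 1920 to 1930.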

Main question: are we really the first in the world to define such a concept? I would be surprised.

rlzijdeman commented 7 years ago

Don’t be surprised! T/S-slices make US great again. They’re the best thing around, really, I’m not kidding. Just ask anyone!

rlzijdeman commented 7 years ago

This Harvard paper[1] discusses a number of solutions to GIS modelling, although not the specific time-slice. I think the conclusion is still something that would be worthwhile to discuss on March 1st.

[1] http://www.fas.harvard.edu/~chgis/work/docs/papers/CGA_Wkshp2009_Lex_9apr09.pdf

hekl commented 7 years ago

[image: img_0040]

I am adding this model. All entities vary over time, the Amsterdamse code the least. The starting point is a unique geometry for a municipality at a given time. That geometry gets a URI. But the name variants of municipalities can also get a separate URI.

mmmenno commented 7 years ago

On the Erfgoed & Locatie project, we named our smallest units PiTs (Place in Time). The Time / Space approach seems logical when dealing with historical areas.

Please remember, however, that gemeentegeschiedenis geometries are only modelling the actual borders of municipalities. I'm 100% sure we've got different geometries for exactly the same borders, for instance - cases that falsely suggest a change in 'the real world'.

Urification of geometries sounds like a good idea, but only to distinguish geometries and their provenance (did we get the geometry from Boonstra, Kadaster, CBS or another source?).

Municipalities are administrative bodies defined in Koninklijke Besluiten, and I think these definitions should be the core concepts of gemeentegeschiedenis, as they are in the Repertorium, CBS, BAG, etc.
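To make the distinction concrete, here is a small sketch in which the administrative body is the core resource and each geometry is a separate record that carries its own URI and source. The class names, the `example.org` URIs and the coordinates are invented for illustration; only the source names come from the discussion above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GeometryRecord:
    uri: str     # identifies this particular drawing of the border
    source: str  # where the drawing came from
    wkt: str     # the geometry as Well-Known Text


@dataclass
class Municipality:
    uri: str
    name: str
    geometries: list = field(default_factory=list)

    def geometries_from(self, source):
        """Return only the drawings that come from the given source."""
        return [g for g in self.geometries if g.source == source]


# Two slightly different drawings of the same border, from two sources.
m = Municipality("https://example.org/gemeente/1", "Example")
m.geometries.append(GeometryRecord(
    "https://example.org/geometry/1", "Boonstra",
    "POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))"))
m.geometries.append(GeometryRecord(
    "https://example.org/geometry/2", "Kadaster",
    "POLYGON((0 0, 1.01 0, 1.01 1, 0 1, 0 0))"))
```

With this shape, two geometry URIs for one municipality only say that there are two drawings, not that anything changed in the real world.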

mmmenno commented 7 years ago

Oh, and the Amsterdamse Code - in my humble opinion - really really sucks. It should never be used to define a municipality. It's nothing more than an arbitrary construct to model continuation when in reality there is a clear break.

hekl commented 7 years ago

Yes, the Amsterdamse code is not intended for identification. You do not need it for your modelling. The existence of marginally different geometries for the same municipality at the same point in time is awkward, but it should not complicate the model. It is best to state the sources and, where possible, to use a continuous source. I agree with your remarks on that.