Temporal encoding for integration with LINCS #3

Closed martindholmes closed 1 year ago

martindholmes commented 2 years ago

This ticket proposes three specific changes:

  1. Schema changes:

The <state> element must be added to the content model of the <place> element, but must not be allowed anywhere else. This will mean changing the class membership of the <state> element and altering the content models of any macros which contain it to eliminate it from all other locations, then creating a custom content model for the <place> element which includes <state>. <state> will require all its dating attributes.

  2. Encoding practice changes:

The following encoding practices will be adopted and tested within MoEML:

i. Multiple <location> elements will be allowed inside <place>, and each will have a single <geo> element for a specific location for the place at a specific time[-range]. @from-custom and @to-custom attributes will be supplied whenever a location has a limited temporal range within the MoEML project's bounds. Where such attributes are not provided, there will be a global assumption that MoEML claims this location is valid for the period 1536 to 1666, but makes no claims outside that range. Where one or other bounding attribute is missing, the standard MoEML range boundary will be assumed.

ii. Multiple <state> elements will be allowed within <place>, with the same assumptions as to date ranges as apply to <location> elements. Each <state> element represents the function a place had during a specific time range. Each <state> element should also carry an @ana attribute pointing to a MoEML category, using the same private URI scheme as MoEML uses for <catRef>/@target. A corresponding <catRef> should also be present for the purposes of normal MoEML processing (we can check this with Schematron).

iii. Multiple <placeName> elements will be allowed in <place>, with the same dating attributes serving the same purpose as discussed above.

All dating attributes will be in our normal MoEML mol:julian format.
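As a minimal sketch of the defaulting rule described above, the following Python reads the dating attributes from each <location> and <state> in a <place> and falls back to the project bounds where an attribute is missing. The TEI fragment, its coordinates and the category name are invented for illustration, the TEI namespace is omitted for brevity, and the assumption that mol:julian values look like `YYYY-MM-DD` strings is ours, not something the ticket states.

```python
import xml.etree.ElementTree as ET

# Project-wide bounds stated in the ticket.
MOEML_START_YEAR = 1536
MOEML_END_YEAR = 1666

# Hypothetical TEI fragment (no namespace); all values are illustrative only.
SAMPLE_PLACE = """
<place>
  <placeName>Example Place</placeName>
  <location from-custom="1536-01-01" to-custom="1600-12-31"><geo>51.5100,-0.0900</geo></location>
  <location from-custom="1601-01-01"><geo>51.5110,-0.0910</geo></location>
  <state ana="mol:EXAMPLE_CATEGORY"/>
</place>
""".strip()


def year_of(value):
    """Extract the year, ASSUMING a YYYY-MM-DD style value."""
    return int(value.split("-")[0])


def resolve_year_range(elem):
    """Return (from_year, to_year), using the MoEML bound for any missing attribute."""
    start = elem.get("from-custom")
    end = elem.get("to-custom")
    return (
        year_of(start) if start else MOEML_START_YEAR,
        year_of(end) if end else MOEML_END_YEAR,
    )


place = ET.fromstring(SAMPLE_PLACE)
location_ranges = [resolve_year_range(loc) for loc in place.findall("location")]
state_ranges = [resolve_year_range(st) for st in place.findall("state")]
print(location_ranges)
print(state_ranges)
```

The years here are still Julian-calendar years; no calendar conversion is attempted in this sketch.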

  3. GeoJSON output changes:

i. Each <location> element will be processed into a correct GeoJSON geometry as currently, ignoring the associated dating content. Then, in a subsequent member of the GeoJSON "properties" object, an array of location date-ranges will be constructed, with each one assumed to correspond with the equivalent geometry.

ii. While the canonical place name for the place will be taken from the same place as currently in the TEI metadata, a new member of the GeoJSON "properties" object will hold an array of place names constructed from the <placeName> elements, each with a date-range as appropriate.

iii. The same applies to place function, as represented by <state> elements: a new member of the GeoJSON "properties" object will hold an array of functions, each with a date-range as appropriate.

In the GeoJSON, all date ranges will first be converted to proleptic Gregorian, and their month and day information will then be stripped. This provides a less granular date range, but one which is less likely to cause confusion for those not used to proleptic Gregorian.

The precise names and datatypes for the GeoJSON custom properties should be agreed ahead of time with the LINCS team, so that we use a standard vocabulary and their ingestion code can be more standardized.
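To make the proposed output shape concrete, here is a rough Python sketch that converts Julian dates to proleptic Gregorian years and assembles a GeoJSON feature with parallel arrays in "properties". The property names (`locationDateRanges`, `placeNames`, `functions`), the sample values, and the `YYYY-MM-DD` reading of mol:julian are placeholders of our own, pending whatever vocabulary is agreed with LINCS. The use of a GeometryCollection to hold several geometries is also an assumption, following RFC 7946.

```python
import json


def julian_to_gregorian(year, month, day):
    """Convert a Julian-calendar date to a proleptic Gregorian (year, month, day)."""
    # Julian calendar date -> Julian Day Number (standard integer algorithm).
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    # Julian Day Number -> Gregorian calendar date.
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    return (
        100 * b + d - 4800 + m // 10,
        m + 3 - 12 * (m // 10),
        e - (153 * m + 2) // 5 + 1,
    )


def gregorian_year(julian_date):
    """Return only the Gregorian year, ASSUMING a YYYY-MM-DD Julian date string."""
    year, month, day = (int(part) for part in julian_date.split("-"))
    return julian_to_gregorian(year, month, day)[0]


def year_range(start, end):
    """Build a year-only range; the 'from'/'to' keys are hypothetical names."""
    return {"from": gregorian_year(start), "to": gregorian_year(end)}


feature = {
    "type": "Feature",
    "geometry": {
        "type": "GeometryCollection",
        "geometries": [
            {"type": "Point", "coordinates": [-0.0900, 51.5100]},
            {"type": "Point", "coordinates": [-0.0910, 51.5110]},
        ],
    },
    "properties": {
        "name": "Example Place",
        # One entry per geometry, in the same order (hypothetical property name).
        "locationDateRanges": [
            year_range("1536-01-01", "1600-06-30"),
            year_range("1600-07-01", "1666-06-30"),
        ],
        "placeNames": [
            {"name": "Example Place", "dateRange": year_range("1536-01-01", "1666-06-30")},
        ],
        "functions": [
            {"category": "mol:EXAMPLE_CATEGORY", "dateRange": year_range("1536-01-01", "1666-06-30")},
        ],
    },
}

print(json.dumps(feature, indent=2))
```

One consequence of converting before stripping is that a Julian date in late December can land in the following Gregorian year, so the year-only range may differ by one from the Julian year in the TEI.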

martindholmes commented 2 years ago

As a follow-up, there's a bit of a wrinkle in this bit:

Multiple <location> elements will be allowed inside <place>, and each will have a single <geo> element for a specific location for the place at a specific time[-range]. @from-custom and @to-custom attributes will be supplied whenever a location has a limited temporal range within the MoEML project's bounds. Where such attributes are not provided, there will be a global assumption that MoEML claims this location is valid for the period 1536 to 1666, but makes no claims outside that range. Where one or other bounding attribute is missing, the standard MoEML range boundary will be assumed.

The GeoJSON spec does not allow distinct geometries for a single feature EXCEPT through the use of a GeometryCollection (https://datatracker.ietf.org/doc/html/rfc7946#section-3.1.8). The possible wrinkle is that there may be places which require a GeometryCollection to describe a single instance of their location (imagine for example a location that consists of a building [Polygon], a road [LineString] and a small tower [Point]). This case would need to be distinguished from cases where a GeometryCollection is being used to represent multiple feature locations for different time periods. We would have to make the assumption that:

martindholmes commented 2 years ago

It seems that this approach is being nixed in favour of using TTL, which is fine, but we still need the schema changes and we need to see examples of how to model our data in TTL from the LINCS team before we can go ahead.

martindholmes commented 2 years ago

In the meantime, working on the schema: added the <state> element with its @ana values generated from our location categories in rev 20669.

martindholmes commented 2 years ago

Since the TTL is now working OK for LINCS (as far as I know), I'm not sure where we should go from here with this ticket. @JanelleJenstad should we just close it?

JanelleJenstad commented 2 years ago

We will revisit this issue in late October 2022. MH would like to rethink the plan from first principles.

martindholmes commented 2 years ago

I believe this is now valid and working, but I'm still waiting to hear from EC at LINCS to say that it works for them.

martindholmes commented 1 year ago

It's working for LINCS, so I'm closing this.