DigitalCommons / mykomap

A web application for mapping initiatives in the Solidarity Economy

Support localisation of ad-hoc field names #199

Open ColmDC opened 1 year ago

ColmDC commented 1 year ago

Is your feature request related to a problem? Please describe. RDF natively supports localisation of vocab names, but there is a case for also supporting this generically for ad-hoc fields.

For example, it may be useful to display the Co-ops UK title of the field "Sector - Simplified, High Level" in the dialog in Welsh. (Currently this is 'Sector (Coops UK)'.)

Describe the solution you'd like When configuring the data factory for a new data source, we need to be able to provide localisations for an ad-hoc field's label. These are published with the data.

Describe alternatives you've considered Historically we chose to support minimal functionality for ad-hoc fields, and required choosing an RDF format to map them to in order to get the advanced features. We have moved away from this, so we now need to at least support field title localisation, without the full RDF effort.

wu-lee commented 1 year ago

I feel like I should probably define our terminology. This is what I have been using ad-hoc for:

Ad-hoc field: a field which contains values from a finite list of literal values, which are displayed as-is. This list is deduced by Mykomap from the values seen in the data. As such, it cannot be localised. An example of this would be the "Sector - Simplified, High Level" field of Co-ops UK's data, which we do not have any taxonomy information for.

In contrast to:

Vocab field: a field which contains values from a finite list of identifiers, which are displayed by looking up these identifiers in a table to get the localised phrases for the target language. The list is defined in advance in Mykomap's configuration. Identifiers not in this list are invalid and cannot be used. Examples of this kind of field are "Primary Activity" and "Secondary Activities", which use the ESSGLOBAL "Activities" vocab and are represented using the URIs for the vocab terms. For SKOS vocabularies like this, the URIs can be represented in full form (https://lod.coop/essglobal/2.1/standard/activities/A20), abbreviated using some pre-defined prefix (ac:A20), or truncated down to just the ID (A20), depending on the context.
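To make the three URI forms concrete, here is a minimal TypeScript sketch. It is not Mykomap's actual code: `vocabPrefixes`, `TermForms` and `termForms` are hypothetical names, and the sketch assumes the `ac` prefix expands to the ESSGLOBAL activities base URI.

```typescript
// Hypothetical prefix table: maps a short prefix to the base URI of a vocab.
// Assumption: "ac" expands to the ESSGLOBAL activities base URI.
const vocabPrefixes: Record<string, string> = {
  ac: "https://lod.coop/essglobal/2.1/standard/activities/",
};

interface TermForms {
  full: string;        // the complete term URI
  abbreviated: string; // the prefix:ID form
  id: string;          // the bare term ID
}

// Returns the three forms of a term URI, or undefined if no known
// prefix matches (or if the URI is just the base with no ID).
function termForms(
  uri: string,
  prefixes: Record<string, string>
): TermForms | undefined {
  for (const prefix of Object.keys(prefixes)) {
    const base = prefixes[prefix];
    if (uri.indexOf(base) === 0 && uri.length > base.length) {
      const id = uri.slice(base.length);
      return { full: uri, abbreviated: `${prefix}:${id}`, id };
    }
  }
  return undefined;
}

const activity = termForms(
  "https://lod.coop/essglobal/2.1/standard/activities/A20",
  vocabPrefixes
);
```

Here `activity` carries the full URI, the abbreviated form `ac:A20`, and the bare ID `A20`. A URI outside any known vocab yields `undefined`, which mirrors the rule that identifiers not in the list are invalid.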

The point I want to make here is that, to make the "Sector - Simplified, High Level" field localised, I think we would have to convert it into a vocab field (see below). Presumably this is what you mean?

The simplest way which would work is to keep the data as-is (a bunch of labels in English), then hand-write a vocab file, analogous to this one, but mapping these labels, as if they were identifiers, to localised phrases.
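As a rough illustration of that approach, the sketch below treats the English labels themselves as the lookup keys. The `LabelVocab` shape, the `sectorVocab` table and the `localise` function are all hypothetical and do not reflect Mykomap's real vocab file format. The Welsh entry is a placeholder, not a real translation.

```typescript
// Hypothetical shape: English label (used as the identifier)
// -> language code -> localised phrase.
type LabelVocab = Record<string, Record<string, string>>;

const sectorVocab: LabelVocab = {
  "Membership associations, social clubs and trade unions": {
    en: "Membership associations, social clubs and trade unions",
    cy: "(Welsh phrase goes here)", // placeholder, not a real translation
  },
};

// Own-property check, so inherited keys like "toString" are not
// mistaken for vocab entries.
function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Looks up the phrase for a label in the target language. Falls back
// to the fallback language, and finally to the label itself, which is
// how an ad-hoc field would display it anyway.
function localise(
  vocab: LabelVocab,
  label: string,
  lang: string,
  fallbackLang: string = "en"
): string {
  if (!hasOwn(vocab, label)) return label;
  const phrases = vocab[label];
  if (hasOwn(phrases, lang)) return phrases[lang];
  if (hasOwn(phrases, fallbackLang)) return phrases[fallbackLang];
  return label;
}
```

Falling back to the raw label means that values which nobody has translated yet still display exactly as they do today.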

This would work, although it would not really be compatible with linked data vocabs, because no one sane would use English phrases as the slugs in term URIs, like this: https://lod.coop/cuk/sector/Membership%20associations,%20social%20clubs%20and%20trade%20unions. Maybe that's not a problem?

If we did want to be LOD-friendly, we'd also need to define the URIs, then have the sausage machine rewrite the English labels into them. This is what is done for those fields of the ICA data which are vocab fields, like Country and Region. For example, "United States" gets rewritten as https://lod.coop/essglobal/2.1/standard/countries-iso/US
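The rewriting step could look something like the following sketch. This is not the sausage machine's actual code: `countryBase`, `countryIds` and `labelToUri` are hypothetical names, and the table holds only the one example given above.

```typescript
// Assumed base URI for the country vocab, taken from the example above.
const countryBase = "https://lod.coop/essglobal/2.1/standard/countries-iso/";

// Hypothetical lookup table: English label -> term ID.
const countryIds: Record<string, string> = {
  "United States": "US",
};

// Rewrites an English label into a term URI. Returns undefined for a
// label with no known term, so the caller can report it rather than
// silently emitting a made-up URI.
function labelToUri(
  label: string,
  ids: Record<string, string>,
  base: string
): string | undefined {
  const key = label.trim();
  if (!Object.prototype.hasOwnProperty.call(ids, key)) return undefined;
  return base + ids[key];
}

const usUri = labelToUri("United States", countryIds, countryBase);
```

Here `usUri` is the countries-iso URI ending in `US`. Returning `undefined` for unrecognised labels keeps the vocab-field rule that only pre-defined identifiers are valid.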


This seems a good moment to continue and define some other terms.

Another axis of field category is whether these are core, or lime-query (AKA standard), or custom.

Core field: these fields are those minimally necessary to represent an item (AKA initiative) in Mykomap. These are: "URI", "Name", "Latitude" and "Longitude".

Lime-Query or standard fields: these were historically the built-in fields used by Mykomap, but have become custom in more recent versions. Examples are "Primary Activity" and "Secondary Activities". They are currently the only ones preserved in the linked data generated by the sausage factory.

Custom fields: these are fields which Mykomap can be configured to include in addition to the core fields. An example is the "Sector - Simplified, High Level" field from the Co-ops UK data. This field only appears in the standard.csv file deployed on the web with the static RDF data. ("Standard" is historical, and a misnomer - this does not only contain the standard fields, but all the configured fields, core and custom.)

I think custom in this sense is what you mean by local fields, when they're also not part of the standard set?

And then another axis is required or optional:

Required fields: these are fields which must have a value. Of the core fields, "URI" and "Name" are required. "Primary Activity" is another example which is typically required.

Optional fields: these are fields which may have no value, whether the field is singular or multi-valued. Of the core fields, "Latitude" and "Longitude" are optional. "Secondary Activities" is another example which is optional.

And then singular vs multi-value fields:

Singular fields: these fields are those which can have at most one value. If they are required, like "URI", they must have one value. If optional, like "Latitude", they may have no value.

Multi-value fields: these fields are those which can have zero or more values. Currently this implies they are optional as the minimum is always zero. A typical example is "Secondary Activities".
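The required/optional and singular/multi-value axes combine into three cases in practice, since multi-value fields are currently always optional. The sketch below models that, with hypothetical names (`Cardinality`, `cardinalityError`) which are not taken from Mykomap's code.

```typescript
// The three combinations described above:
// - "required": singular and must have exactly one value (e.g. "URI")
// - "optional": singular and may have no value (e.g. "Latitude")
// - "multi":    zero or more values (e.g. "Secondary Activities")
type Cardinality = "required" | "optional" | "multi";

// Returns an error message if the number of values is not allowed for
// the given cardinality, or undefined if it is fine.
function cardinalityError(
  cardinality: Cardinality,
  values: string[]
): string | undefined {
  switch (cardinality) {
    case "required":
      return values.length === 1
        ? undefined
        : `expected exactly one value, got ${values.length}`;
    case "optional":
      return values.length <= 1
        ? undefined
        : `expected at most one value, got ${values.length}`;
    case "multi":
      return undefined; // the minimum is zero and there is no maximum
  }
}
```

Using a single three-way type rather than two booleans makes the unsupported combination (required and multi-value) impossible to express.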

As a rough further axis of categorisation, fields can also be of type custom in the sense of being inserted by a custom function, but this is unusual. To avoid confusion I should probably call these something else, perhaps programmatic.

Custom AKA programmatic fields: these are fields which are generated by some rule from other sources. An example is the "Short Postcode" field used in some Owned-by-Oxford demos, which was generated by a function which chopped off the second part of the UK postcode field, so that if the "Postcode" field was "OX1 2AB", then "Short Postcode" would be "OX1". Not to be confused with custom in the sense of configured, above. Also note that in practice this can produce either singular or multi-value results.
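For illustration, a function of that kind might look like the sketch below. The original demo function is not shown here, so `shortPostcode` is a hypothetical reconstruction, which assumes the two parts of the postcode are separated by whitespace.

```typescript
// Derives a "Short Postcode" from a UK postcode by keeping only the
// first part. Assumes the parts are separated by whitespace; a
// postcode written without a space is returned unchanged.
function shortPostcode(postcode: string): string {
  const parts = postcode.trim().split(/\s+/);
  return parts[0] ?? "";
}

const short = shortPostcode("OX1 2AB"); // "OX1"
```

An empty "Postcode" value gives an empty "Short Postcode", so the derived field stays optional when its source is optional.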

That's all I can think of for now! (This can be moved to documentation later)