Islandora / documentation

Contains islandora's documentation and main issue queue.

MODS fields in Drupal #1083

Closed Natkeeran closed 3 years ago

Natkeeran commented 5 years ago

Currently, the Islandora MIG has draft recommendations for MODS to RDF field mappings (full list!). The assumption is that those mappings will be reflected in Drupal as fields and typed fields (one field with a typed property mapping, e.g. agent).

islandora_demo currently ships with a significant number of those fields. As the MIG mapping is a work in progress, not all widely used fields have been added yet. When the recommendations are finalized, we need to add those fields.

Currently, the following among others are not in islandora_demo.

@dannylamb @rosiel @seth-shaw-unlv

seth-shaw-unlv commented 5 years ago

So, we need new vocabularies in Controlled Access Terms for Genre, Form, Temporal, and Classification. We can then add the relevant Entity reference fields to islandora_demo.

We already have a Geographic Locations vocabulary, so we need to adjust the islandora_demo subject reference to not include them and add a separate entity reference field for geographic.

It isn't clear to me what we are doing with placeTerm since both text and reference are listed. It would be best to have one. Either transform those texts into Place Taxonomy terms or turn the referenced ones into strings. (Probably the former option.) Also, do we add those Place terms to the Existing Geographic Locations, or is this a separate vocabulary?

The rest are all text, which are easy enough to add.

rosiel commented 5 years ago

The problem with placeTerm is that places (as publishing locations) are uncontrolled free-text. People write just anything in there. You might even have several cities listed in the same text field. It's meant to be a transcription of what's written on the bibliographic item (e.g. if the book was published in a place whose name changed, you want to record what's written on the item, not the name that that place is known by now).

That's different from geo-as-a-subject field, which is (supposed to be) from an authority.

The other thing with originInfo/place is that there's often a term (especially if you crosswalked from MARC) containing a "country code". It should be taken from the MARC Country Codes list, which is (cough, Anglo-American, cough) countries for most of the world, but provinces and states for the US and Canada. We often have this data and it seems a shame to lose it; it's from an authority, so we'd like to map it to a controlled field.
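
A minimal sketch of the distinction described above, assuming MODS 3.x XML as input: it separates placeTerm values coded against the MARC country list from transcribed free text. The sample record and the bucket names (place_text, place_code, unmapped) are hypothetical illustrations, not Islandora field names.

```python
import xml.etree.ElementTree as ET

# MODS 3.x namespace.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical sample record, for illustration only.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <originInfo>
    <place>
      <placeTerm type="code" authority="marccountry">pic</placeTerm>
    </place>
    <place>
      <placeTerm type="text">Charlottetown, P.E.I.</placeTerm>
    </place>
  </originInfo>
</mods>"""


def split_place_terms(mods_xml):
    """Sort originInfo/place/placeTerm values into hypothetical buckets."""
    root = ET.fromstring(mods_xml)
    result = {"place_text": [], "place_code": [], "unmapped": []}
    path = "mods:originInfo/mods:place/mods:placeTerm"
    for term in root.findall(path, MODS_NS):
        value = (term.text or "").strip()
        if not value:
            continue
        term_type = term.get("type")
        if term_type == "code" and term.get("authority") == "marccountry":
            # Controlled value: candidate for a vocabulary-backed field.
            result["place_code"].append(value)
        elif term_type in (None, "text"):
            # Transcribed from the item: keep as a literal string.
            result["place_text"].append(value)
        else:
            # Codes from other authorities are left for manual review.
            result["unmapped"].append(value)
    return result


print(split_place_terms(SAMPLE))
```

Keeping the two buckets separate means the transcribed string is never overwritten by an authorized form, while the coded value stays available for a controlled field.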

whikloj commented 5 years ago

Have the Metadata Interest Group's findings been finalized? If so, we should link to them from this ticket and start work on it.

dannylamb commented 5 years ago

Spreadsheet: https://docs.google.com/spreadsheets/d/18u2qFJ014IIxlVpM3JXfDEFccwBZcoFsjbBGpvL0jJI/edit#gid=0

More formal document: https://docs.google.com/document/d/15qSO9YcALtYSqd6CUuGx0t8FwUJ5pPwVPz0PA5rU898/edit?ts=5c5852f3#heading=h.wktgoorw3utm

dannylamb commented 5 years ago

And in reading the upscroll I see @Natkeeran has provided these links already :man_facepalming:

Natkeeran commented 5 years ago

Some clarifications are needed:

@rosiel @manez

Natkeeran commented 5 years ago

Posting the IRC discussions for reference:

13:14 <Natkeeran> rlefaive: are we good to go with respect to MODS to RDF mapping (https://docs.google.com/spreadsheets/d/18u2qFJ014IIxlVpM3JXfDEFccwBZcoFsjbBGpvL0jJI/edit#gid=0) ?
13:18 <rlefaive> Natkeeran: that is a good question… almost? 😭
13:20 <Natkeeran> rlefaive: almost is good :), I am adding fields to the islandora_defaults and if needed vocabularies to controlled_access_terms
13:20 <Natkeeran> rlefaive: but need clarifications with respect to some fields
13:20 <rlefaive> ok, any questions in particular?
13:21 <Natkeeran> rlefaive: mods/originInfo/place/placeTerm [with type="code" and authority="marccountry"] - Would this be a separate vocabulary (i.e. countries) from the Geographic Location vocabulary?
13:23 <rlefaive> Natkeeran: Yes. This is the “marc country” vocabulary, which is standardized. It includes countries and provinces/states, so it’s not exactly “countries”. It’s used in all MARC records, and therefore (probably) exists in any MODS record that came from MARC.
13:24 <rlefaive> and it’s weird codes, like ‘pic’ for PEI.
13:25 <Natkeeran> rlefaive: noted
13:25 <rlefaive> “Geographic Location” is (if i’m thinking of the one you’re thinking of) a field for “place as topic”. This would be human-readable and could include “Charlottetown, PE” and the terms MAY have come from a standard vocabulary, though we can’t say which - could be geonames, could potentially be LCSH.
13:25 <rlefaive> could be a custom vocabulary, could be none at all.
13:25 <Natkeeran> rlefaive: Geographic Location needs to be restricted to the use of topic!
13:26 <Natkeeran> speaking of geographic topic, currently there are 4 fields related to that
13:26 <Natkeeran> rlefaive: Geographic Location has coordinate info
13:27 <Natkeeran> rlefaive: and we can add other info such as geographiccode as well
13:27 <Natkeeran> rlefaive: do we need all four fields for subject_geographic?
13:28 <Natkeeran> rlefaive: mods/subject/geographic, mods/subject/cartographic/coordinates, mods/subject/geographicCode, mods/subject/hierarchicalGeographic
13:28 <rlefaive> Natkeeran: However (!) we planned to use the same … kind of field… for geographic topics as for Place of Publication. However in practice, Place of Publication is often even more freeform than Geographic Location. This should be transcribed from the item (which means using the historical name instead of a current, authorized name, if that’s what’s written on the Item In Hand). It’s even acceptable cataloguing practice (in MARC) to put multiple place names in a single field. So… reconciling that is going to be extra hard.
13:29 <rlefaive> Natkeeran: All those fields are currently used in MODS records, and we didn’t want to drop any of them as part of the official mapping.
13:30 <Natkeeran> rlefaive: mods/originInfo/place/placeTerm [with type="text" or no type] is noted to be literal, thus as you noted, probably easier to leave it as text, if people want to change that to use the vocabulary, they have that option as well
13:30 <Natkeeran> rlefaive: I was mainly wondering if we can include the countries vocabulary within the Geographic Location vocabulary or need a separate vocabulary
13:31 <rlefaive> Natkeeran: yeah. It’s a shame that we can’t use a single field and have it be free-text OR vocabulary-driven, as needed. Because that’s the model that we’ve been using in MODS/MARC.
13:31 <Natkeeran> rlefaive: good point
13:31 <rlefaive> Natkeeran: What else is the Geographic Location vocabulary going to be populated with?
13:32 <Natkeeran> rlefaive: theoretically it can be populated with anything!
13:32 <rlefaive> Natkeeran: Since if we load an entire, e.g. geonames vocabulary, that’s hundreds of thousands of items.
13:32 <Natkeeran> rlefaive: specially for subjects, including possibly historical places, politically contentious places
13:34 <rlefaive> Natkeeran: It might be a better user model to have a lookup to an API (like geonames) that will autocomplete if it finds what the user’s looking for. However, Geonames isn’t particularly easy to use, and many times has multiple different “places” with the same name (denoting different “feature classes” like “populated place” or “seat of a first-order administrative division”, meaning city):
13:34 <rlefaive> https://www.geonames.org/search.html?q=charlottetown&country=
13:37 <Natkeeran> rlefaive: It seems "seat of a first-order administrative division population 42,402" is the political division
13:38 <rlefaive> Natkeeran: yep, and (oops, mea culpa) the “area” is co-located (maybe?)
13:41 <Natkeeran> rlefaive: class -> code
13:41 <Natkeeran> rlefaive: depending on code, the code is different
13:41 <Natkeeran> rlefaive: https://www.geonames.org/5920286/charlottetown.html
13:42 <Natkeeran> rlefaive: another option might be https://www.geonames.org/5920286/charlottetown.html
13:42 <Natkeeran> rlefaive: another option might be https://www.openstreetmap.org
13:43 <Natkeeran> rlefaive: https://nominatim.openstreetmap.org/details.php?place_id=1378725
13:43 <Natkeeran> rlefaive: https://nominatim.openstreetmap.org/search.php?q=Charlottetown+&polygon_geojson=1&viewbox=
13:45 <rlefaive> Natkeeran: The point that I’m trying to make is that it’s hard to differentiate, and would be terrible to have to choose one of these from a drop-down, unless you were filtering on the correct feature class AND displayed enough context (e.g. in this case country, province - because apparently there’s a Charlottetown in Newfoundland?!). But I don’t think we can hard-code a particular feature class, because users might potentially need any.
13:47 <Natkeeran> rlefaive: you can show the type or feature class as in the openstreetmap search, however, yes it can be complicated
13:48 <rlefaive> Natkeeran: Oooh!! That Nominatim site is great! But again, even as a “metadata specialist”, I don’t know if the first or second result (the “city” as a point or the “county” as a region) is what I should use as a subject, or a place of publication.
13:49 <rlefaive> This is not the kind of authority that I, as a cataloguer, am used to.
13:52 <Natkeeran> rlefaive: it is a question of tradeoff, making it textual gives more freedom, making it a dropdown makes it linked data/machine processable (i.e visualizations)
13:53 <rlefaive> Natkeeran: When you say making it a dropdown, do you mean a closed vocabulary (can’t add new things through the Add Content form) or an open one (can enter whatever you want, even if it doesn’t match something already there)?
13:54 <Natkeeran> rlefaive: I don't see why we would need to restrict adding new things/places
13:54 <Natkeeran> rlefaive: yes, making it a vocabulary
13:54 <rlefaive> Natkeeran: ok phew
13:56 <Natkeeran> rlefaive: do we need the four fields for subject_geographic?
13:57 <Natkeeran> rlefaive: I am not sure if we can represent hierarchy with one field for mods/subject/hierarchicalGeographic?
14:00 <rlefaive> Natkeeran: The mapping, at https://docs.google.com/spreadsheets/d/18u2qFJ014IIxlVpM3JXfDEFccwBZcoFsjbBGpvL0jJI/edit#gid=0 , uses “Geographic Subjects” as the field name in drupal (Column C) for /geographic, /geographicCode, and /hierarchicalGeographic. We had intended, by doing that, to say they should be lumped in the same field. If I recall correctly, and this is something that I wanted to check with the MIG about, our solution for Hierarchical Geographic was to put every entry, all the way up the hierarchy as multiple values in that Geo Subject field, e.g. Geographic Subject: Brighton. Geographic Subject: Charlottetown. Geographic Subject: Queen’s County. Geographic Subject: Prince Edward Island. (and so on)
14:01 <Natkeeran> rlefaive: oh, I see
14:01 <Natkeeran> rlefaive: I was not reading the spreadsheet properly, that helps
14:02 <rlefaive> Natkeeran: It’s not exactly transparent. Thanks for letting me know, I’ll make sure the documentation makes clear what was meant.
14:04 <Natkeeran> rlefaive: few more questions, hope I am not taking too much of your time
14:04 <rlefaive> Natkeeran: go for it!
14:04 <Natkeeran> rlefaive: Rights [multi-valued field] - I suppose, here we mean long text?
14:05 <rlefaive> Yes.
14:06 <Natkeeran> rlefaive: same thing for note?
14:06 <rlefaive> Natkeeran: Yup.
14:08 <Natkeeran> rlefaive: Some of these fields are not available in any of the Islandora 7.x default forms. I assume they are being used by various repos, and this is a common guide. However, do we need to include all these fields in the default profile for 8?
14:11 <rlefaive> Natkeeran: that is a good question. Some fields, like bibo:volume, bibo:issue, bibo:pageStart, etc seem very cluttery because they only serve a specific use case (citation)
14:12 <rlefaive> Natkeeran: I’m tempted to say yeah, put them all in, to accommodate the mapping. The UI clutter could potentially get resolved by javascript things that show/hide fields conditionally.
14:12 <Natkeeran> rlefaive: ok, noted
14:13 <Natkeeran> rlefaive: for relatedItem, can we not create another thing and relate to it?
14:13 <rlefaive> Natkeeran: actually yeah I think that’s my position, though if you’re like AAARGH it’s very understandable, we can make a “basic” … metadata profile.
14:13 <rlefaive> Natkeeran: that’s the plan.
14:14 <rlefaive> (re related item)
14:14 <Natkeeran> for classification - I assume we would need two vocabularies (lcc, ddc)?
14:14 <rlefaive> (i.e. make a drupal Content thing for the related item (e.g. “Daily Planet” the publication) that is maybe also in Fedora and the triplestore and does not have any media directly attached)
14:15 <rlefaive> Natkeeran: no. There are too many possible values, a free text field would suffice.
14:16 <Natkeeran> rlefaive: noted
14:17 <Natkeeran> rlefaive: coming back to mods/subject/cartographic/coordinates, do we need a separate field directly under Repository Item? Would coordinates not always be associated with Geographic Subjects?
14:18 <rlefaive> Natkeeran: In a world where stuff is modelled perfectly, yeah. But sometimes you have only coordinates and no “named place” to go with it (like the location where a specific sample was collected)
14:19 <Natkeeran> rlefaive: oh I see
14:20 <Natkeeran> rlefaive: there are two other fields that will need more custom work: dcterms:tableOfContents, which needs to be a URI or a literal; and dcterms:title with @[languagecode] (this can possibly be handled in the jsonld hook)
14:21 <Natkeeran> rlefaive: maybe for tableOfContents, we can make it long text, then allow the user to enter a url, not sure, have to think about that one a bit more
14:21 <rlefaive> Natkeeran: speaking of languages … are we still using Drupal languages to assign different language codes to bits of metadata?
14:22 <rlefaive> Natkeeran: that sounds fine for tableOfContents.
14:23 <rlefaive> Natkeeran: maayyyybe some heuristic magic in the jsonld hook could see if “this text looks like a valid URL” and treat it as a URL in the JSONLD if so? (enhancement)
14:23 <Natkeeran> rlefaive: yes, we are, by default it will be English or any other default language you set, and if you add another language to translate to, and translate the node, yes it will add the translated language code
14:24 <Natkeeran> rlefaive: however, in this case, within the same language, we want to provide a value in another language, that is where we need to set up some jsonld hook magic
14:24 <Natkeeran> rlefaive: yes, good point for tableofcontents as well
14:25 <Natkeeran> rlefaive: this is good feedback, will be able to continue, thank you
14:25 <rlefaive> Natkeeran: those were good questions, keep them coming, sorry for delinquency etc etc. 
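
The approach rlefaive describes in the log for mods/subject/hierarchicalGeographic (every level of the hierarchy stored as a separate value in the Geographic Subjects field) could be sketched roughly as follows. This is an illustration only, assuming MODS 3.x XML as input; the sample record and the function name are hypothetical, and the narrow-to-broad ordering simply mirrors the example given in the log.

```python
import xml.etree.ElementTree as ET

# MODS 3.x namespace.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical sample record, for illustration only.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <subject>
    <hierarchicalGeographic>
      <country>Canada</country>
      <province>Prince Edward Island</province>
      <county>Queen's County</county>
      <city>Charlottetown</city>
      <citySection>Brighton</citySection>
    </hierarchicalGeographic>
  </subject>
</mods>"""


def flatten_hierarchical_geographic(mods_xml):
    """Return each hierarchy level as its own value, narrowest first."""
    root = ET.fromstring(mods_xml)
    values = []
    path = "mods:subject/mods:hierarchicalGeographic"
    for hier in root.findall(path, MODS_NS):
        # Child elements are taken in document order (broad to narrow here).
        levels = [(child.text or "").strip() for child in hier]
        for level in reversed(levels):
            # Skip empty elements and values already collected.
            if level and level not in values:
                values.append(level)
    return values


for value in flatten_hierarchical_geographic(SAMPLE):
    print("Geographic Subject:", value)
```

Note that flattening discards the parent/child relationships between levels, which is the trade-off of using a single multi-valued field.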

kstapelfeldt commented 3 years ago

Ideally, this would have been tested, but again, because work has continued, we think it's better to close this issue and start again with remaining metadata field issues.

As this has been merged and superseded by other work, we are closing it.