bodleian / ora_data_model

Documentation and crosswalks relating to the ORA data model

Variable crosswalking of pagination #187

Closed: mrdsaunders closed this issue 4 years ago

mrdsaunders commented 4 years ago

The harvest crosswalk should crosswalk ORA data such as:

<mods:extent unit="pages">
    <mods:list>214-286</mods:list>
</mods:extent>

to

<api:field name="pagination">
  <api:pagination>
    <api:begin-page>214</api:begin-page>
    <api:end-page>286</api:end-page>
  </api:pagination>
</api:field>

However, there are cases such as:

<mods:extent unit="pages">
    <mods:list>1034–1045</mods:list>
</mods:extent>

which is being crosswalked to:

<api:field name="pagination" type="pagination" display-name="Pagination">
    <api:pagination>
        <api:begin-page>1034–1045</api:begin-page>
    </api:pagination>
</api:field>
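
To make the two cases concrete, here is a minimal Python sketch that reproduces both outputs above. It assumes, without confirmation from this thread, that the crosswalk splits the `mods:list` value at a U+002D HYPHEN-MINUS and at nothing else. `split_pages_hyphen_only` is a hypothetical name, not part of the crosswalk.

```python
def split_pages_hyphen_only(extent: str) -> dict[str, str]:
    """Split a MODS page range on U+002D HYPHEN-MINUS only.

    Hypothetical model of the harvest crosswalk; the real crosswalk
    logic is not shown in this issue.
    """
    parts = extent.split("-", 1)  # split at the first hyphen-minus, if any
    if len(parts) == 2:
        return {"begin-page": parts[0], "end-page": parts[1]}
    # No hyphen-minus found: the whole string lands in begin-page
    return {"begin-page": extent}


print(split_pages_hyphen_only("214-286"))         # → {'begin-page': '214', 'end-page': '286'}
print(split_pages_hyphen_only("1034\u20131045"))  # en dash: nothing to split on
```

The second call leaves the whole range in `begin-page` with no `end-page`, matching the problematic output above.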

This is causing the REF submission system to produce validation errors on the pagination field for these items.

Can I ask whether this data is entered into ORA as a single field, or whether it is the concatenation of two fields?

The reason I ask is that, on investigation, the working object seems to use U+002D : HYPHEN-MINUS {hyphen or minus sign} as the separator, whereas the problematic one uses U+2013 : EN DASH.
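
When two records look identical on screen, the separator can be identified programmatically. The Python sketch below uses only the standard `unicodedata` module. It lists every character in Unicode's "Dash punctuation" (Pd) category, which includes both the hyphen-minus and the en dash. `describe_dashes` is a hypothetical helper for illustration.

```python
import unicodedata


def describe_dashes(value: str) -> list[str]:
    """Return 'U+XXXX NAME' for every dash-punctuation character in value."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch)}"
        for ch in value
        if unicodedata.category(ch) == "Pd"  # Unicode general category 'Dash punctuation'
    ]


print(describe_dashes("214-286"))         # → ['U+002D HYPHEN-MINUS']
print(describe_dashes("1034\u20131045"))  # → ['U+2013 EN DASH']
```

Note that U+2212 MINUS SIGN is in category Sm, not Pd, so this check would not report it.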

If the pagination is entered into a single field, then ORA reviewers may be copying and pasting page ranges that contain an en dash rather than typing a hyphen.

I'll test two objects with Toby, each using one of the characters.

tomwrobel commented 4 years ago

The ORA pagination field is a plain text field, so can contain any value put there by an editor. We do not store start and end pages separately. I think you are right that what we're seeing here is the result of copy/paste.

I don't know what the solution is, but I assume that the method which provides the hyphen behaviour can be modified to handle the en dash as well (where relevant).

In case you aren't familiar, some style guides require an en dash between two numbers to indicate a range, and often mandate that no spaces are used, e.g. '1–5' not '1 - 5', '1 – 5', or '1-5'. Most people don't deliberately type one, though; I had to look up keyboard instructions!
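
Following the suggestion to extend the existing hyphen handling to the en dash, a separator-tolerant split could look like this Python sketch. It is not the crosswalk's actual method, which isn't shown in this thread. The regular expression and the `split_page_range` name are assumptions. Because the ORA field is free text, the sketch also accepts the spaced variants listed above.

```python
import re

# Hypothetical pattern for a free-text page range: a begin page, optional
# spaces, a hyphen-minus (U+002D) or en dash (U+2013), optional spaces and an
# end page. Neither page may itself contain whitespace or one of the dashes.
PAGE_RANGE = re.compile(r"^\s*([^\s\-\u2013]+)\s*[\-\u2013]\s*([^\s\-\u2013]+)\s*$")


def split_page_range(value: str) -> dict[str, str]:
    """Split a free-text page range into begin/end pages.

    Falls back to putting the whole (trimmed) value in begin-page when it is
    not recognisably a range, e.g. a single page number.
    """
    match = PAGE_RANGE.match(value)
    if match is None:
        return {"begin-page": value.strip()}
    return {"begin-page": match.group(1), "end-page": match.group(2)}


for raw in ("1-5", "1 - 5", "1\u20135", "1 \u2013 5", "12"):
    print(repr(raw), "->", split_page_range(raw))
```

Accepting both characters in the split avoids a separate normalisation step. The trade-off is that every additional dash variant has to be added to the pattern.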

mrdsaunders commented 4 years ago

Thanks @tomwrobel. In testing, @tobypitts and I replicated the issue using those two characters. It has been resolved with a value map that converts the en dash to a hyphen-minus:

<xwalk:value-map name="dash-hyphen" matchMode="anyPosition"> 
    <xwalk:value-mapping from="–" to="-" />
    <!--From U+2013 : EN DASH -> U+002D : HYPHEN-MINUS -->
</xwalk:value-map>
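
The effect of this value map can be approximated in Python as a substring replacement applied before the usual hyphen split. This is a sketch under an assumption: that `matchMode="anyPosition"` means the `from` value is replaced wherever it occurs in the field, which this thread does not spell out. `apply_dash_value_map` is a hypothetical name.

```python
EN_DASH = "\u2013"       # U+2013 EN DASH
HYPHEN_MINUS = "\u002d"  # U+002D HYPHEN-MINUS


def apply_dash_value_map(value: str) -> str:
    """Replace every en dash with a hyphen-minus, wherever it occurs.

    Assumed to mirror the dash-hyphen value map with matchMode="anyPosition".
    """
    return value.replace(EN_DASH, HYPHEN_MINUS)


normalised = apply_dash_value_map("1034\u20131045")
begin, _, end = normalised.partition(HYPHEN_MINUS)  # existing hyphen-based split
print(normalised, begin, end)  # → 1034-1045 1034 1045
```

Normalising first leaves the existing hyphen handling untouched. Other look-alike characters, such as U+2014 EM DASH or U+2212 MINUS SIGN, would each need their own mapping entry.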

mrdsaunders commented 4 years ago

The crosswalk has been updated in QA and PROD.