digirati-co-uk / pmc-lux

Transforming data from PMC catalogues for import to LUX
MIT License

Library – retain '(PAMPHLET)' in 'additional identifiers' #3

brutaldigital closed 3 weeks ago

brutaldigital commented 1 month ago

The string (PAMPHLET) contained in the <class> element is being processed for Categorized As on the LO. It should also be appended to the Additional Identifiers as it is part of the class mark.

e.g. the class for record ID D11371 is "7 SUTH (PAMPHLET)", but it displays in LUX as just "Additional Identifiers: 7 SUTH": https://lux-front-sbx.collections.yale.edu/view/text/3c1de68e-1637-4f4b-b7a8-3de6224a2695

azaroth42 commented 1 month ago

Is this the case for other parentheticals as well, or only PAMPHLET?

brutaldigital commented 1 month ago

We also have

Which have been helpfully classed as large-format (see https://lux-front-sbx.collections.yale.edu/view/text/c1e7db6a-7e18-4b6d-87b0-5a4206626e4e), but that designation should also be part of the call number.

tomcrane commented 1 month ago

Record has

    <class>7 SUTH</class>
    <class>(PAMPHLET)</class>

At the moment these aren't processed together. We see a variant of "PAMPHLET" (we're tolerant of parentheses), so we classify the work as a pamphlet. We also see a string that doesn't match any specific processing rule, "7 SUTH", so we add it as an identifier for the work.
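
A minimal sketch of the behaviour described above, written in Python rather than the C# of the actual transformer. The function name `process_classes_separately` and the `FORMAT_RULES` table are hypothetical, invented for illustration; only the Pamphlet AAT entry is taken from the output quoted in this thread, and the real rule set lives in the linked `Processor.cs`.

```python
import re

# Hypothetical rule table: normalised class string -> (AAT URI, label).
# Only the Pamphlet entry comes from the output shown in this thread.
FORMAT_RULES = {
    "PAMPHLET": ("http://vocab.getty.edu/aat/300220572", "Pamphlet"),
}

def process_classes_separately(class_values):
    """Treat each <class> string on its own, with no call-number assembly."""
    classified_as, identified_by = [], []
    for value in class_values:
        # Tolerate surrounding parentheses and case: "(PAMPHLET)" -> "PAMPHLET".
        key = re.sub(r"^\(|\)$", "", value.strip()).strip().upper()
        if key in FORMAT_RULES:
            uri, label = FORMAT_RULES[key]
            classified_as.append({"id": uri, "type": "Type", "_label": label})
        else:
            # No rule matched, so the string becomes a plain identifier.
            identified_by.append({"type": "Identifier", "content": value.strip()})
    return classified_as, identified_by

classified, identifiers = process_classes_separately(["7 SUTH", "(PAMPHLET)"])
print(identifiers)  # [{'type': 'Identifier', 'content': '7 SUTH'}]
```

Because each string is handled independently, "(PAMPHLET)" is consumed by the classification rule and never reaches the identifier, which is the gap this issue reports.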

Relevant code: https://github.com/tomcrane/linked-art-net/blob/pmc/LinkedArt/PmcTransformer/Library/Processor.cs#L264-L287

(related to #5 perhaps)

Should the rule here be that:

Current output:

    {
      "@context": "https://linked.art/ns/v1/linked-art.json",
      "id": "https://data.paul-mellon-centre.ac.uk/library/work/D11372",
      "type": "LinguisticObject",
      "_label": "Graham Sutherland",
      ...
      "classified_as": [
        {
          "id": "http://vocab.getty.edu/aat/300026096",
          "type": "Type",
          "_label": "Exhibition catalogue"
        },
        {
          "id": "http://vocab.getty.edu/aat/300220572",
          "type": "Type",
          "_label": "Pamphlet"
        }
      ],
      "identified_by": [
        { ... },
        { ... },
        {
          "type": "Identifier",
          "content": "7 SUTH"
        },
        { ... }
      ],
      ...

Possible output:

As above but the "naked" identifier becomes


    {
      "type": "Identifier",
      "content": "7 SUTH (PAMPHLET)",
      "classified_as": [
        {
          "id": "http://vocab.getty.edu/aat/300311706",
          "type": "Type",
          "_label": "Call Number"
        }
      ]
    },
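
One way the possible output could be produced, as a sketch only. It assumes the call number is simply every `<class>` string joined with a single space in record order; that joining rule and the helper name `call_number_identifier` are assumptions for illustration and are not taken from the transformer code.

```python
# AAT type for "Call Number", as used in the possible output above.
CALL_NUMBER = {
    "id": "http://vocab.getty.edu/aat/300311706",
    "type": "Type",
    "_label": "Call Number",
}

def call_number_identifier(class_values):
    """Join all <class> strings, in record order, into one call-number identifier.

    Assumption: parts are separated by a single space and blank parts are skipped.
    """
    parts = [v.strip() for v in class_values if v and v.strip()]
    if not parts:
        return None  # no <class> content, so no call number to emit
    return {
        "type": "Identifier",
        "content": " ".join(parts),
        "classified_as": [dict(CALL_NUMBER)],
    }

identifier = call_number_identifier(["7 SUTH", "(PAMPHLET)"])
print(identifier["content"])  # 7 SUTH (PAMPHLET)
```

Under this approach the parenthetical can still drive the "Pamphlet" classification on the work, since building the identifier does not consume the `<class>` strings.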

tomcrane commented 3 weeks ago

https://github.com/tomcrane/linked-art-net/commit/b517df1e201a00ab47675797a9dfc076fd32b8b4