dini-ag-kim / avram

Specification of a schema language for MARC and related formats such as PICA and MAB
https://format.gbv.de/schema/avram/specification

Avram: should keyword $ref be allowed for extensive lists? #4

Closed cKlee closed 2 years ago

cKlee commented 6 years ago

The MARC relators code list is extensive and its codes occur in various fields. Instead of repeating the list over and over, it would be better to define it once:

fields:
  028A:
    subfields:
      "4":
        label: Code
        codes:
          "$ref": "#/definitions/relators"
  028C:
    subfields:
      "4":
        label: Code
        codes:
          "$ref": "#/definitions/relators"
  029A:
    subfields:
      "4":
        label: Code
        codes:
          "$ref": "#/definitions/relators"
definitions:
  relators:
    abr:
      label: Abridger
    acp:
      label: Art copyist
    act:
      label: Actor
    adi:
      label: Art director
    adp:
      label: Adapter
    aft:
      label: Author of afterword, colophon, etc.
    anl:
      label: Analyst
    anm:
      label: Animator
    ann:
      label: Annotator
    ant:
      label: Bibliographic antecedent
    ape:
      label: Appellee
    apl:
      label: Appellant
    app:
      label: Applicant
    aqt:
      label: Author in quotations or text abstracts
    arc:
      label: Architect
    ard:
      label: Artistic director
    arr:
      label: Arranger
    art:
      label: Artist
    asg:
      label: Assignee
    asn:
      label: Associated name
    ato:
      label: Autographer
    att:
      label: Attributed name
    auc:
      label: Auctioneer
    aud:
      label: Author of dialog
    aui:
      label: Author of introduction, etc.
    aus:
      label: Screenwriter
    aut:
      label: Author
    bdd:
      label: Binding designer
    bjd:
      label: Bookjacket designer
    bkd:
      label: Book designer
    bkp:
      label: Book producer
    blw:
      label: Blurb writer
    bnd:
      label: Binder
    bpd:
      label: Bookplate designer
    brd:
      label: Broadcaster
    brl:
      label: Braille embosser
    bsl:
      label: Bookseller
    cas:
      label: Caster
    ccp:
      label: Conceptor
    chr:
      label: Choreographer
    -clb:
      label: Collaborator
    cli:
      label: Client
    cll:
      label: Calligrapher
    clr:
      label: Colorist
    clt:
      label: Collotyper
    cmm:
      label: Commentator
    cmp:
      label: Composer
    cmt:
      label: Compositor
    cnd:
      label: Conductor
    cng:
      label: Cinematographer
    cns:
      label: Censor
    coe:
      label: Contestant-appellee
    col:
      label: Collector
    com:
      label: Compiler
    con:
      label: Conservator
    cor:
      label: Collection registrar
    cos:
      label: Contestant
    cot:
      label: Contestant-appellant
    cou:
      label: Court governed
    cov:
      label: Cover designer
    cpc:
      label: Copyright claimant
    cpe:
      label: Complainant-appellee
    cph:
      label: Copyright holder
    cpl:
      label: Complainant
    cpt:
      label: Complainant-appellant
    cre:
      label: Creator
    crp:
      label: Correspondent
    crr:
      label: Corrector
    crt:
      label: Court reporter
    csl:
      label: Consultant
    csp:
      label: Consultant to a project
    cst:
      label: Costume designer
    ctb:
      label: Contributor
    cte:
      label: Contestee-appellee
    ctg:
      label: Cartographer
    ctr:
      label: Contractor
    cts:
      label: Contestee
    ctt:
      label: Contestee-appellant
    cur:
      label: Curator
    cwt:
      label: Commentator for written text
    dbp:
      label: Distribution place
    dfd:
      label: Defendant
    dfe:
      label: Defendant-appellee
    dft:
      label: Defendant-appellant
    dgg:
      label: Degree granting institution
    dgs:
      label: Degree supervisor
    dis:
      label: Dissertant
    dln:
      label: Delineator
    dnc:
      label: Dancer
    dnr:
      label: Donor
    dpc:
      label: Depicted
    dpt:
      label: Depositor
    drm:
      label: Draftsman
    drt:
      label: Director
    dsr:
      label: Designer
    dst:
      label: Distributor
    dtc:
      label: Data contributor
    dte:
      label: Dedicatee
    dtm:
      label: Data manager
    dto:
      label: Dedicator
    dub:
      label: Dubious author
    edc:
      label: Editor of compilation
    edm:
      label: Editor of moving image work
    edt:
      label: Editor
    egr:
      label: Engraver
    elg:
      label: Electrician
    elt:
      label: Electrotyper
    eng:
      label: Engineer
    enj:
      label: Enacting jurisdiction
    etr:
      label: Etcher
    evp:
      label: Event place
    exp:
      label: Expert
    fac:
      label: Facsimilist
    fds:
      label: Film distributor
    fld:
      label: Field director
    flm:
      label: Film editor
    fmd:
      label: Film director
    fmk:
      label: Filmmaker
    fmo:
      label: Former owner
    fmp:
      label: Film producer
    fnd:
      label: Funder
    fpy:
      label: First party
    frg:
      label: Forger
    gis:
      label: Geographic information specialist
    -grt:
      label: Graphic technician
    his:
      label: Host institution
    hnr:
      label: Honoree
    hst:
      label: Host
    ill:
      label: Illustrator
    ilu:
      label: Illuminator
    ins:
      label: Inscriber
    inv:
      label: Inventor
    isb:
      label: Issuing body
    itr:
      label: Instrumentalist
    ive:
      label: Interviewee
    ivr:
      label: Interviewer
    jud:
      label: Judge
    jug:
      label: Jurisdiction governed
    lbr:
      label: Laboratory
    lbt:
      label: Librettist
    ldr:
      label: Laboratory director
    led:
      label: Lead
    lee:
      label: Libelee-appellee
    lel:
      label: Libelee
    len:
      label: Lender
    let:
      label: Libelee-appellant
    lgd:
      label: Lighting designer
    lie:
      label: Libelant-appellee
    lil:
      label: Libelant
    lit:
      label: Libelant-appellant
    lsa:
      label: Landscape architect
    lse:
      label: Licensee
    lso:
      label: Licensor
    ltg:
      label: Lithographer
    lyr:
      label: Lyricist
    mcp:
      label: Music copyist
    mdc:
      label: Metadata contact
    med:
      label: Medium
    mfp:
      label: Manufacture place
    mfr:
      label: Manufacturer
    mod:
      label: Moderator
    mon:
      label: Monitor
    mrb:
      label: Marbler
    mrk:
      label: Markup editor
    msd:
      label: Musical director
    mte:
      label: Metal-engraver
    mtk:
      label: Minute taker
    mus:
      label: Musician
    nrt:
      label: Narrator
    opn:
      label: Opponent
    org:
      label: Originator
    orm:
      label: Organizer
    osp:
      label: Onscreen presenter
    oth:
      label: Other
    own:
      label: Owner
    pan:
      label: Panelist
    pat:
      label: Patron
    pbd:
      label: Publishing director
    pbl:
      label: Publisher
    pdr:
      label: Project director
    pfr:
      label: Proofreader
    pht:
      label: Photographer
    plt:
      label: Platemaker
    pma:
      label: Permitting agency
    pmn:
      label: Production manager
    pop:
      label: Printer of plates
    ppm:
      label: Papermaker
    ppt:
      label: Puppeteer
    pra:
      label: Praeses
    prc:
      label: Process contact
    prd:
      label: Production personnel
    pre:
      label: Presenter
    prf:
      label: Performer
    prg:
      label: Programmer
    prm:
      label: Printmaker
    prn:
      label: Production company
    pro:
      label: Producer
    prp:
      label: Production place
    prs:
      label: Production designer
    prt:
      label: Printer
    prv:
      label: Provider
    pta:
      label: Patent applicant
    pte:
      label: Plaintiff-appellee
    ptf:
      label: Plaintiff
    pth:
      label: Patent holder
    ptt:
      label: Plaintiff-appellant
    pup:
      label: Publication place
    rbr:
      label: Rubricator
    rcd:
      label: Recordist
    rce:
      label: Recording engineer
    rcp:
      label: Addressee
    rdd:
      label: Radio director
    red:
      label: Redaktor
    ren:
      label: Renderer
    res:
      label: Researcher
    rev:
      label: Reviewer
    rpc:
      label: Radio producer
    rps:
      label: Repository
    rpt:
      label: Reporter
    rpy:
      label: Responsible party
    rse:
      label: Respondent-appellee
    rsg:
      label: Restager
    rsp:
      label: Respondent
    rsr:
      label: Restorationist
    rst:
      label: Respondent-appellant
    rth:
      label: Research team head
    rtm:
      label: Research team member
    sad:
      label: Scientific advisor
    sce:
      label: Scenarist
    scl:
      label: Sculptor
    scr:
      label: Scribe
    sds:
      label: Sound designer
    sec:
      label: Secretary
    sgd:
      label: Stage director
    sgn:
      label: Signer
    sht:
      label: Supporting host
    sll:
      label: Seller
    sng:
      label: Singer
    spk:
      label: Speaker
    spn:
      label: Sponsor
    spy:
      label: Second party
    srv:
      label: Surveyor
    std:
      label: Set designer
    stg:
      label: Setting
    stl:
      label: Storyteller
    stm:
      label: Stage manager
    stn:
      label: Standards body
    str:
      label: Stereotyper
    tcd:
      label: Technical director
    tch:
      label: Teacher
    ths:
      label: Thesis advisor
    tld:
      label: Television director
    tlp:
      label: Television producer
    trc:
      label: Transcriber
    trl:
      label: Translator
    tyd:
      label: Type designer
    tyg:
      label: Typographer
    uvp:
      label: University place
    vac:
      label: Voice actor
    vdg:
      label: Videographer
    -voc:
      label: Vocalist
    wac:
      label: Writer of added commentary
    wal:
      label: Writer of added lyrics
    wam:
      label: Writer of accompanying material
    wat:
      label: Writer of added text
    wdc:
      label: Woodcutter
    wde:
      label: Wood engraver
    win:
      label: Writer of introduction
    wit:
      label: Witness
    wpr:
      label: Writer of preface
    wst:
      label: Writer of supplementary textual content
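
For illustration only (this is not part of the Avram specification): a minimal Python sketch of how an application might expand local `$ref` objects such as `#/definitions/relators` before validating. The function names `resolve_pointer` and `expand_refs` are hypothetical, the schema is a two-code excerpt of the list above, and only same-document references are handled. Percent-decoding of the fragment and detection of circular references are left out.

```python
def resolve_pointer(document, reference):
    """Resolve a same-document reference like '#/definitions/relators' (RFC 6901 syntax)."""
    if not reference.startswith("#"):
        raise ValueError(f"only local references are supported: {reference!r}")
    path = reference[1:]
    if path == "":
        return document
    if not path.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {reference!r}")
    node = document
    for token in path[1:].split("/"):
        # RFC 6901 escaping: '~1' stands for '/', '~0' stands for '~'
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def expand_refs(node, root):
    """Return a copy of node in which every {'$ref': ...} object is replaced by its target."""
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            return expand_refs(resolve_pointer(root, node["$ref"]), root)
        return {key: expand_refs(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_refs(value, root) for value in node]
    return node


# Excerpt of the schema above, written as the equivalent Python structure
schema = {
    "fields": {
        "028A": {"subfields": {"4": {"label": "Code", "codes": {"$ref": "#/definitions/relators"}}}},
        "028C": {"subfields": {"4": {"label": "Code", "codes": {"$ref": "#/definitions/relators"}}}},
    },
    "definitions": {
        "relators": {"abr": {"label": "Abridger"}, "aut": {"label": "Author"}},
    },
}

expanded = expand_refs(schema, schema)
print(expanded["fields"]["028A"]["subfields"]["4"]["codes"]["aut"]["label"])  # prints "Author"
```

With this approach the relators list is written once under `definitions` and every subfield sees the full list after expansion.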

pkiraly commented 6 years ago

Good idea! It is not only the relators: several other lists recur in different places as well.

Do you know OpenAPI or smartAPI? They suggest a solution for describing API calls in a similar fashion. See http://idratherbewriting.com/learnapidoc/pubapis_openapi_step5_components_object.html#reused_parameters

cKlee commented 6 years ago

Do you see any significant difference between https://json-spec.readthedocs.io/reference.html, http://idratherbewriting.com/learnapidoc/pubapis_openapi_step5_components_object.html#reused_parameters and https://swagger.io/docs/specification/using-ref/ ? They all look the same to me.

pkiraly commented 6 years ago

Sorry, I was simply not aware that this exists in JSON Schema as well. My fault.

cKlee commented 6 years ago

So the question is whether this must be defined within the Avram specification. I guess as long as Avram does not disallow other properties like definitions, it is applicable, right?

pkiraly commented 6 years ago

As I said, it is a good idea, I support it.

nichtich commented 6 years ago

Good point! Avram disallows custom properties like definitions, but shared code lists should be supported. Such code lists are already shared in other formats, e.g. in RDF at http://id.loc.gov/vocabulary/relators, so it would be better to reference them instead of inventing another way to express them. How about a codelist field as an alternative to codes?

fields:
  028A:
    subfields:
      "4":
        label: Code
        codelist: http://id.loc.gov/vocabulary/relators
  028C:
    subfields:
      "4":
        label: Code
        codelist: http://id.loc.gov/vocabulary/relators
  029A:
    subfields:
      "4":
        label: Code
        codelist: http://id.loc.gov/vocabulary/relators

Applications can internally expand codelist to codes if needed. A codelist in RDF should be a skos:ConceptScheme with codes in skos:notation; that's how codelists are provided by LoC.
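
For illustration only: a minimal Python sketch of such an internal expansion. It assumes the application ships its own table of known code lists keyed by URI, so no HTTP call or RDF parsing is involved. The names `KNOWN_CODELISTS` and `expand_codelist` are hypothetical, and the table holds only a two-code excerpt of the relators list.

```python
RELATORS_URI = "http://id.loc.gov/vocabulary/relators"

# Hypothetical local registry: code list URI -> codes (excerpt only).
# The URI is used purely as a lookup key; nothing is downloaded.
KNOWN_CODELISTS = {
    RELATORS_URI: {
        "abr": {"label": "Abridger"},
        "aut": {"label": "Author"},
    },
}


def expand_codelist(subfield, registry=KNOWN_CODELISTS):
    """Return a copy of subfield with a known `codelist` URI replaced by explicit `codes`.

    Subfields without `codelist`, or with a URI missing from the registry,
    are returned as an unchanged copy.
    """
    uri = subfield.get("codelist")
    if uri is None or uri not in registry:
        return dict(subfield)
    expanded = {key: value for key, value in subfield.items() if key != "codelist"}
    expanded["codes"] = registry[uri]
    return expanded


subfield = {"label": "Code", "codelist": RELATORS_URI}
print(expand_codelist(subfield)["codes"]["aut"]["label"])  # prints "Author"
```

How an application fills such a table (bundled file, cache, one-time import from the RDF data) is left open here.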

cKlee commented 6 years ago

As @pkiraly said, there are other shared resources, not only the MARC relators. And for my taste it is too expensive to make an HTTP call and parse RDF.

As far as I can tell, neither https://format.gbv.de/schema/avram/schema.json nor the specification disallows properties other than the ones mentioned. If you keep it this way, nothing stops anyone from using keywords like $ref, definitions etc.

If JSON Reference is part of the JSON spec, then Avram does not have to care about it. It is then implicit that the JSON Reference gets expanded by the application.

nichtich commented 6 years ago

JSON Reference is not part of the JSON specification but part of JSON Schema. Avram is not a subset of JSON Schema; it is only described by a JSON Schema. Any use of additional keywords requires custom implementations, so such an Avram schema would only be usable with your own tools instead of general Avram validators.

By the way, JSON Schema also allows HTTP URIs as the value of $ref, but it does not require an HTTP call:

The URI is not a network locator, only an identifier. A schema need not be downloadable from the address if it is a network-addressable URL, and implementations SHOULD NOT assume they should perform a network operation when they encounter a network-addressable URI.

The same could be stated for the use of external code lists in Avram. Still, there are arguments for native support of JSON Reference. Anyway, I won't add more features until there is a complete implementation of an Avram validator in at least one (preferably multiple) programming language.

cKlee commented 6 years ago

I see. https://www.rfc-editor.org/info/rfc6901 (JSON Pointer as a basis for JSON Reference) is a proposed standard, not a standard.

But I don't get it: What is the purpose of a list identifier (in Avram), when it is not downloaded and parsed to enable validation?

And what should I do now with my MARC relators list? Repeat it for every subfield :worried: ?

pkiraly commented 6 years ago

Dear @cKlee, as I see it, given the limited resources, the viable approach is gradual improvement. It is clear that we cannot solve all issues and implement all ideas in one step. It took me 4 months of hard work to set up the MARC model in Java, and there are still lots of things that are not solved. But the current version is usable, and according to the feedback it is useful for detecting some kinds of issues in MARC records. So I would prefer a "dumb" standard that has an implementation over a smart one without an implementation. When the current set is implemented, we can move on. In the meantime you can work on documentation of the "reference feature".

cKlee commented 6 years ago

Note to myself: documenting references see https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.1.md#relative-references-in-urls and https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.1.md#referenceObject

nichtich commented 5 years ago

JSON Pointer is fully specified as RFC 6901, so it could be used in a way similar to how it is specified in JSON Schema. JSON Schema also extends JSON Pointer to relative JSON Pointers, but I don't see the benefit of this additional overhead. The reusable parts could be put in a generic definitions key or in more specific keys such as codelists:

definitions:
  relators:
    # put actual codelist here
fields:
  028A:
    subfields:
      "4":
        label: Code
        codes: '#/definitions/relators'
  028C:
    subfields:
      "4":
        label: Code
        codes: '#/definitions/relators'
  029A:
    subfields:
      "4":
        label: Code
        codes: '#/definitions/relators'
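
For illustration only: a short Python sketch of how a validator might read `codes` under this proposal, assuming a mapping is an explicit code list and a string starting with `#/` is a JSON Pointer into the same schema. The function name `codes_for` is hypothetical and this dispatch rule is an assumption, not something the specification defines.

```python
def codes_for(subfield, schema):
    """Return the code list of a subfield, following a local JSON Pointer if needed."""
    codes = subfield.get("codes", {})
    if isinstance(codes, dict):
        # Explicit code list given inline
        return codes
    if isinstance(codes, str) and codes.startswith("#/"):
        node = schema
        for token in codes[2:].split("/"):
            # RFC 6901 escaping: '~1' stands for '/', '~0' stands for '~'
            token = token.replace("~1", "/").replace("~0", "~")
            node = node[token]
        return node
    raise ValueError(f"unsupported value for codes: {codes!r}")


schema = {
    "definitions": {"relators": {"aut": {"label": "Author"}}},
    "fields": {
        "028A": {"subfields": {"4": {"label": "Code", "codes": "#/definitions/relators"}}},
    },
}

subfield = schema["fields"]["028A"]["subfields"]["4"]
print(sorted(codes_for(subfield, schema)))  # prints ['aut']
```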

cKlee commented 5 years ago

Okay, but then there is no indication that the value of codes is a JSON Pointer, right? The parser logic must somehow be like:

nichtich commented 5 years ago

Have a look at the corresponding section in JSON Schema: a URI is just an identifier of a document, so the validator does not need to retrieve it if the URI is already known. But in practice you are right. For simplicity I'd disallow relative URIs.

P.S: It turns out more complex if there are alternatives, should we also support this? https://github.com/gbv/avram/issues/6

cKlee commented 5 years ago

From the JSON Schema spec https://json-schema.org/latest/json-schema-core.html#rfc.section.8.3 :

"The URI is not a network locator, only an identifier. A schema need not be downloadable from the address if it is a network-addressable URL, and implementations SHOULD NOT assume they should perform a network operation when they encounter a network-addressable URI."

And from Section 8.3.1:

"The use of URIs to identify remote schemas does not necessarily mean anything is downloaded, but instead JSON Schema implementations SHOULD understand ahead of time which schemas they will be using, and the URIs that identify them."

Okay! If I got this right, in our case URIs like http://id.loc.gov/vocabulary/relators SHOULD be known to Avram schema implementers. Thus the document need not be downloaded.

But which URIs should be known to implementers? Is the list arbitrary? Does it depend on the use case and the implementers' knowledge?

nichtich commented 4 years ago

Rereading the discussion, I think it may be easier to also allow codes to reference a URI as an alternative to an explicit codelist. Whether and how to turn the URI into a codelist would not be part of the specification. Validators may

What do you think?