glossarist / iev-data


Parse *most* SOURCE entries #74

Closed ronaldtse closed 3 years ago

ronaldtse commented 3 years ago

This PR handles all SOURCE entries in the IEV dataset properly, except for these three types:

# We need to seek clarification on what these actually mean...
[extract_single_source] see IEC 60050-131, IEC 61293, IEC 60417 item 5031
{:source_ref=>"IEC 60050-131", :clause=>nil, :relation_type=>{:type=>:related}}
[extract_single_source] voir la CEI 60050-131, CEI 61293, CEI 60417, N°5031
{:source_ref=>"IEC 60050-131", :clause=>nil, :relation_type=>{:type=>:related}}

# These should be split into "ISO/IEC Guide 2, 14.4 MOD; 191-14-02 MOD"
[extract_single_source] ISO/IEC Guide 2 (14.4 MOD), 191-14-02 MOD
{:source_ref=>"IEV", :clause=>"14.4", :relation_type=>{:type=>:modified}}
[extract_single_source] ISO/CEI Guide 2 (14.4 MOD), 191-14-02 MOD
{:source_ref=>"IEV", :clause=>"14.4", :relation_type=>{:type=>:modified}}

# These should be split into "ISO 921:1997, definition 453, modified; ISO 921:1997 definition 483, modified"
[extract_single_source] ISO 921:1997, definition 453, modified and definition 483, modified
{:source_ref=>"ISO 921:1997",
[extract_single_source] ISO 921:1997, définition 453, modifiée et définition 483, modifiée
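
For the third type, the split described in the comment above could look roughly like the following. This is a minimal sketch only: split_repeated_definitions is a hypothetical helper, not the parser in this PR, and it assumes the repeated parts always start with "definition N" or "définition N" and are joined by "and" or "et".

```ruby
# Hypothetical helper (not part of iev-data): splits a SOURCE string in which
# several "definition N, ..." parts share one leading reference.
def split_repeated_definitions(source)
  # Everything before the first comma is treated as the shared reference.
  ref, rest = source.split(/,\s*/, 2)
  return [source] if rest.nil?

  # Split only on "and"/"et" that is directly followed by another
  # "definition N" (or French "définition N", written with \u00E9).
  parts = rest.split(/\s+(?:and|et)\s+(?=d[e\u00E9]finition\s+\d)/)
  return [source] if parts.size < 2

  # Re-attach the shared reference to every part.
  parts.map { |part| "#{ref}, #{part}" }
end

puts split_repeated_definitions(
  "ISO 921:1997, definition 453, modified and definition 483, modified"
)
# prints:
# ISO 921:1997, definition 453, modified
# ISO 921:1997, definition 483, modified
```

Any string that does not match this shape is returned unchanged as a single-element array, so the other SOURCE types would not be affected by such a step.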

skalee commented 3 years ago

@ronaldtse Since this pull request, authoritative_source has become an array. Is that a bug or intended? If it is intended, it will require corresponding changes in the sites too. The spec is failing because of this change.

For example, 112-01-01 (fra) before:

  authoritative_source:
    ref: ISO/IEC Guide 99:2007
    clause: 1.1, modifié – Suppression de NOTE 3 et NOTE 4, et quelques modifications
      dans les autres.
    link: https://www.iso.org/standard/45324.html

and now (note the leading dash on the second line):

  authoritative_source:
  - ref: ISO/IEC Guide 99:2007
    clause: '1.1'
    link: https://www.iso.org/standard/45324.html
    relationship:
      :type: :modified
      :modification: Suppression de NOTE 3 et NOTE 4, et quelques modifications dans
        les autres.
    original: ISO/IEC Guide 99:2007, 1.1, modifié – Suppression de NOTE 3 et NOTE
      4, et quelques modifications dans les autres.

ronaldtse commented 3 years ago

This is intended: the SOURCE field can list multiple sources, and we need to separate them one by one. Can you update the spec and the sites? Thanks!
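
On the consuming side, one way to cope with both shapes during the transition is to normalize authoritative_source to an array before use. The sketch below is illustrative only: authoritative_sources is a hypothetical helper, not code from the sites, and the YAML snippets are shortened versions of the 112-01-01 example above.

```ruby
require "yaml"

# Old shape: a single mapping.
OLD_YAML = <<~YAML
authoritative_source:
  ref: ISO/IEC Guide 99:2007
  link: https://www.iso.org/standard/45324.html
YAML

# New shape: a list of mappings (note the leading dash).
NEW_YAML = <<~YAML
authoritative_source:
- ref: ISO/IEC Guide 99:2007
  clause: '1.1'
  link: https://www.iso.org/standard/45324.html
YAML

# Hypothetical helper: always returns an Array of source hashes.
# Kernel#Array is deliberately avoided, because Array(hash) turns a Hash
# into an array of [key, value] pairs instead of wrapping it.
def authoritative_sources(concept)
  value = concept["authoritative_source"]
  case value
  when nil then []
  when Array then value
  else [value]
  end
end

[OLD_YAML, NEW_YAML].each do |yaml|
  authoritative_sources(YAML.safe_load(yaml)).each do |source|
    puts source["ref"]
  end
end
```

Both snippets yield the same reference through the helper, so templates that iterate over the result would not need to know which shape the data file uses.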