HistoryAtState / tags

Source data for Subject Taxonomy of the History of U.S. Foreign Relations
https://history.state.gov/tags

Clean up submissions for FRUS 1934-39, etc.; place into frus-tags.xml #3

Closed by joewiz 8 years ago

joewiz commented 8 years ago

frus-tags-todo.zip

cleanup-scripts.zip

joewiz commented 8 years ago

@awmarrs I found them!

awmarrs commented 8 years ago

Huzzah!

joewiz commented 8 years ago

Closed with https://github.com/HistoryAtState/tags/commit/7cad575052dfddbe8bfcc51888684b8b929a447b.

The procedure:

  1. Open .xlsx submission in Excel, save to .xls
  2. Import .xls in oXygen (File > Import > Excel)
  3. Rename root/row as studies/study, rename columns as title, link, tags, unofficial-tags-tbd
  4. Save to eXist, in /db/apps/tags/tagged-resources/frus-tags-todo
  5. Convert exact matches to final format:

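    xquery version "3.1";
    
    (: Uses eXist-db's XQuery Update extension (update insert / update value) :)
    let $studies := collection('/db/apps/tags/tagged-resources/frus-tags-todo')//study
    (: Only tags that have not yet been assigned a taxonomy id :)
    for $tag in $studies//tag[not(@id)]
    (: Exact match of the tag text against a taxonomy label :)
    let $match := doc('/db/apps/tags/taxonomy/taxonomy.xml')//label[. = $tag]/../id
    return
        if ($match) then
            (
                (: Record the taxonomy id on the tag, then clear its text :)
                update insert attribute id {$match} into $tag,
                update value $tag with ''
            )
        else ()

  6. Merge these into existing frus-tags document (and move non-matches into unofficial-tags-tbd):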

    xquery version "3.1";
    
    (: Studies already in the published frus-tags document :)
    let $frus-tags := doc('/db/apps/tags/tagged-resources/frus/frus-tags.xml')//study
    (: Rebuild each imported study in the final format :)
    let $new-studies := 
       for $study in collection('/db/apps/tags/tagged-resources/frus-tags-todo')//study
       let $link := $study/link
       let $vol-id := substring-after($link, 'historicaldocuments/')
       return 
           element study {
               element title { normalize-space($study/title) },
               (: Normalize the link to the canonical https URL :)
               element link { "https://history.state.gov/historicaldocuments/" || $vol-id },
               (: Matched tags carry only their taxonomy id :)
               element tags { 
                   for $tag in $study/tags/tag[@id]
                   order by $tag/@id
                   return 
                       element tag { attribute id { $tag/@id } }
               },
               (: Unmatched tags, deduplicated, are kept for later review :)
               element unofficial-tags-tbd {
                   for $tag in distinct-values(($study/tags/tag[not(@id)], $study/unofficial-tags-tbd/tag))
                   order by $tag
                   return
                       element tag { $tag }
               }
           }
    (: Combine existing and new studies, sorted by volume ID :)
    let $re-sorted-studies := 
       element studies {
           for $study in ($frus-tags, $new-studies)
           let $link := $study/link
           let $vol-id := substring-after($link, 'historicaldocuments/')
           order by $vol-id
           return $study
       }
    (: Overwrite frus-tags.xml with the merged result :)
    return 
       xmldb:store('/db/apps/tags/tagged-resources/frus', 'frus-tags.xml', $re-sorted-studies)
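
Because step 5 updates the stored documents in place, a read-only dry run can help preview which tags would match before anything is changed. The following is a minimal sketch and not part of the procedure actually used: the inline `$taxonomy` and `$studies` samples, the `category` element name, and the sample labels and ids are all hypothetical. Only the label/id sibling relationship is taken from the path expression in step 5.

```xquery
xquery version "3.1";

(: Hypothetical inline sample standing in for taxonomy.xml;
   the category element name is an assumption :)
let $taxonomy :=
    <taxonomy>
        <category><id>arms-control</id><label>Arms Control</label></category>
        <category><id>trade</id><label>Trade</label></category>
    </taxonomy>
(: Hypothetical imported submission with two untagged entries :)
let $studies :=
    <studies>
        <study>
            <title>Sample volume</title>
            <tags><tag>Arms Control</tag><tag>Trade policy</tag></tags>
        </study>
    </studies>
(: Same selection and exact-match test as step 5, but report only :)
for $tag in $studies//tag[not(@id)]
let $match := $taxonomy//label[. = $tag]/../id
return
    <result tag="{$tag}"
            status="{if (exists($match)) then 'match' else 'no-match'}"
            id="{string-join($match, ' ')}"/>
```

Pointing `$taxonomy` and `$studies` at the real `doc()` and `collection()` calls from step 5 would give the same kind of report for an actual submission, without writing anything to the database.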

From the commit message:

This leaves only 8 published volumes without tags:

  • frus1889
  • frus1890
  • frus1891
  • frus1892
  • frus1893
  • frus1902app3 (intern noted this wasn’t available to him/her)
  • frus1969-76ve14p2
  • frus1977-80v30

(As discussed before, we intentionally did not submit frus1861-99Index & frus1900-18Index for tagging.)

As I did with the policy studies tags, I moved all tags that did not exactly match an entry in the taxonomy into “unofficial-tags-tbd.” I didn’t proofread these for typos or slight differences from the controlled values in the taxonomy. With further proofreading or fuzzy searching, some of these tags could be added and thus made available to users of our website.
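
One possible first pass at that follow-up is to compare tags after normalizing case and whitespace, which would catch the simplest near misses. This is a sketch under assumptions, not something that was run for this commit: the `local:norm` helper, the inline `$taxonomy` sample with its `category` element, and the sample tag strings are all hypothetical. It is a normalized exact match rather than true fuzzy search, so it would not catch misspellings.

```xquery
xquery version "3.1";

(: Hypothetical helper: lower-case the value and collapse runs of whitespace :)
declare function local:norm($s as xs:string) as xs:string {
    lower-case(normalize-space($s))
};

(: Hypothetical inline sample standing in for taxonomy.xml :)
let $taxonomy :=
    <taxonomy>
        <category><id>arms-control</id><label>Arms Control</label></category>
        <category><id>trade</id><label>Trade</label></category>
    </taxonomy>
(: Hypothetical unmatched tags, as they might appear in unofficial-tags-tbd :)
let $unofficial := ("arms  control", "TRADE", "Fisheries")
for $tag in $unofficial
(: Labels that equal the tag once case and spacing are ignored :)
let $candidates := $taxonomy//label[local:norm(.) = local:norm($tag)]
return
    <candidate tag="{$tag}" suggested-id="{string-join($candidates/../id, ' ')}"/>
```

Tags that come back with an empty `suggested-id` would still need manual proofreading against the taxonomy.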

joewiz commented 8 years ago

Revised submissions as of https://github.com/HistoryAtState/tags/commit/4ff8e0a75f4b68b67082a947e170a2ff09cc6ad8: frus-tags-imported-vsfs-submissions-2015-16.zip

github-actions[bot] commented 11 months ago

This issue has been resolved in version 1.0.0

The release is available on GitHub release
