WGBH-MLA / AAPB2

American Archive of Public Broadcasting
https://americanarchive.org/

Maine Public Digitized Items, Not Online #2797

Closed ekemeyer closed 1 week ago

ekemeyer commented 1 week ago

Details

Could I get an Excel spreadsheet listing the GUIDs and titles of items from Maine Public Broadcasting Network that have been digitized but are not in the ORR? Or, if it is easier, it could be a spreadsheet of everything digitized that identifies what is and is not in the ORR. It looks like there are 254 items digitized, 47 of which are available online. I want to send the Excel sheet to the station to get approval to put more items in the ORR. Let me know if you have any questions! Thanks!

Submitted by: Michelle
CC in communications:
Priority: Medium (within this month)
URL:
Slack message thread:

foo4thought commented 1 week ago

myself@penguin:~/Downloads$ curl 'https://americanarchive.org/api.json?q=%28contributing_organizations:%22Maine%20Public%20Broadcasting%20Network%20%28ME%29%22%20AND%20access_types:digitized%29%20NOT%20%28access_types:online%29' | jq -r '.'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   340    0   340    0     0   2278      0 --:--:-- --:--:-- --:--:--  2281
{
  "responseHeader": {
    "status": 0,
    "QTime": 5,
    "params": {
      "q": "(contributing_organizations:\"Maine Public Broadcasting Network (ME)\" AND access_types:digitized) NOT (access_types:online)",
      "rows": "0",
      "wt": "ruby"
    }
  },
  "response": {
    "numFound": 207,
    "start": 0,
    "docs": []
  }
}
myself@penguin:~/Downloads$ 
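For anyone who would rather not hand-encode the query string, here is a minimal Python sketch of how the same request URL could be assembled and the count read back. The endpoint and the `q`, `rows`, `start`, and `fl` parameters are taken from the commands in this thread; the helper names (`build_query`, `build_url`, `num_found`) are hypothetical, and the sketch deliberately makes no network call. Note that `urlencode` also percent-encodes `:` and `,`, which decodes to the same query as the hand-written URL.

```python
from typing import Optional, Sequence
from urllib.parse import quote, urlencode

# Endpoint as used in the curl command above.
API_URL = "https://americanarchive.org/api.json"


def build_query(organization: str) -> str:
    """Query for items that are digitized but not online for one organization."""
    return (
        f'(contributing_organizations:"{organization}" '
        "AND access_types:digitized) NOT (access_types:online)"
    )


def build_url(
    organization: str,
    rows: Optional[int] = None,
    start: Optional[int] = None,
    fields: Optional[Sequence[str]] = None,
) -> str:
    """Assemble the request URL; optional parameters are left out when unset."""
    params = {"q": build_query(organization)}
    if rows is not None:
        params["rows"] = str(rows)
    if start is not None:
        params["start"] = str(start)
    if fields:
        params["fl"] = ",".join(fields)
    # quote_via=quote encodes spaces as %20, as in the curl command.
    return f"{API_URL}?{urlencode(params, quote_via=quote)}"


def num_found(payload: dict) -> int:
    """Read the hit count from a decoded JSON response like the one above."""
    return payload["response"]["numFound"]


if __name__ == "__main__":
    print(build_url("Maine Public Broadcasting Network (ME)"))
```

The count returned above (207) is consistent with the figures in the request: 254 digitized items minus 47 already online.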

foo4thought commented 1 week ago

myself@penguin:~/Downloads/mainepublicbroadcasting$ curl 'https://americanarchive.org/api.json?q=%28contributing_organizations:%22Maine%20Public%20Broadcasting%20Network%20%28ME%29%22%20AND%20access_types:digitized%29%20NOT%20%28access_types:online%29&rows=100&fl=id,xml&start=0' | jq -r '.' > 0.json
myself@penguin:~/Downloads/mainepublicbroadcasting$ curl 'https://americanarchive.org/api.json?q=%28contributing_organizations:%22Maine%20Public%20Broadcasting%20Network%20%28ME%29%22%20AND%20access_types:digitized%29%20NOT%20%28access_types:online%29&rows=100&fl=id,xml&start=100' | jq -r '.' > 100.json
myself@penguin:~/Downloads/mainepublicbroadcasting$ curl 'https://americanarchive.org/api.json?q=%28contributing_organizations:%22Maine%20Public%20Broadcasting%20Network%20%28ME%29%22%20AND%20access_types:digitized%29%20NOT%20%28access_types:online%29&rows=100&fl=id,xml&start=200' | jq -r '.' > 200.json
myself@penguin:~/Downloads/mainepublicbroadcasting$ jq -r '.response.docs[].id' *.json > guids.txt
myself@penguin:~/Downloads/mainepublicbroadcasting$ for guid in $(cat guids.txt); do echo "$(echo $guid;jq -r --arg guid "$guid" '[.response.docs[]|select(.id==$guid).xml][0]|select(. != null)' *.json | xpath -e '//pbcoreTitle' 2>/dev/null | tr -s '\n' '\t' | sed 's#># #g;s#<[^ ]* ##g')" | paste - - >> report.txt; done
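The steps above (page through the results 100 at a time, pull each GUID, extract the `pbcoreTitle` elements from the `xml` field, and write one row per item) could also be sketched in Python. This is a rough equivalent rather than the script that produced report.txt: the function names are hypothetical, it assumes each doc carries its PBCore record as a string in `xml` (as requested with `fl=id,xml`), it assumes titles may carry a `titleType` attribute, and it writes CSV, which Excel can open, instead of tab-separated text.

```python
import csv
import json
import sys
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, List


def page_starts(total: int, rows: int = 100) -> List[int]:
    """Offsets for the `start` parameter, e.g. 0, 100, 200 for 207 hits."""
    return list(range(0, total, rows))


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix such as '{http://...}pbcoreTitle'."""
    return tag.rsplit("}", 1)[-1]


def titles_from_pbcore(xml_text: str) -> List[str]:
    """Collect every pbcoreTitle, prefixed with its titleType when present."""
    titles = []
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "pbcoreTitle":
            continue
        text = (element.text or "").strip()
        title_type = element.get("titleType")
        titles.append(f"{title_type}: {text}" if title_type else text)
    return titles


def report_rows(pages: Iterable[dict]) -> Iterator[List[str]]:
    """Yield [GUID, titles] for each doc in each decoded response page."""
    for page in pages:
        for doc in page["response"]["docs"]:
            xml_text = doc.get("xml")
            titles = titles_from_pbcore(xml_text) if xml_text else []
            yield [doc["id"], "; ".join(titles)]


def write_report(pages: Iterable[dict], out) -> None:
    """Write a header row plus one CSV row per item to an open text file."""
    writer = csv.writer(out)
    writer.writerow(["GUID", "Titles"])
    writer.writerows(report_rows(pages))


if __name__ == "__main__":
    # Usage: python report.py 0.json 100.json 200.json > report.csv
    loaded = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            loaded.append(json.load(handle))
    write_report(loaded, sys.stdout)
```

Matching on the local element name keeps the extraction working whether or not the PBCore XML declares a default namespace, which is the one place a stricter XPath tool could silently return nothing.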

foo4thought commented 1 week ago

report.txt