data-lessons / library-python-intro-superseded


Library tasks for Python #1

Open weaverbel opened 7 years ago

weaverbel commented 7 years ago
  1. Extract repository data, clean it up, and re-import it
  2. Scrape ARC and NHMRC reports for grants awarded to home institution and build contact list from that
  3. Extract bibliographic records from a repository and identify authors lacking ORCIDs
richyvk commented 7 years ago

Above are the Brisbane suggestions for an overarching 'superpower' example to use throughout the lesson.

richyvk commented 7 years ago

Also, there is the idea of analysing EZproxy logs.

prcollingwood commented 7 years ago

https://github.com/prcollingwood/ezproxy
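The EZproxy-log idea above could be sketched like this, using only the standard library. The sample lines and the regular expression assume an NCSA-style log format; real EZproxy logs depend on the site's LogFormat directive, so treat the field layout here as an assumption.

```python
import re
from collections import Counter

# Hypothetical EZproxy log lines in a common NCSA-style format
# (the actual fields depend on the proxy's LogFormat configuration).
sample_logs = [
    '10.0.0.1 - user1 [01/Jan/2017:10:00:00 +1000] "GET http://example.com/article HTTP/1.1" 200 1234',
    '10.0.0.2 - user2 [01/Jan/2017:10:05:00 +1000] "GET http://example.com/journal HTTP/1.1" 200 5678',
    '10.0.0.1 - user1 [01/Jan/2017:10:10:00 +1000] "GET http://other.org/paper HTTP/1.1" 404 0',
]

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d+) (?P<size>\d+)'
)

def parse_line(line):
    """Return a dict of named fields for one log line, or None if it doesn't match."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

records = [r for r in (parse_line(line) for line in sample_logs) if r]
requests_per_user = Counter(r["user"] for r in records)
print(requests_per_user.most_common())
```

A lesson built on this could then extend the `Counter` step to per-resource or per-hour tallies.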

jduckles commented 7 years ago

Some comments I've scribed from the room in Otago:

  1. Would this be possible to do for hundreds/thousands of learners? The re-import step seems like it wouldn't scale.
  2. Seems very country specific. Could we make it more generic?
  3. Would all workshop learners have access to the ORCID API in order to run these checks?
weaverbel commented 7 years ago

The aim of my point 1 is that libraries often have publications repositories where bibliographic records need to be standardised/cleaned up/linked for reporting-to-government exercises like Excellence in Research Australia. So if we developed a workflow that could do that, most academic libraries would have a use for it. Ditto point 2: a lot of libraries want to identify grant awardees annually so they can target them to create data management plans. @jduckles

weaverbel commented 7 years ago

Actually, @jduckles, point 2 would probably work better in the web scraping lesson. And we could tell people to just plug in their own grant-making bodies - or we could do US/UK/Australia/NZ etc. and people teach whichever bits they want.

libADS commented 7 years ago

One idea I discussed with @jduckles at Otago was to parse a series of BibTeX records (from Scopus) and use the oadoi API to find out which articles are open access.

Jonah mentioned that we should not rely on an API, but we could use prefetched JSON data from oadoi to show how this could be done.

This would showcase:

  1. How to parse BibTeX (and install + import an external library, bibtexparser)
  2. How to loop through a sequence of records
  3. How to make an HTTP request to an external API (but see caveat above)
  4. How to parse JSON
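The four steps above could be sketched roughly like this. To keep it self-contained, this uses a regex instead of bibtexparser and prefetched JSON instead of a live API call (as per the caveat above); the BibTeX records are made up, and the `is_oa` field name follows the oadoi response format but should be treated as an assumption here.

```python
import json
import re

# Two made-up BibTeX records; in the lesson these would come from a
# Scopus export and be parsed with the bibtexparser library.
bibtex = """
@article{smith2016,
  title = {An Example Article},
  doi = {10.1234/example.1},
}
@article{jones2017,
  title = {Another Article},
  doi = {10.1234/example.2},
}
"""

# Prefetched JSON standing in for live oadoi API responses.
# The "is_oa" key mirrors the oadoi response, but is an assumption here.
prefetched = json.loads("""
{
  "10.1234/example.1": {"is_oa": true},
  "10.1234/example.2": {"is_oa": false}
}
""")

# Step 1 (simplified): pull every doi field out of the BibTeX with a regex.
dois = re.findall(r'doi\s*=\s*\{([^}]+)\}', bibtex)

# Steps 2 and 4: loop through the records and read the parsed JSON.
# Step 3 would replace the dictionary lookup with an HTTP request to oadoi.
for doi in dois:
    info = prefetched.get(doi, {})
    print(doi, "open access" if info.get("is_oa") else "closed")
```

Swapping the `prefetched` lookup for a real request would demonstrate step 3 without changing the rest of the loop.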
libADS commented 7 years ago

Another idea I had was to handle XML data (useful in itself, since MARC, MODS, METS, DC, etc. are all XML-based formats). Maybe this XML comes from an OAI-PMH provider (so we would also need to show how to make an HTTP request). For instance, OAI-PMH has a feature called "selective harvesting" that returns only records modified within a specific date range; this can be useful if you want to synchronise duplicated metadata between two repositories.
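A minimal sketch of the selective-harvesting idea above: the `verb`, `metadataPrefix`, `from`, and `until` parameters come from the OAI-PMH protocol, but the base URL is a made-up example, and the sample response is stripped of the namespaces and envelope elements a real provider would return.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Build a selective-harvesting request: ListRecords restricted to a
# from/until date range (parameter names per the OAI-PMH protocol;
# the base URL is a hypothetical example).
base_url = "https://repository.example.org/oai"
params = {
    "verb": "ListRecords",
    "metadataPrefix": "oai_dc",
    "from": "2017-01-01",
    "until": "2017-01-31",
}
request_url = base_url + "?" + urllib.parse.urlencode(params)
print(request_url)

# A stripped-down stand-in for the XML response (a real OAI-PMH reply
# is wrapped in more envelope elements and uses XML namespaces).
sample_xml = """
<ListRecords>
  <record>
    <header>
      <identifier>oai:repository.example.org:123</identifier>
      <datestamp>2017-01-15</datestamp>
    </header>
  </record>
</ListRecords>
"""

root = ET.fromstring(sample_xml)
for record in root.iter("record"):
    print(record.find("header/identifier").text,
          record.find("header/datestamp").text)
```

The same `ElementTree` traversal would work on a real response once the OAI-PMH namespace is added to the tag names.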