csse-uoft / ckanext-udc

GNU Affero General Public License v3.0

Configure API to import data from another portal - phase 1 #69

Open bgajdero opened 1 year ago

bgajdero commented 1 year ago

This project is Phase 1 of the API import project. It will result in an API-Configuration file that can be customized for any end-point. It will reuse bulk upload and import script code.

  1. define API for importing from another CKAN instance
  2. define mappings of fields
  3. define access auth requirements
  4. Import-start function: the first version will have a button on the Config page. Eventually we will use a timer to trigger it automatically.
  5. define end-point fields:
    • url
    • architecture (CKAN, Socrata, Dataverse, etc)
    • auth tokens required
    • metadata mapping file
    • catalogue selection criteria (either list of catalogue entries or one config per entry)
    • auxiliary data from additional point, such as quality metrics
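As a rough illustration of item 5, the end-point fields could be held in a small configuration object. This is only a sketch: the class name `EndpointConfig`, its attribute names, and the `action_url` helper are hypothetical and not part of ckanext-udc; only the portal URL and the `catalogue-quality-scores` dataset name come from this issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EndpointConfig:
    """Hypothetical container for the end-point fields listed above."""
    url: str                              # base URL of the remote portal
    architecture: str = "CKAN"            # CKAN, Socrata, Dataverse, etc.
    auth_token: Optional[str] = None      # None when the portal is public
    mapping_file: Optional[str] = None    # path to the metadata mapping file
    package_ids: List[str] = field(default_factory=list)   # selection criteria
    aux_sources: List[str] = field(default_factory=list)   # e.g. quality metrics

    def action_url(self, action: str) -> str:
        """Build a CKAN Action API URL (assumes a CKAN architecture)."""
        return f"{self.url.rstrip('/')}/api/3/action/{action}"


# Example entry for the City of Toronto portal discussed below.
toronto = EndpointConfig(
    url="https://ckan0.cf.opendata.inter.prod-toronto.ca",
    aux_sources=["catalogue-quality-scores"],
)
```

Keeping one object per end-point would let the same import code run against several portals, with only the configuration changing.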

Start with the City of Toronto Open Data Portal. There are many ways to configure these datastore_search calls; more info here: https://docs.ckan.org/en/2.9/maintaining/datastore.html#ckanext.datastore.logic.action.datastore_search

There are lots more API endpoints you can call that are documented here: https://docs.ckan.org/en/2.9/api/

API END POINTS

List Packages

To get a list of all package names from our CKAN instance: https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/package_list
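A minimal sketch of calling package_list from Python, using only the standard library. The helper names (`build_action_url`, `unwrap_result`, `list_packages`) are made up for illustration; the code relies on the CKAN Action API convention of wrapping responses in an object with `success` and `result` keys.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/"


def build_action_url(action, params=None):
    """Return the full URL for an action, with optional query parameters."""
    url = BASE_URL + action
    if params:
        url += "?" + urlencode(params)
    return url


def unwrap_result(payload):
    """Return the 'result' part of a CKAN response, or raise on failure."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN action failed: {payload.get('error')}")
    return payload["result"]


def list_packages():
    """Fetch all package names from the portal (performs a network call)."""
    with urlopen(build_action_url("package_list"), timeout=30) as response:
        return unwrap_result(json.load(response))
```

Calling `list_packages()` should return a list of package name strings that can then be passed to package_show one at a time.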

List Resources

(for reference, https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/ is the base URL for 99% of the API endpoints you’ll hit on our portal)

Show Package

package_show returns a JSON object containing high-level information about the data on this page (the data owner, the last refreshed date, associated topics and civic issues, etc.). This JSON also contains high-level information for each “resource” on this page. A “resource” is one concrete data thing (like a file or a database table), and its object in this API response contains the information you’ll need to grab its contents.
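To make the shape of a package_show result concrete, here is a small sketch that pulls out the per-resource fields an importer would likely need. The function name `summarize_resources` and the sample dictionary (including its placeholder ids) are illustrative only; real responses carry many more fields, and the set of keys kept here is an assumption.

```python
def summarize_resources(package):
    """Reduce each resource in a package_show result to a few key fields."""
    return [
        {
            "id": resource.get("id"),
            "name": resource.get("name"),
            "format": resource.get("format"),
            # True when the rows can be read through datastore_search.
            "datastore_active": resource.get("datastore_active", False),
        }
        for resource in package.get("resources", [])
    ]


# Illustrative stand-in for the "result" object of a package_show call.
sample_package = {
    "name": "example-dataset",
    "resources": [
        {"id": "res-1", "name": "example-table", "format": "CSV",
         "datastore_active": True},
        {"id": "res-2", "name": "example-file", "format": "XLSX"},
    ],
}

summary = summarize_resources(sample_package)
```

The `id` of each resource is the value that later calls such as datastore_search expect.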

Accessing Aux Data

To get records from a dataset, like the data quality scoring dataset:

  1. get the package metadata: https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/package_show?id=catalogue-quality-scores

  2. get the “id” of the resource named quality-scores-explanation-codes-and-scores and plug it into the datastore_search API call below: https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/datastore_search?id=6d999ad7-d83c-4515-afc7-cae7ea85a1a8

  3. Each record in the “records” sub-object of the response should be a row in the spreadsheet I showed you today. You can match its package_name and resource_name attributes to a package and resource from a package_show call.
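The three steps above can be sketched as a few small helpers. The function names are hypothetical, and the code assumes each record carries `package_name` and `resource_name` attributes as described in step 3. The `id` query parameter mirrors the URL in step 2, and `limit` is a standard datastore_search parameter.

```python
from urllib.parse import urlencode

BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/"


def find_resource_id(package, resource_name):
    """Step 2a: look up a resource id by name in a package_show result."""
    for resource in package.get("resources", []):
        if resource.get("name") == resource_name:
            return resource["id"]
    raise KeyError(f"no resource named {resource_name!r}")


def datastore_search_url(resource_id, limit=100):
    """Step 2b: build the datastore_search URL for that resource."""
    query = urlencode({"id": resource_id, "limit": limit})
    return BASE_URL + "datastore_search?" + query


def index_scores(records):
    """Step 3: key each record by (package_name, resource_name)."""
    return {
        (record["package_name"], record["resource_name"]): record
        for record in records
    }
```

With such an index, the quality record for any resource returned by package_show can be looked up by the pair of its package name and resource name.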

bgajdero commented 7 months ago

For custom metadata processing, add a graph mapping configuration function.