GSA / data.gov

Main repository for the data.gov service
https://data.gov

Publisher hierarchy is not easily queryable in catalog #2163

Open adborden opened 4 years ago

adborden commented 4 years ago

User Story

In order to easily filter results by publisher organization, an open data user wants the publisher organization hierarchy captured in extras so that it can be easily queried.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

Background

Inventory and Catalog treat publisher differently, despite both using the same ckanext-datajson extension. For example, the data.gov CKAN API dataset in Inventory has a distinct hierarchy of publishers, with one extra per level:

$ curl --silent -H "Authorization: $CKAN_API_KEY" "https://inventory.data.gov/api/action/package_show?id=data-gov-ckan-api"  | jq '.result.extras[] | select(.key | test("publisher"))'
{
  "key": "publisher",
  "value": "General Services Administration"
}
{
  "key": "publisher_1",
  "value": "Technology Transformation Service"
}
{
  "key": "publisher_2",
  "value": "Data.gov"
}

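But in Catalog, only the child-most or "leaf" publisher exists as its own extra. The rest of the hierarchy is flattened into a single publisher_hierarchy string, so the intermediate levels cannot be queried individually.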

$ curl --silent -H "Authorization: $CKAN_API_KEY" "https://catalog.data.gov/api/action/package_show?id=data-gov-ckan-api"  | jq '.result.extras[] | select(.key | test("publisher"))'
{
  "key": "publisher",
  "value": "Data.gov"
}
{
  "key": "publisher_hierarchy",
  "value": "General Services Administration > Technology Transformation Service > Data.gov"
}
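To make the gap concrete, here is a minimal sketch of how Catalog's flattened publisher_hierarchy value could be expanded into the per-level extras that Inventory exposes. The function name expand_publisher_hierarchy is hypothetical and is not part of ckanext-datajson; the sketch assumes the levels are separated by ">" as in the output above, and it would split incorrectly on a publisher name that itself contains ">".

```python
def expand_publisher_hierarchy(hierarchy, separator=">"):
    """Split a flattened hierarchy string into Inventory-style extras.

    Hypothetical helper for illustration only. Assumes the separator
    never appears inside a publisher name.
    """
    # Drop surrounding whitespace and skip empty segments.
    names = [part.strip() for part in hierarchy.split(separator) if part.strip()]

    extras = []
    for level, name in enumerate(names):
        # Top level is "publisher"; deeper levels are "publisher_1", "publisher_2", ...
        key = "publisher" if level == 0 else f"publisher_{level}"
        extras.append({"key": key, "value": name})
    return extras


catalog_value = (
    "General Services Administration > Technology Transformation Service > Data.gov"
)
for extra in expand_publisher_hierarchy(catalog_value):
    print(extra["key"], "=", extra["value"])
```

With extras in this shape, the intermediate level (publisher_1) becomes a field of its own rather than a substring of publisher_hierarchy.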

In Inventory, you can do things like find all datasets of TTS, even if the publisher leaf might be different.

$ curl -v -H "Authorization: $CKAN_API_KEY" 'https://inventory.data.gov/api/action/package_search?fq=publisher_1:"Technology+Transformation+Service"'

Note: the above query returns no results in Inventory. I'm not sure whether that's a separate issue or something about how the query is parsed.
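One way to rule out URL encoding as the cause is to build the request with the fq value explicitly percent-encoded. The sketch below only constructs the URL; the helper name build_package_search_url is hypothetical, and nothing here confirms that encoding is what makes the query return no results.

```python
from urllib.parse import urlencode


def build_package_search_url(base_url, field, value):
    """Build a package_search URL with an exact-phrase fq filter.

    Hypothetical helper for illustration. The value is wrapped in double
    quotes, then the whole fq parameter is percent-encoded.
    """
    fq = f'{field}:"{value}"'
    return f"{base_url}/api/action/package_search?{urlencode({'fq': fq})}"


print(
    build_package_search_url(
        "https://inventory.data.gov",
        "publisher_1",
        "Technology Transformation Service",
    )
)
```

If the encoded URL still returns no results, the cause is more likely on the indexing side (for example, whether publisher_1 is indexed as a searchable field) than in how the request is written.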

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

[Notes or a checklist reflecting our understanding of the selected approach]

jbrown-xentity commented 2 days ago

While this is a bit in the weeds on data, redesigning Catalog to account for publisher hierarchies would be helpful. Tagging @hkdctol, @CarolinaC-REI, @tdlowden for review.

hkdctol commented 2 days ago

yes, good to uncover for the catalog design work