GSA / data.gov

Main repository for the data.gov service
https://data.gov

Give user full harvest error report #4701

Open FuhuXia opened 5 months ago

FuhuXia commented 5 months ago

We have received multiple requests from agencies for access to full harvest error reports. ckanext-harvest includes only the top 20 errors in the report. Agency users say 20 is too few when they are fixing large sources with hundreds of errors.

There are three ways to accomplish this.

  1. Add agency users as organization Editors. Pros: Editors can log in to the web UI to read the full report in HTML format; tech-savvy users can create a token and fetch the full report in JSON format. Cons: More effort to manage users. The Editor role comes with other dataset privileges that these users are not supposed to have.
  2. Add the full report to the Metrics Dashboard on a daily basis. Pros: Low effort. Publicly accessible. Cons: Publicly accessible. Not real-time: users might have to wait up to 24 hours for the report matching the email notification.
  3. Attach the full report to the email notifications. Pros: Low effort. No security concerns. Cons: We must wait for upstream to approve the new feature, or stay on a forked branch.

gujral-rei commented 5 months ago

The team is leaning towards option 3.

FuhuXia commented 5 months ago

Documenting the steps to get the full job report programmatically, for registered catalog.data.gov users.

  1. Create a token at /user/[YOUR-USER-NAME]/api-tokens.

  2. Get the last job ID. Open https://catalog.data.gov/api/action/harvest_source_show?id=[YOUR-HARVEST-SOURCE] and find the last_job id. From the command line, with curl and jq installed, this can be done as: curl -s "https://catalog.data.gov/api/action/harvest_source_show?id=[YOUR-HARVEST-SOURCE]" | jq '.result.status.last_job.id'

  3. Download the JSON report at https://catalog-prod-admin-datagov.app.cloud.gov/api/action/harvest_job_report?id=[LAST-JOB-ID], or from the command line: curl -H "Authorization: [YOUR-API-TOKEN]" "https://catalog-prod-admin-datagov.app.cloud.gov/api/action/harvest_job_report?id=[LAST-JOB-ID]"
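
For anyone who would rather script steps 2 and 3 than run curl by hand, here is a minimal Python sketch using only the standard library. The function names (`action_url`, `get_json`, `last_job_id`, `fetch_full_report`) are made up for illustration. It assumes the response has the same shape the jq filter above relies on (`.result.status.last_job.id`) and that the token goes bare in the `Authorization` header, as in the curl command. Treat it as a starting point, not a tested tool.

```python
import json
import urllib.parse
import urllib.request

# Hosts taken from the steps above.
CATALOG_URL = "https://catalog.data.gov"
ADMIN_URL = "https://catalog-prod-admin-datagov.app.cloud.gov"


def action_url(base, action, **params):
    """Build an action API URL, e.g. .../api/action/harvest_source_show?id=..."""
    return f"{base}/api/action/{action}?{urllib.parse.urlencode(params)}"


def get_json(url, token=None):
    """GET a URL and decode the JSON body; send the token only if one is given."""
    headers = {"Authorization": token} if token else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def last_job_id(source_show_response):
    """Follow the same path as the jq filter: .result.status.last_job.id"""
    return source_show_response["result"]["status"]["last_job"]["id"]


def fetch_full_report(source_id, token):
    """Step 2 (look up the last job ID) followed by step 3 (download the report)."""
    source = get_json(action_url(CATALOG_URL, "harvest_source_show", id=source_id))
    job_id = last_job_id(source)
    return get_json(action_url(ADMIN_URL, "harvest_job_report", id=job_id), token)
```

Calling `fetch_full_report("[YOUR-HARVEST-SOURCE]", "[YOUR-API-TOKEN]")` should return the decoded JSON report for the most recent job. A source that has never run will likely have no last job, in which case `last_job_id` raises `KeyError` or `TypeError` and the caller needs to handle that.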