GSA / datagov-ckan-multi

Analyze what to do with the non-utf8 harvest sources #113

Closed avdata99 closed 5 years ago

avdata99 commented 5 years ago

Check the non-utf8 sources:

Tasks

Analysis

Read these notes: https://gitlab.com/datopian/ckan-ng-harvest/issues/35#note_201314023

avdata99 commented 5 years ago

If a source does not declare a correct character set in its headers, we will notify @adborden to follow up with the owner of the data.json so it can be corrected.
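A minimal sketch of that header check, assuming the harvester has the raw `Content-Type` header value of each source at hand. The function names `charset_from_content_type` and `declared_charset_problem` are hypothetical and are not taken from the harvester code; only the Python standard library is used.

```python
import codecs
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lower-cased charset declared in a Content-Type value, or None."""
    if not header_value:
        return None
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset()


def declared_charset_problem(header_value):
    """Return a short description of the problem, or None if the charset is usable."""
    charset = charset_from_content_type(header_value)
    if charset is None:
        return "no charset declared in the Content-Type header"
    try:
        codecs.lookup(charset)  # raises LookupError for unknown encodings
    except LookupError:
        return f"unknown charset {charset!r}"
    return None
```

A source for which `declared_charset_problem` returns a message would be one to report to the data.json owner.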

thejuliekramer commented 5 years ago

@adborden These datasets do not have a content type or the charset in the header

thejuliekramer commented 5 years ago

@adborden These have the content type and charset but are not valid JSON.

How should we treat these cases? Skip and notify the admins?

adborden commented 5 years ago

Thanks for looking into these. Yes, record the error, skip, and notify.

hkdctol commented 5 years ago

We are going through the harvest sources and doing some cleanup. Some of the ones you're looking at are outdated: Defense will be restarting their open data effort, and the whitehouse.gov one is obsolete, as is the VA one. If you need information from the other agencies, we can track that down.

thejuliekramer commented 5 years ago

When a data source is in an invalid format, we skip the validation/import process for it and add the error to the main list of errors that will be emailed to admins.
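The record, skip, and notify flow might look roughly like the sketch below. It is an illustration under assumptions, not the harvester's actual code: `parse_source` and `harvest_all` are hypothetical names, sources are passed in as raw bytes keyed by name, and UTF-8 is assumed as the expected encoding. Emailing the collected errors is left out.

```python
import json


def parse_source(name, body, charset="utf-8"):
    """Decode and parse one source. Returns (data, error); exactly one is None."""
    try:
        text = body.decode(charset)
    except (UnicodeDecodeError, LookupError) as exc:
        return None, f"{name}: cannot decode as {charset}: {exc}"
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, f"{name}: invalid JSON: {exc}"


def harvest_all(sources):
    """Parse every source, skipping invalid ones and collecting their errors."""
    harvested, errors = {}, []
    for name, body in sources.items():
        data, error = parse_source(name, body)
        if error is not None:
            errors.append(error)  # recorded for the admin report
            continue              # skip validation/import for this source
        harvested[name] = data
    return harvested, errors
```

For example, calling `harvest_all` with one valid source, one with truncated JSON, and one with non-UTF-8 bytes would return only the valid source in `harvested` and two entries in `errors`.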