GSA / enterprise-data-inventory

The Enterprise Data Inventory is a CKAN-based data management system for private and public data.

Massive error log since switch to v. 1.1 #156

Closed: bletalien closed this issue 9 years ago

bletalien commented 9 years ago

This started before I made any changes. Here are the first few lines:

2014-12-04 15:07:53,372 - Validation failed, best guess of error = u'description' is a required property

Failed validating u'required' in schema: {u'$schema': u'http://json-schema.org/draft-04/schema#', u'definitions': {u'distribution': {u'$schema': u'http://json-schema.org/draft-04/schema#', u'dependencies': {u'downloadURL': {u'properties': {u'mediaType': {u'pattern': u'^[-\w]+/[-\w]+(.[-\w]+)*([+][-\w]+)?$', u'type': u'string'}}, u'required': [u'mediaType']}}, u'description': u'Validates an entire collection of common core metadata JSON objects. Agencies produce said collections in the form of Data.json files.', u'id': u'https://project-open-data.cio.gov/v1.1/schema/distribution.json#', u'properties': {u'@type': {u'description': u'IRI for the JSON-LD data type. This should be dcat:Distribution for each Distribution', u'enum': [u'dcat:Distribution'], u'title': u'Metadata Context'}, u'accessURL': {u'anyOf': [{u'format': u'uri', u'type': u'string'}, {u'type': u'null'}], u'description': u'URL providing indirect access to a dataset',

bletalien commented 9 years ago

I went through the error log file and found that, in most instances, it claimed the entries were missing descriptions. Of course they weren't, but I edited each one (deleting the final period if there was one and adding one if there wasn't; simply hitting "manage" and saving the entry without making any changes didn't do the trick). A few were stubborn, but most cleared up on the first try. In a couple of cases, it seems not to have handled frequency well. For example, I had to remove "semiweekly" from one of the more stubborn entries.

The error log singled out not only entries added months ago, but also at least one entry I added yesterday or today. I noticed no similarity that might explain why some entries produced errors and others didn't. I had 186 entries before today and have 256 today (I'm working on splitting them up into parents and children), and there were 30+ "errors," only one of which was legitimate.
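For anyone who needs to triage a similar log, a rough way to see which validation messages dominate is to tally them. The sketch below assumes each failure begins with a line in the format quoted earlier in this thread ("Validation failed, best guess of error = ..."). The function name `tally_validation_errors` and the second sample line are hypothetical, made up for illustration, and are not part of the EDI code base.

```python
import re
from collections import Counter

# Pattern based on the log line quoted in this thread; the real log
# format may differ in other versions (assumption).
ERROR_RE = re.compile(r"Validation failed, best guess of error = (?P<msg>.+)$")


def tally_validation_errors(lines):
    """Count how often each 'best guess' message appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line.rstrip("\n"))
        if match:
            counts[match.group("msg").strip()] += 1
    return counts


# The first line is copied from the thread; the third is an invented
# repeat with a made-up timestamp, used only to exercise the counter.
sample = [
    "2014-12-04 15:07:53,372 - Validation failed, best guess of error = u'description' is a required property",
    "Failed validating u'required' in schema: {...}",
    "2014-12-04 15:07:54,101 - Validation failed, best guess of error = u'description' is a required property",
]

for message, count in tally_validation_errors(sample).most_common():
    print(count, message)
```

To run it against a real file, pass an open file handle instead of `sample`, since the function only needs an iterable of lines.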

kvuppala commented 9 years ago

@bletalien The error log is now much shorter, and the issue with descriptions should be resolved. The missing-description error was related to records imported through the API rather than ones created through the UI. Let us know if you see any further issues with this.

bletalien commented 9 years ago

I've used the UI exclusively, never tried to import records via API, but in any case, those error messages have stopped.

Now I'm getting errors like the following and it's stripping non-public assets from the EDI:

2014-12-19 09:55:43,552 - ('Possible Private Data Leakage', ['A dataset appears with accessLevel set to "non-public". (1 locations)'])
2014-12-19 09:55:43,553 - Dataset id=[09705b0f-50c1-4334-a536-bed1636bbab2], title=[I removed the title for GitHub] omitted

Thanks!
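To check which records an export dropped, the "omitted" entries can be pulled out of the log. This is a minimal sketch that assumes the entries follow the format quoted in the comment above ("Dataset id=[...], title=[...] omitted"). The helper name `omitted_datasets` is hypothetical and is not part of the EDI code base.

```python
import re

# Pattern based on the "omitted" log entry quoted in this thread; the
# exact format in other versions of the exporter is an assumption.
OMITTED_RE = re.compile(
    r"Dataset id=\[(?P<id>[0-9a-fA-F-]+)\], title=\[(?P<title>.*?)\] omitted"
)


def omitted_datasets(log_text):
    """Return (id, title) pairs for every dataset the log reports as omitted."""
    return [(m.group("id"), m.group("title")) for m in OMITTED_RE.finditer(log_text)]


# Log text copied from the thread (the title was redacted by the reporter).
sample_log = (
    "2014-12-19 09:55:43,552 - ('Possible Private Data Leakage', "
    "['A dataset appears with accessLevel set to \"non-public\". (1 locations)'])\n"
    "2014-12-19 09:55:43,553 - Dataset id=[09705b0f-50c1-4334-a536-bed1636bbab2], "
    "title=[I removed the title for GitHub] omitted\n"
)

for dataset_id, title in omitted_datasets(sample_log):
    print(dataset_id, title)
```

Because the function scans the whole text with `finditer`, it works whether the two entries sit on separate lines or were pasted together on one line.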

kvuppala commented 9 years ago

I will take a look at the EDI listing; it should not have given that error. Which organization are you using for the export?

bletalien commented 9 years ago

Thanks. https://inventory.data.gov/organization/office-of-personnel-management

kvuppala commented 9 years ago

@bletalien The issue is resolved, thanks for the catch.

bletalien commented 9 years ago

@kvuppala, thank you!