Closed dmfenton closed 9 years ago
@dmfenton You can use the validator at http://labs.data.gov/dashboard/validate
Here's a direct link to your results: http://labs.data.gov/dashboard/validate?schema=federal&output=browser&datajson_url=https%3A%2F%2Fgist.githubusercontent.com%2Fdmfenton%2F6bc934df2e3bc395684c%2Fraw%2Fd7b0fb6e49e62df2a77bc2112b808a6687077451%2Fdata.json&qa=true
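In case it helps with checking other files, a link like the one above can be assembled in a few lines. This is only a sketch: it assumes the validator keeps accepting the query parameters visible in that URL (`schema`, `output`, `datajson_url`, `qa`), and the helper name `validator_link` is made up for illustration.

```python
from urllib.parse import urlencode

VALIDATOR = "http://labs.data.gov/dashboard/validate"


def validator_link(datajson_url, schema="federal"):
    """Build a validator link; parameter names are copied from the URL above."""
    query = urlencode({
        "schema": schema,
        "output": "browser",
        "datajson_url": datajson_url,  # percent-encoded by urlencode
        "qa": "true",
    })
    return f"{VALIDATOR}?{query}"


raw = ("https://gist.githubusercontent.com/dmfenton/6bc934df2e3bc395684c/raw/"
       "d7b0fb6e49e62df2a77bc2112b808a6687077451/data.json")
print(validator_link(raw))
```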
Note that this uses the v1.0 schema, which we are phasing out over the next month. All CFO Act agencies are required to transition from v1.0 to v1.1 by February 1st, 2015. You can see the v1.1 documentation at https://project-open-data.cio.gov/v1.1/schema/ and if you have questions or issues about it, feel free to raise those at https://github.com/project-open-data/project-open-data.github.io/issues
Hi Phil, so as long as the dcat validates formally you would feel comfortable bringing it into Data.gov?
As for terminology, the Project Open Data (POD) Schema is based on DCAT, but I think only the v1.1 version of the POD schema includes an accurate serialization of DCAT. The POD schema also includes some required fields and field validation that extend and constrain DCAT, so I try to avoid referring to it as DCAT.
The data.gov harvester is strict about following the POD schema validation and uses a JSON schema (here's the v1.1) to do that (the same JSON schema is used for the validator I linked to before). This means that only datasets that meet the validator requirements will be included on data.gov.
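To give a rough feel for what "meeting the validator requirements" means, here is a simplified presence check. It is not the harvester's code and not a substitute for the real JSON schema, which also checks formats and allowed values; the `REQUIRED` tuple is my assumption about the fields v1.1 always requires (federal agencies have additional ones), and `split_by_required` is a hypothetical helper.

```python
# Assumed subset of fields the v1.1 schema always requires (illustrative only).
REQUIRED = ("title", "description", "keyword", "modified",
            "publisher", "contactPoint", "identifier", "accessLevel")


def split_by_required(datasets):
    """Return (kept, rejected); rejected pairs each identifier with its missing fields."""
    kept, rejected = [], []
    for ds in datasets:
        missing = [field for field in REQUIRED if field not in ds]
        if missing:
            rejected.append((ds.get("identifier"), missing))
        else:
            kept.append(ds)
    return kept, rejected
```

Only the entries in `kept` would be candidates for inclusion under this simplified rule; the validator linked earlier remains the authoritative check.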
The validator doesn't require unique values for the contact email/name as it does for some fields like identifier, so the validation process won't prevent duplicate values being used, but it does go against the intent of the policy. In other words, it should validate, but it's not good. We're starting to expand some quality assurance analysis on this metadata, so things like this will likely be flagged in the future, but right now we're not checking for it.
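If you want to check for this yourself before publishing, something along these lines would do it. It is a sketch, not the QA analysis we run: it assumes the v1.1 layout where `contactPoint` is an object with a `hasEmail` value, and `repeated_values` is a hypothetical helper name.

```python
from collections import Counter


def repeated_values(datasets):
    """Return (duplicate identifiers, contact emails shared by more than one dataset)."""
    ids = Counter()
    emails = Counter()
    for ds in datasets:
        if "identifier" in ds:
            ids[ds["identifier"]] += 1
        contact = ds.get("contactPoint")
        # Assumes the v1.1 shape: contactPoint is an object holding hasEmail.
        if isinstance(contact, dict) and "hasEmail" in contact:
            emails[contact["hasEmail"]] += 1
    dup_ids = sorted(k for k, n in ids.items() if n > 1)
    shared_emails = sorted(k for k, n in emails.items() if n > 1)
    return dup_ids, shared_emails
```

Duplicate identifiers are what the validator would reject; shared emails are only a hint worth reviewing, since a single contact can legitimately cover several datasets.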
The license field is not currently required for federal agencies, but is strongly encouraged. In the past we've required it for non-federal entities that want to be included on data.gov, but we've started to relax this. In the future, it's likely there will be a stricter requirement for this to always be included, but that's not the case now.
Again, I should emphasize that data.gov will soon no longer support the 1.0 version of the schema, which the file you provided uses. The OMB deadline for federal agencies to transition to v1.1 is February 1st.
Also note that while OMB requires all metadata to be provided with the POD schema as a data.json file, for data.gov's purposes we defer to geospatial metadata standards and harvest sources (ISO 19115 and CSW) when they're available. For more details on the distinction see http://www.digitalgov.gov/resources/how-to-get-your-open-data-on-data-gov/
Phil, thank you so much. This was exactly the information we needed. We are looking forward to helping our state and local government customers contribute many thousands of new datasets to data.gov. We are striving to make sure that everything has correct (and unique) contact information as well as a license, and we will continue to improve that over time.
For any existing geospatial metadata, you may also be interested in updated guidance and crosswalks for the POD v1.1 schema from CSDGM and ISO standards. This hasn't been fully published yet, but you can see a preview at http://pod-preview.civicagency.org/v1.1/metadata-resources/#crosswalks-for-geospatial-metadata
Hey @philipashlock, would you be willing to take a look at this DCAT sample produced by an ArcGIS Open Data site and let me know if it's admissible?
https://gist.github.com/dmfenton/6bc934df2e3bc395684c
Note: I put dummy information in for the contact point and email address, but assume they would be valid.
Thanks.
cc @ajturner