At CFPB, we noticed that our data.json contained invalid JSON: it had trailing commas at the end of a couple of arrays and objects.
When we run it through the validator, it reports the file as valid JSON. However, the automated metrics report it as invalid, and the datasets contained within it are not updated.
Here's a snippet of one of the datasets, with the trailing commas that make it invalid. I can currently paste this into the validator and it reports "Valid JSON" as "true", but https://jsonlint.com/ fails on the trailing commas.
{
  "@context": "https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld",
  "@type": "dcat:Catalog",
  "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
  "describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json",
  "dataset": [
    {
      "@type": "dcat:Dataset",
      "accessLevel": "public",
      "accrualPeriodicity": "R/P1D",
      "bureauCode": [
        "581:00"
      ],
      "contactPoint": {
        "@type": "vcard:Contact",
        "fn": "devops@cfpb.gov",
        "hasEmail": "mailto:devops@cfpb.gov"
      },
      "description": "Prepaid account agreement data, which contain general terms and conditions, pricing, and fee information, that issuers submit to the Bureau under the terms of the Prepaid Rule. Data is refreshed nightly.",
      "distribution": [
        {
          "@type": "dcat:Distribution",
          "downloadURL": "https://files.consumerfinance.gov/a/assets/prepaid-agreements/prepaid_metadata_all_agreements.csv",
          "mediaType": "text/csv"
        }
      ],
      "identifier": "PPAD",
      "keyword": [
        "prepaid",
        "product",
        "type",
        "agreement",
      ],
      "landingPage": "https://www.consumerfinance.gov/data-research/prepaid-accounts/search-agreements/",
      "modified": "2021-09-21",
      "programCode": [
        "000:000"
      ],
      "publisher": {
        "@type": "org:Organization",
        "name": "Consumer Financial Protection Bureau"
      },
      "spatial": "United States",
      "title": "Prepaid Product Agreements Database",
    },
  ]
}
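For comparison, a strict RFC 8259 parser rejects the snippet above. As a quick illustration (a shortened, hypothetical fragment, not the validator's actual code), Python's built-in json module fails on exactly the trailing comma:

```python
# Python's built-in json module follows strict JSON rules, so it can
# serve as a cross-check for the kind of snippet shown above
# (shortened here for illustration).
import json

snippet = '''{
  "keyword": [
    "prepaid",
    "product",
  ]
}'''

try:
    json.loads(snippet)
    print("Valid JSON")
except json.JSONDecodeError as err:
    # Reports the position of the offending trailing comma
    print(f"Invalid JSON: {err.msg} (line {err.lineno}, column {err.colno})")
```

This matches what https://jsonlint.com/ reports on the same input.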
It looks like there's an expectation that if JsonStreamingParser doesn't fail to parse, it's valid JSON (which seems reasonable to me), and as far as I can tell, this problem comes from the fact that it doesn't.
So, the fact that invalid JSON gets parsed successfully seems like an upstream problem. However, we have folks who aren't developers who rely on the validator to ensure that the data.json is both valid JSON and valid per the federal-v1.1 schema. It would be good if the validator would report a result consistent with the actual parsing of data.json for data.gov, in whatever way that might best be achieved.
Looking at the source, the problem appears to come from JsonStreamingParser ending an array/object independently of checking what comes after a "," (I wish this was how JSON worked).
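One way the validator could report a consistent result is to run a strict parse before (or alongside) the lenient streaming pass. This is a hypothetical sketch, not the validator's actual code; the function name is made up:

```python
# Hypothetical helper: run a strict JSON parse so that the
# validator's "Valid JSON" verdict matches what strict consumers
# (like the data.gov harvester) will accept.
import json

def check_strict_json(text: str) -> tuple[bool, str]:
    """Return (ok, error_message) for a strict JSON parse of text."""
    try:
        json.loads(text)
        return True, ""
    except json.JSONDecodeError as err:
        return False, f"{err.msg} at line {err.lineno}, column {err.colno}"

ok, message = check_strict_json('{"keyword": ["prepaid", "product",]}')
print(ok, message)  # the trailing comma makes this report invalid
```

The schema check could then run only when the strict parse passes, so non-developer users see one consistent verdict.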