GSA / project-open-data-dashboard

Project Open Data Dashboard
http://labs.data.gov/dashboard/

Validator reports "Valid JSON" on invalid JSON that fails to parse. #370

Open willbarton opened 2 years ago

willbarton commented 2 years ago

At CFPB, we noticed that our data.json contained invalid JSON; it had a trailing , at the end of a couple of arrays and objects.

When running it through the validator, the validator reports that this is valid JSON. However, the automated metrics report it as invalid, and the datasets contained within it are not updated.

Here's a snippet of one of the datasets with trailing commas that make them invalid. I can currently paste this into the validator and it reports "Valid JSON" as "true", but https://jsonlint.com/ fails on the trailing commas.

{
    "@context": "https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld",
    "@type": "dcat:Catalog",
    "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
    "describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json",
    "dataset": [
        {

            "@type": "dcat:Dataset",
            "accessLevel": "public",
            "accrualPeriodicity": "R/P1D",
            "bureauCode": [
                "581:00"
            ],
            "contactPoint": {
                "@type": "vcard:Contact",
                "fn": "devops@cfpb.gov",
                "hasEmail": "mailto:devops@cfpb.gov"
            },
            "description": "Prepaid account agreement data, which contain general terms and conditions, pricing, and fee information, that issuers submit to the Bureau under the terms of the Prepaid Rule.  Data is refreshed nightly.",
            "distribution": [
                {
                    "@type": "dcat:Distribution",
                    "downloadURL": "https://files.consumerfinance.gov/a/assets/prepaid-agreements/prepaid_metadata_all_agreements.csv",
                    "mediaType": "text/csv"
                }
            ],
            "identifier": "PPAD",
            "keyword": [
                "prepaid",
                "product",
                "type",
                "agreement",
            ],
            "landingPage": "https://www.consumerfinance.gov/data-research/prepaid-accounts/search-agreements/",
            "modified": "2021-09-21",
            "programCode": [
                "000:000"
            ],
            "publisher": {
                "@type": "org:Organization",
                "name": "Consumer Financial Protection Bureau"
            },
            "spatial": "United States",
            "title": "Prepaid Product Agreements Database",
        },
    ]
}
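
For comparison, here is a minimal sketch of what a strict parser does with this kind of input. It uses Python's standard `json` module, which is not what the dashboard uses (that is JsonStreamingParser), and the `is_strict_json` helper name is made up for illustration. The input is a reduced version of the snippet above that keeps only the trailing commas.

```python
import json

# Reduced version of the snippet above, keeping only the trailing commas.
snippet = """
{
    "dataset": [
        {
            "keyword": ["prepaid", "agreement",],
            "title": "Prepaid Product Agreements Database",
        },
    ]
}
"""


def is_strict_json(text):
    """Return (True, None) if text parses as strict JSON, else (False, message).

    Hypothetical helper, not part of the dashboard code.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"{err.msg} at line {err.lineno}, column {err.colno}"
    return True, None


ok, message = is_strict_json(snippet)
print(ok, message)
```

The exact error text varies between Python versions, but the parse fails in all of them, which matches what jsonlint.com reports.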

It looks like the validator assumes that if JsonStreamingParser parses the input without failing, the input is valid JSON (which seems reasonable to me). As far as I can tell, the problem is that JsonStreamingParser doesn't fail on these trailing commas.

Indeed, looking at the source, it appears that the problem comes from JsonStreamingParser ending an array/object independently from checking what comes after a , (I wish this was how JSON worked).
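
To illustrate the kind of behavior described above, here is a toy recognizer for a flat array of integers. It is not JsonStreamingParser's code, and the function name and `strict` flag are invented for this sketch. In lenient mode the closing `]` ends the array without looking at the previous token, so a trailing comma slips through. In strict mode a `]` directly after a `,` is rejected.

```python
def accepts_flat_array(text, strict):
    """Toy recognizer for a flat array of integers such as "[1, 2, 3]".

    strict=False: "]" closes the array no matter what came before it.
    strict=True:  "]" is rejected when the previous token was a comma.
    """
    spaced = text.replace("[", " [ ").replace("]", " ] ").replace(",", " , ")
    tokens = spaced.split()
    if not tokens or tokens[0] != "[":
        return False
    previous = "["
    for position, token in enumerate(tokens[1:], start=1):
        if token == "]":
            if strict and previous == ",":
                return False  # trailing comma before the closing bracket
            return position == len(tokens) - 1  # nothing may follow "]"
        if token == ",":
            if previous in ("[", ","):
                return False  # comma with no value before it
        elif token.isdigit():
            if previous not in ("[", ","):
                return False  # two values with no comma between them
        else:
            return False  # token this toy grammar does not know
        previous = token
    return False  # input ended before "]"


print(accepts_flat_array("[1, 2,]", strict=False))
print(accepts_flat_array("[1, 2,]", strict=True))
```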


So, the fact that invalid JSON gets parsed successfully seems like an upstream problem. However, we have folks who aren't developers who rely on the validator to ensure that the data.json is both valid JSON and valid per the federal-v1.1 schema. It would be good if the validator reported a result consistent with the actual parsing of data.json for data.gov, in whatever way that might best be achieved.
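
As one possible stopgap for people who aren't developers, a small script can point at the offending commas directly. This is only a sketch under my own assumptions: `find_trailing_commas` is a hypothetical helper, not something in the dashboard or in JsonStreamingParser, and it only looks for a comma that is followed (ignoring whitespace) by `]` or `}` outside of a string.

```python
def find_trailing_commas(text):
    """Return (line, column) pairs, 1-based, for each comma that is followed
    only by whitespace and then "]" or "}". Commas inside strings are ignored.
    """
    positions = []
    in_string = False
    escaped = False
    pending = None  # last comma seen, if only whitespace has followed it
    line, column = 1, 0
    for ch in text:
        if ch == "\n":
            line += 1
            column = 0
        else:
            column += 1
        if in_string:
            if escaped:
                escaped = False  # this character was escaped, skip it
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch.isspace():
            continue
        if ch in "]}" and pending is not None:
            positions.append(pending)
        pending = (line, column) if ch == "," else None
        if ch == '"':
            in_string = True
    return positions


sample = '{"a": [1, 2,],\n "b": "x,]",\n}'
for found_line, found_column in find_trailing_commas(sample):
    print(f"trailing comma at line {found_line}, column {found_column}")
```

The `",]"` inside the string value of `"b"` is not reported, since it is legitimate string content rather than a structural comma.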

hkdctol commented 2 years ago

Thanks for letting us know. We're in the process of replacing the dashboard with a metrics dataset. We hope to get to it soon.