SchemaStore / schemastore

A collection of JSON schema files including full API
http://schemastore.org
Apache License 2.0

Multiple broken schema URLs in the catalog #2247

Open ssbarnea opened 2 years ago

ssbarnea commented 2 years ago

Area with issue?

JSON Schema

✔️ Expected Behavior

Apparently we have more than one schema with broken URLs:

Failed to cache https://raw.githubusercontent.com/dolittle/DotNET.Fundamentals/master/Schemas/Tenancy.Configuration/tenant-map.json
Failed to cache https://raw.githubusercontent.com/dolittle/DotNET.SDK/master/Schemas/Applications.Configuration/topology.json
Failed to cache https://www.facets.cloud/assets/fsdl/application.schema.json
Failed to cache https://raw.githubusercontent.com/fossas/fossa-cli/master/docs/references/files/fossa-deps.schema.json
Failed to cache https://hazelcast.com/schema/config/hazelcast-config-5.1.json
Failed to cache https://raw.githubusercontent.com/blackbaud/skyux-config/master/skyuxconfig-schema.json
Failed to cache https://raw.githubusercontent.com/DotNetAnalyzers/StyleCopAnalyzers/master/StyleCop.Analyzers/StyleCop.Analyzers/Settings/stylecop.schema.json
Failed to cache https://raw.githubusercontent.com/dotnet/Nerdbank.GitVersioning/master/src/NerdBank.GitVersioning/version.schema.json
Failed to cache https://webcomponents.dev/assets2/schemas/studio.config.json

❌ Actual Behavior

We should not have broken URLs in our catalog. While I suppose that downtime is possible, I doubt that these are only temporary hiccups.

We probably need a scheduled pipeline that checks that the URLs are valid, and maybe even that the schemas pass validation too.
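As a rough illustration of what such a scheduled check could do, here is a minimal Python sketch. It is not an existing SchemaStore script: the names `fetch_text` and `check_schema_urls` are hypothetical, and it only verifies that each URL is reachable and returns parseable JSON, not that the document is a valid JSON Schema.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a URL and return its body as text (raises on HTTP errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def check_schema_urls(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_text,
) -> Dict[str, str]:
    """Return a mapping of URL -> problem description for every failing URL."""
    problems: Dict[str, str] = {}
    for url in urls:
        try:
            body = fetch(url)
        except Exception as exc:  # network error, HTTP 4xx/5xx, timeout, ...
            problems[url] = f"unreachable: {exc}"
            continue
        try:
            json.loads(body)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            problems[url] = f"not valid JSON: {exc}"
    return problems
```

The `fetch` parameter is injectable so the check can be exercised without network access; a scheduled job would call `check_schema_urls` with the URLs from the catalog and report any non-empty result.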

I am not sure what we should do about this. First, we would need to record a contact for each externally hosted schema, so that we know whom to reach out to.

Removing broken entries is easy, but often that is not a fix.

YAML or JSON file that does not work.

No response

IDE or code editor.

Visual Studio Code

Are you making a PR for this?

No, someone else must create the PR.

aaronsteers commented 1 year ago

What if contributors were required to provide a GitHub repo that would receive issues created automatically by a daily CI bot? This could be added to the existing schema index as issueTracker or similar, with required or suggested entries like http://www.github.com/{org}/{repo}/issues.

A daily automated CI job here in the SchemaStore repo could check for broken links and perhaps also check for malformed JSON, creating an issue in the provider's repo if an outage is detected.

Another option one might consider would be to provide an email address for notification, but repo issues have a number of benefits, notably that they can be shared freely without fear of inviting spam/solicitations/phishing, and they are self-documenting. If an issue is created and not responded to in n days (maybe 10 or 14), then the problematic schemas or schema URLs can be removed with a link to the removal PR posted there in the issue. If the provider later resolves the issue, they can create a new PR basically with a "revert" version of the removal PR.
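The removal rule described above could be sketched in Python roughly as follows. The function name `should_remove` and the default of 14 days are assumptions for illustration (the proposal only says "maybe 10 or 14"); how a provider "response" is detected is left out.

```python
from datetime import date, timedelta

# Assumption: 14 days, one of the two values floated in the proposal.
GRACE_PERIOD_DAYS = 14


def should_remove(
    issue_opened: date,
    today: date,
    provider_responded: bool,
    grace_days: int = GRACE_PERIOD_DAYS,
) -> bool:
    """Return True when the schema entry is due for a removal PR."""
    if provider_responded:
        # The provider reacted to the automated issue, so leave the entry alone.
        return False
    # No response: remove once the grace period has fully elapsed.
    return today - issue_opened >= timedelta(days=grace_days)
```

For example, an issue opened on 1 January with no response would become eligible for removal on 15 January under the 14-day default.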

This initial proposal might favor GitHub repos, but public repos are free to create even if the code is hosted elsewhere. Future revisions could try to meaningfully support GitLab or other types of trackers, but as a first iteration, automated creation of GitHub issues would cover the significant majority of schema providers. We already know 100% of submitters have a GitHub account, since otherwise they would not have been able to contribute a PR here to the SchemaStore repo.

Important to note: 6 of the 9 broken links referenced in the issue body are raw.githubusercontent.com references, meaning we can automatically infer the target repo in which to log issues, even where no explicit issue repo is ever provided. The remaining three are addresses on the vendors' own domains (facets.cloud, hazelcast.com, webcomponents.dev), so they might be GitHub hosted also, but we can't tell from the URL.
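A small Python sketch of that inference, relying on the raw.githubusercontent.com path layout of /{org}/{repo}/{ref}/{path}. The function name `infer_issue_tracker` is hypothetical, and returning None for any other host is an assumption about how the fallback would work.

```python
from typing import Optional
from urllib.parse import urlparse


def infer_issue_tracker(schema_url: str) -> Optional[str]:
    """Guess the GitHub issues URL for a raw.githubusercontent.com schema URL.

    Returns None when the host is not raw.githubusercontent.com, because the
    repo cannot be derived from the URL alone in that case.
    """
    parsed = urlparse(schema_url)
    if parsed.hostname != "raw.githubusercontent.com":
        return None
    # Path layout: /{org}/{repo}/{ref}/{path...}
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) < 2:
        return None
    org, repo = parts[0], parts[1]
    return f"https://github.com/{org}/{repo}/issues"
```

Applied to the fossa-deps URL from the issue body, this yields https://github.com/fossas/fossa-cli/issues, while the hazelcast.com URL yields None and would need an explicit tracker entry.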

To maintainers here:

What do you think about something like the above? We at Meltano were just considering whether to create our own CI flow to test if our URLs ever become broken, but thought we should ping here first to see if there's appetite for something centrally managed and maintained.