MeltanoLabs / Singer-Working-Group

Working group for ongoing development and iteration of the Singer Spec, the de-facto protocol for open source data connectors. Please use "Issues" to create discussion items - or use "Discussions" for general questions.
Apache License 2.0

Should we publish a JSON Schema for `catalog.json` to Schema Store or similar #27

Open aaronsteers opened 2 years ago

aaronsteers commented 2 years ago

This could be helpful for users debugging or modifying their own catalog files, and also as a CI test to make sure taps are generating catalog files compliant with the spec.
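As a rough illustration of the CI idea, here is a minimal sketch that uses only the Python standard library. `CATALOG_SCHEMA` is a hypothetical, heavily simplified stand-in for whatever schema would actually be published, and `validate` is a toy checker that understands only the few keywords used here. The required keys (`tap_stream_id`, `stream`, `schema`) are an assumption based on the Singer discovery docs, not an agreed-upon schema.

```python
import json

# Hypothetical, simplified schema for catalog.json. Field names are an
# assumption based on the Singer discovery docs; a published schema may differ.
CATALOG_SCHEMA = {
    "type": "object",
    "required": ["streams"],
    "properties": {
        "streams": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["tap_stream_id", "stream", "schema"],
                "properties": {
                    "tap_stream_id": {"type": "string"},
                    "stream": {"type": "string"},
                    "schema": {"type": "object"},
                    "metadata": {"type": "array"},
                },
            },
        }
    },
}

_TYPES = {"object": dict, "array": list, "string": str}


def validate(instance, schema, path="$"):
    """Return a list of error strings; supports only the keywords used above."""
    expected = schema.get("type")
    if expected and not isinstance(instance, _TYPES[expected]):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


def test_catalog_is_compliant():
    # In CI this text would come from the tap's discovery output instead.
    text = '{"streams": [{"tap_stream_id": "users", "stream": "users", "schema": {"type": "object"}}]}'
    assert validate(json.loads(text), CATALOG_SCHEMA) == []
```

In a real pipeline, a full JSON Schema validator would replace the toy `validate` function; the shape of the test would stay the same.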

Schema Store schemas may be automatically identified and applied in some IDEs (VS Code is one) for linting and auto-complete.

pnadolny13 commented 2 years ago

@aaronsteers I'm not sure if this is already part of the utils that parse the catalog input, but if not, it could also be recommended as a best practice for taps to validate the catalog against the schema prior to execution. That might help avoid confusing errors about missing keys when someone has manually edited their catalog.

aaronsteers commented 2 years ago

@pnadolny13 - Agreed. It might be a warning-level (non-fatal) message, but I agree the util libs could and probably should warn in some way if a passed catalog.json does not conform to the published JSON Schema for that file.
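A minimal sketch of what that non-fatal behavior could look like in a util lib. The names `parse_catalog`, `find_catalog_problems`, and the `catalog_check` logger are hypothetical and not taken from any existing Singer library, and the hand-written key check stands in for validation against a published schema. The required keys are again an assumption based on the Singer discovery docs.

```python
import json
import logging

logger = logging.getLogger("catalog_check")  # hypothetical logger name

# Keys each stream entry is assumed to need (an assumption, not the spec text).
REQUIRED_STREAM_KEYS = ("tap_stream_id", "stream", "schema")


def find_catalog_problems(catalog):
    """Return human-readable problems; an empty list means none were found."""
    if not isinstance(catalog, dict):
        return ["catalog must be a JSON object"]
    streams = catalog.get("streams")
    if not isinstance(streams, list):
        return ["top-level 'streams' must be an array"]
    problems = []
    for i, stream in enumerate(streams):
        if not isinstance(stream, dict):
            problems.append(f"streams[{i}] must be an object")
            continue
        for key in REQUIRED_STREAM_KEYS:
            if key not in stream:
                problems.append(f"streams[{i}] is missing '{key}'")
    return problems


def parse_catalog(text):
    """Parse catalog JSON, log one warning per problem, and return it anyway."""
    catalog = json.loads(text)
    for problem in find_catalog_problems(catalog):
        # Warning-level only: the tap keeps running with the catalog as given.
        logger.warning("catalog.json: %s", problem)
    return catalog
```

Logging instead of raising keeps existing hand-edited catalogs working while still pointing users at the missing key before a less obvious error shows up later in the run.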

aaronsteers commented 2 years ago

Posting back a quick update - we had great success publishing JSON Schemas for our own Meltano YAML formats, and it was pretty painless to get those added to SchemaStore.org.

For ref:

A really nice benefit is that VS Code (and other editors) then automatically provide hover-text with help tips, lint warnings (red squigglies), and auto-complete support.

I don't know if this is too broad, but in theory we could register catalog files with a naming rule like `*catalog.json`, and I think this could be a big help for users who have to modify catalog files manually.
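To get a feel for how broad that rule is, here is a quick sketch using Python's `fnmatchcase`. This only approximates glob matching: how Schema Store or a given editor actually evaluates its file-match patterns may differ, and the file names below are made up for illustration.

```python
from fnmatch import fnmatchcase

PATTERN = "*catalog.json"  # the naming rule floated above


def matches_rule(filename):
    """Case-sensitive shell-style match of a bare file name against PATTERN."""
    return fnmatchcase(filename, PATTERN)


# Hypothetical file names, chosen to probe the edges of the rule.
candidates = [
    "catalog.json",
    "tap-github.catalog.json",
    "product-catalog.json",  # unrelated to Singer, but would still match
    "catalog.yaml",
    "catalog.json.bak",
]

for name in candidates:
    print(f"{name}: {matches_rule(name)}")
```

The `product-catalog.json` case is the "too broad" concern: any JSON file whose name happens to end in `catalog.json` would pick up the Singer schema.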

For STDOUT/STDIN schemas like RECORD and SCHEMA messages, the naming rule is less relevant, since those are not "files" in the same sense.