MeltanoLabs / Singer-Working-Group

Working group for ongoing development and iteration of the Singer Spec, the de-facto protocol for open source data connectors. Please use "Issues" to create discussion items - or use "Discussions" for general questions.
Apache License 2.0

Aggregating current best practices for existing "Standard" features #10

Open dmosorast opened 2 years ago

dmosorast commented 2 years ago

This came to mind today as I was considering the effect of currently_syncing. It is a standard top-level state key (via singer-python) that allows a tap to resume from the last stream it was syncing when it was interrupted. Just setting it and skipping streams on resume until you reach the correct stream is fine, but there are some best practices here, e.g., always sort streams into an expected order, and reshuffle streams instead of skipping them so that every stream gets attempted on each successful run.
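As a rough illustration of the "sort, then reshuffle instead of skipping" idea, here is a minimal sketch in plain Python. It assumes state is a plain dict with a top-level currently_syncing key; the helper names order_streams and run_sync, and the sync_stream callback, are made up for this example and are not part of singer-python.

```python
def order_streams(stream_names, state):
    """Return streams in a stable order, rotated to start at currently_syncing.

    Rotating (rather than skipping) keeps the earlier streams in the run,
    so every stream is still attempted after a resume.
    """
    ordered = sorted(stream_names)  # deterministic order across runs
    current = state.get("currently_syncing")
    if current not in ordered:
        # No interrupted stream recorded (or it no longer exists): start from the top.
        return ordered
    i = ordered.index(current)
    return ordered[i:] + ordered[:i]


def run_sync(stream_names, state, sync_stream):
    """Sync each stream in rotated order, recording progress in state.

    sync_stream is a hypothetical callback: sync_stream(name, state) -> None.
    """
    for name in order_streams(stream_names, state):
        state["currently_syncing"] = name  # a real tap would emit a STATE message here
        sync_stream(name, state)
    state["currently_syncing"] = None  # clean finish: nothing left to resume
    return state
```

With streams accounts, orders, users and a state that says orders was interrupted, the run order becomes orders, users, accounts, so accounts is attempted again rather than dropped from the run.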

In what I'm currently thinking of as the Singer "Standard" (a level above the spec where discovery, metadata, catalogs, etc. live), I'm sure there are a lot more semantic usages of standard metadata and/or other state keys that should be understood and documented in a discoverable location.

dmosorast commented 2 years ago

Going to gather things that come to mind into a list here. Maybe some of these topics are more for discussion, but it seems worth it to have them in a single spot, at least.

  1. Currently Syncing feature
  2. Specifics of "standard" versus "custom" metadata, e.g., the usage of selected-by-default versus tap_foo.arbitrary_metadata_from_discovery, plus naming questions such as snake versus kebab case.
  3. Tap executable naming convention (everything in the singer-io repo is tap-*, but I've seen varying usages; is it an important standard?)
  4. Should SCHEMA messages be filtered via field selection?
  5. Discovery mode should test the credentials to give immediate feedback on improper configuration
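To make item 4 concrete, here is a sketch of what filtering a SCHEMA by field selection could look like. It assumes the common catalog metadata layout (a list of entries with breadcrumb and metadata keys) and one reading of the selection rules: inclusion "automatic" is always kept, "unsupported" is never kept, and otherwise selected wins, falling back to selected-by-default. The function names are hypothetical, only top-level properties are handled, and whether taps should do this at all is exactly the open question.

```python
def is_selected(field_md):
    """Decide whether a field is selected, given its metadata dict."""
    inclusion = field_md.get("inclusion")
    if inclusion == "unsupported":
        return False
    if inclusion == "automatic":
        return True
    selected = field_md.get("selected")
    if selected is None:
        # No explicit choice: fall back to the tap's default.
        return bool(field_md.get("selected-by-default", False))
    return bool(selected)


def filter_schema(schema, metadata):
    """Return a copy of schema containing only the selected top-level properties."""
    by_field = {
        entry["breadcrumb"][1]: entry["metadata"]
        for entry in metadata
        if len(entry["breadcrumb"]) == 2 and entry["breadcrumb"][0] == "properties"
    }
    props = {
        name: subschema
        for name, subschema in schema.get("properties", {}).items()
        if is_selected(by_field.get(name, {}))
    }
    return {**schema, "properties": props}
```

Under these assumptions, a field with no metadata entry at all is treated as unselected, which is itself a choice worth documenting.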