Open aaronsteers opened 2 years ago
@pandemicsyn has graciously offered to draft a proposal doc for this item. 🙌
Hello, I really need this and plan to implement my own version of it in my tap. Where is the current status of this issue?
We have desire to run a tap in "batch" mode to dump records as well as "streaming" (or "tail") mode which listens for new events and pushes them to the target accordingly.
Question, is there any reason the singer wrappers don't expose a simple optional REST interface (think management backplane) for start
, stop
, shutdown
etc. instead of using signals which can get dicey depending on language/context?
So when a tap starts it also starts a simple REST interface that allows you to control the tap by sending simple JSON commands (messages, just like the spec does for RECORDs).
For streaming use cases, an always-on tap should be able to "tail" the data source and continually run the extract. This is possible today without changes to the spec, but there is not yet any guidance or "best practice" in terms of how to accomplish this.
A reference implementation would likely entail one or more of the following:
tail
(boolean) which, whentrue
would put the tap into an infinite loop of extract, sleep, extract, and so on.state_message_min_cadence
setting could be established, which forces a state message to flush at least every 1-5 minutes, for instance, assuming 1 or more records have been sent since the last STATE message.new_record_polling_interval
setting could indicated how much time to sleep between calls to check for new records.tail
mode, we'll want to have amax_full_table_age
or similar, which would drive new extracts of streams pulled withFULL_TABLE
after the set amount of time has elapsed.SIGTERM
message (beforeSIGKILL
) should immediately trigger a flush the latest STATE message. In an ideal scenario, sending the termination command (likectrl+c
) should give the tap a few seconds to send on the last viable state bookmark and give an optimal chance of not having to repeat records on a subsequent execution.Note: While the priority here is probably the
tail
setting, it should be fair to assume that a method that needs to run indefinitely should also be ready to be interrupted at any point; since the only way to "stop" the stream is to kill the process, we will need to be ready for interruption and plan for how STATE tracking will be affected by the worst-case scenario interruption timing.