airbytehq / PyAirbyte

PyAirbyte brings the power of Airbyte to every Python developer.
https://docs.airbyte.com/pyairbyte

Support config write-back #8

Open flash1293 opened 8 months ago

flash1293 commented 8 months ago

Via an Airbyte control message, the running connector can issue an update of its config object: https://docs.airbyte.com/understanding-airbyte/airbyte-protocol#airbytecontrolmessage
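
For reference, here is a rough sketch of what such a message looks like on the connector's stdout and how a reader could pull the new config out of it. The field names follow my reading of the linked protocol docs; the helper name `extract_config_update` and the sample values are made up for illustration and are not part of airbyte-lib.

```python
import json
from typing import Any, Optional


def extract_config_update(line: str) -> Optional[dict[str, Any]]:
    """Return the config carried by a CONTROL/CONNECTOR_CONFIG message, else None."""
    try:
        message = json.loads(line)
    except json.JSONDecodeError:
        return None  # connectors may also print plain (non-JSON) log lines
    if not isinstance(message, dict) or message.get("type") != "CONTROL":
        return None
    control = message.get("control") or {}
    if control.get("type") != "CONNECTOR_CONFIG":
        return None
    return (control.get("connectorConfig") or {}).get("config")


# Illustrative message; the token value is a placeholder.
sample = json.dumps(
    {
        "type": "CONTROL",
        "control": {
            "type": "CONNECTOR_CONFIG",
            "emitted_at": 1700000000000.0,
            "connectorConfig": {"config": {"refresh_token": "new-token"}},
        },
    }
)
print(extract_config_update(sample))
```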

This is important for cases like single-use authentication tokens - some APIs only accept an authentication token once and return a new token in the response which has to be used the next time, invalidating the old token.

This message type is currently not honored by airbyte-lib - the message is silently dropped. Ideally, it would be possible to handle this situation gracefully:

The last step is important to make it possible to run a proper check and read command as part of the CI steps - currently it's not possible to do this as integration test secrets are at risk of being invalidated by the test being run: https://github.com/airbytehq/airbyte/pull/34044

flash1293 commented 8 months ago

@aaronsteers we talked about this before - there's another situation where the config update needs to be handled and that's the CI check that's run for airbyte-lib enabled connectors.

aaronsteers commented 8 months ago

Made a request in slack to get more detail on scope here and which connectors are affected:

To help us prioritize when/if to build support for the CONTROL message in AirbyteLib as documented here, I'm looking for a list of Python source connectors which require what I'll call "config write-back". My understanding is that this is primarily for replacing retired refresh_token secret values. Does anyone know how to locate those connectors that require this feature, and/or know of a few which I can use to better analyze/understand the technical requirements?

aaronsteers commented 8 months ago

Aggregating responses from Slack thread, it looks like we have at least these that require the feature:

And a way to search for them would be to look for instances of refresh_token_updater and/or SingleUseRefreshTokenOauth2Authenticator.
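
A minimal sketch of that search, assuming a local checkout of the connectors and that the two identifiers show up in `.py` or `.yaml`/`.yml` files. The function name and the file-extension filter are my own choices, not anything from the Airbyte tooling.

```python
from pathlib import Path

# The two identifiers mentioned above.
MARKERS = ("refresh_token_updater", "SingleUseRefreshTokenOauth2Authenticator")


def find_write_back_candidates(root: str) -> list[str]:
    """Return sorted relative paths of files under `root` mentioning either marker."""
    root_path = Path(root)
    hits = []
    for path in root_path.rglob("*"):
        if not path.is_file() or path.suffix not in {".py", ".yaml", ".yml"}:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if any(marker in text for marker in MARKERS):
            hits.append(path.relative_to(root_path).as_posix())
    return sorted(hits)
```

Calling `find_write_back_candidates(".")` from the directory that holds the connector sources would list the files to inspect; the first path segment of each hit is then the connector folder.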

Thanks to @flash1293, @pedroslopez, and @alafanechere for their expertise here. 🙏

andreibaragan commented 5 months ago

source-quickbooks is another. According to Quickbooks docs, they update the refresh_token value every 24 hours, or the next time you refresh the access tokens after 24 hours.

aaronsteers commented 5 months ago

@andreibaragan - Thanks for raising. I've updated my comment above to include quickbooks as well.

For our information and prioritization, are you blocked by this or do you have a workaround?

andreibaragan commented 5 months ago

@aaronsteers this is blocking.

andreibaragan commented 2 months ago

@aaronsteers is there any update/plan/alternative to this issue?

aaronsteers commented 1 month ago

@andreibaragan - We have not made any progress on this and it is not currently prioritized. What we can do in the meantime is try to work out a spec, which Airbyte or a member of the community could then pick up.

To keep this conversation moving, let me put some implementation details here in this thread, according to my research and current understanding of the feature requirements.

For context and background: Unlike Airbyte Platform, PyAirbyte has no write access to your secret store - nor do we really want PyAirbyte to need this access. PyAirbyte also does not (yet) have a file-based interface for reading config information. This means PyAirbyte currently has nowhere to persist the new creds that it would receive from the AirbyteControlMessage.

A few options:

  1. Add Callback.
    • The get_source() method could accept something like a control_message_callback or (more specifically) config_change_callback which would be called by PyAirbyte whenever a control message is received from the source, i.e. when the connector attempts to change the config.
    • The user would have full control to handle this config change when/if a message is received.
  2. Add file-based config option.
    • The get_source() could accept a config file where it currently expects a config dictionary.
    • PyAirbyte would write config changes to the file when/if the control message from the connector instructs it to.
    • PyAirbyte would print to the logs, but otherwise the process would be invisible to the user.
  3. No change to function signatures, except treat config as okay-to-modify.
    • Same as option 2, but we write config changes back to the dictionary.
    • Similarly, we print to the logs, but otherwise the process is invisible to the user.
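
To make option 1 a bit more concrete, here is a standalone sketch of the dispatch logic, not PyAirbyte code. It assumes the connector's output arrives as JSON lines and that the control message shape matches the protocol docs; `read_with_config_callback` is a hypothetical helper, and `config_change_callback` is only the name proposed above, not an existing parameter.

```python
import json
from typing import Any, Callable, Iterable, Iterator, Optional

ConfigChangeCallback = Callable[[dict[str, Any]], None]


def read_with_config_callback(
    lines: Iterable[str],
    config_change_callback: Optional[ConfigChangeCallback] = None,
) -> Iterator[dict[str, Any]]:
    """Yield non-control messages; pass config updates to the callback."""
    for line in lines:
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip plain log lines
        if not isinstance(message, dict):
            continue
        control = message.get("control") or {}
        if message.get("type") == "CONTROL" and control.get("type") == "CONNECTOR_CONFIG":
            new_config = (control.get("connectorConfig") or {}).get("config")
            if config_change_callback is not None and new_config is not None:
                # The user decides what to do: write to a secret store, a file, etc.
                config_change_callback(new_config)
            continue
        yield message


# Usage with made-up messages: the callback just collects the updates.
received: list[dict[str, Any]] = []
demo_lines = [
    json.dumps({"type": "RECORD", "record": {"stream": "users", "data": {"id": 1}}}),
    json.dumps(
        {
            "type": "CONTROL",
            "control": {
                "type": "CONNECTOR_CONFIG",
                "emitted_at": 1700000000000.0,
                "connectorConfig": {"config": {"refresh_token": "rotated"}},
            },
        }
    ),
]
records = list(read_with_config_callback(demo_lines, received.append))
```

The point of this shape is that the handoff is explicit: the callback runs as soon as the message is seen, so the user can persist the new token before the sync finishes or crashes.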

With options 2 or 3, we don't change much, but we also don't have any explicit handoff confirming that the config change is received by the user or handled appropriately. If the process crashes, the Python dictionary's contents are lost. And in ephemeral environments, the local config file is likewise very likely to be dropped at the end of the sync.
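
For comparison, a sketch of what the write-back in options 2 and 3 might boil down to. Both helper names are hypothetical, and merging with `update()` is an assumption on my part: it works whether the message carries the full config or only the changed keys, but it never removes keys.

```python
import json
import os
import tempfile
from typing import Any


def apply_config_update_in_place(config: dict[str, Any], new_config: dict[str, Any]) -> None:
    """Option 3: mutate the caller's dictionary so the caller sees the new values."""
    config.update(new_config)


def write_config_file(path: str, new_config: dict[str, Any]) -> None:
    """Option 2: replace the config file via a temp file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(new_config, handle)
        os.replace(tmp_path, path)  # readers never observe a half-written file
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Neither helper tells the user anything happened, which is the gap described above: the dictionary dies with the process, and the file dies with the container.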

Given the above prelim spec exploration, I think I slightly prefer using option 1, which would be to add an option for an explicit callback.

Lmk if this makes sense to you, or if you have any other ideas or suggestions based on your use case.

Once we have a viable spec, I can at least mark this as ready to work, which is a step in the right direction, I think.