Airbyte Python CDK is a framework for building Airbyte API Source Connectors. It provides a set of classes and helpers that make it easy to build a connector against an HTTP API (REST, GraphQL, etc.), or a generic Python source connector.
If you're looking to build a connector, we highly recommend that you start with the Connector Builder. It should be enough for 90% of connectors out there. For more flexible and complex connectors, use the low-code CDK and SourceDeclarativeManifest. If that doesn't work, then consider building on top of the lower-level Python CDK itself.
To get started on a Python CDK based connector or a low-code connector, you can generate a connector project from a template:
# from the repo root
cd airbyte-integrations/connector-templates/generator
./generate.sh
HTTP Connectors:
Python connectors using the bare-bones Source abstraction:
This will generate a project with a type and a name of your choice and put it in airbyte-integrations/connectors. Open the directory with your connector in an editor and follow the TODO items.
Airbyte CDK code is within the airbyte_cdk directory. Here's a high-level overview of what's inside:
connector_builder. Internal wrapper that helps the Connector Builder platform run a declarative manifest (low-code connector). You should not use this code directly. If you need to run a SourceDeclarativeManifest, take a look at the source-declarative-manifest connector implementation instead.
destinations. Basic Destination connector support! If you're building a Destination connector in Python, try that. Some of our vector DB destinations like destination-pinecone are using that code.
models exposes airbyte_protocol.models as a part of the airbyte_cdk package.
sources/concurrent_source is the Concurrent CDK implementation. It supports reading data from streams concurrently per slice / partition, useful for connectors with high throughput and a high number of records.
sources/declarative is the low-code CDK. It works on top of Airbyte Python CDK, but provides a declarative manifest language to define streams, operations, etc. This makes it easier to build connectors without writing Python code.
sources/file_based is the CDK for file-based sources. Examples include S3, Azure, GCS, etc.

Thank you for being interested in contributing to Airbyte Python CDK! Here are some guidelines to get you started:
Install the project dependencies and development tools:
poetry install --all-extras
Installing all extras is required to run the full suite of unit tests.
Run the tests with poetry run poe unit-test-with-cov, or with python -m pytest -s unit_tests if you want to pass pytest options.
Run poetry run poe check-local to lint all code, type-check modified code, and run unit tests with coverage in one command.
To see all available scripts, run poetry run poe.
Low-code CDK models are generated from sources/declarative/declarative_component_schema.yaml. If the iteration you are working on includes changes to the models or the connector generator, you might want to regenerate them. To do that, run:
poetry run poe build
This will generate the code generator docker image and the component manifest files based on the schemas and templates.
All tests are located in the unit_tests directory. Run poetry run poe unit-test-with-cov to run them. This also presents a test coverage report. For faster iteration with no coverage report and more options, python -m pytest -s unit_tests is a good place to start.
When developing a new feature in the CDK, you may find it helpful to run a connector that uses that new feature. You can test this in one of two ways:
Open the connector's pyproject.toml file and replace the airbyte_cdk dependency line with the following:
airbyte_cdk = { path = "../../../airbyte-cdk/python/airbyte_cdk", develop = true }
Then, running poetry update should reinstall airbyte_cdk from your local working directory.
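The manual edit described above can also be scripted. The following is a minimal sketch, not part of the CDK: the helper name use_local_cdk is hypothetical, the sample version "^0.80.0" is made up for illustration, and it assumes the dependency is declared on a single line (spelled either airbyte_cdk or airbyte-cdk).

```python
import re

# Replacement line taken from the text above; adjust the relative path
# if your CDK checkout lives somewhere else.
LOCAL_CDK = 'airbyte_cdk = { path = "../../../airbyte-cdk/python/airbyte_cdk", develop = true }'

# Assumption: the dependency sits on one line, e.g. airbyte_cdk = "^0.80.0".
_CDK_LINE = re.compile(r"^airbyte[_-]cdk\s*=.*$", flags=re.MULTILINE)


def use_local_cdk(pyproject_text: str) -> str:
    """Return pyproject.toml text with the CDK dependency pointing at a local path."""
    # A function replacement avoids any escape processing of LOCAL_CDK.
    new_text, count = _CDK_LINE.subn(lambda _match: LOCAL_CDK, pyproject_text, count=1)
    if count == 0:
        raise ValueError("no airbyte_cdk dependency line found")
    return new_text


# Hypothetical pyproject.toml fragment used only to demonstrate the helper.
sample = '[tool.poetry.dependencies]\npython = "^3.9"\nairbyte_cdk = "^0.80.0"\n'
patched = use_local_cdk(sample)
print(patched)
```

After writing the patched text back to pyproject.toml, poetry update is still needed to reinstall the package.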
Pre-requisite: Install the airbyte-ci CLI.
You can build your connector image with the local CDK using:
# from the airbytehq/airbyte base directory
airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> build
Note that the local CDK is injected at build time, so if you make changes, you will have to run the build command again to see them reflected.
Pre-requisite: Install the airbyte-ci CLI.
To run acceptance tests for a single connector using the local CDK, from the connector directory, run:
airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> test
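Since the local CDK is injected at build time, the build and test commands tend to be rerun often while iterating. A small wrapper like the sketch below may help. The function names and the connector name source-example are hypothetical; only the command-line arguments come from the commands shown above.

```python
import shlex
import subprocess


def airbyte_ci_command(connector: str, action: str) -> list:
    """Build the airbyte-ci argument list for a connector using the local CDK."""
    if action not in ("build", "test"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["airbyte-ci", "connectors", "--use-local-cdk", f"--name={connector}", action]


def run_airbyte_ci(connector: str, action: str) -> None:
    """Run the command; raises CalledProcessError if airbyte-ci exits non-zero."""
    cmd = airbyte_ci_command(connector, action)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


# "source-example" is a placeholder connector name.
print(shlex.join(airbyte_ci_command("source-example", "build")))
```

Calling run_airbyte_ci("source-example", "build") after each CDK change reruns the build; it requires the airbyte-ci CLI to be installed and on the PATH.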
There may be a time when you do not have access to the API (because you lack credentials, network access, etc.). You will probably still want to do end-to-end testing at least once. To do so, you can emulate the server you would be reaching using a server stubbing tool.
For example, using mockserver, you can set up an expectation file like this:
{
"httpRequest": {
"method": "GET",
"path": "/data"
},
"httpResponse": {
"body": "{\"data\": [{\"record_key\": 1}, {\"record_key\": 2}]}"
}
}
Assuming this file has been created at secrets/mock_server_config/expectations.json, running the following command will start a mock server that matches any request on path /data and returns the response defined in the expectation file:
docker run -d --rm -v $(pwd)/secrets/mock_server_config:/config -p 8113:8113 --env MOCKSERVER_LOG_LEVEL=TRACE --env MOCKSERVER_SERVER_PORT=8113 --env MOCKSERVER_WATCH_INITIALIZATION_JSON=true --env MOCKSERVER_PERSISTED_EXPECTATIONS_PATH=/config/expectations.json --env MOCKSERVER_INITIALIZATION_JSON_PATH=/config/expectations.json mockserver/mockserver:5.15.0
HTTP requests to localhost:8113/data should now return the body defined in the expectations file.
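Note that the response body in the expectation file is itself a JSON document encoded as a string, which is why its quotes are escaped. Writing that by hand is error-prone, so the file can be generated instead. The sketch below reproduces the expectation shown earlier; the helper names build_expectation and write_expectation are hypothetical.

```python
import json
from pathlib import Path


def build_expectation(path: str, records: list) -> dict:
    """Build one expectation: GET <path> returns {"data": records} as a JSON string."""
    return {
        "httpRequest": {"method": "GET", "path": path},
        # The body must be a string, so the payload is serialized separately.
        "httpResponse": {"body": json.dumps({"data": records})},
    }


def write_expectation(target: Path, expectation: dict) -> None:
    """Write the expectation file, creating parent directories if needed."""
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(expectation, indent=2))


expectation = build_expectation("/data", [{"record_key": 1}, {"record_key": 2}])
print(json.dumps(expectation, indent=2))
```

Calling write_expectation(Path("secrets/mock_server_config/expectations.json"), expectation) places the file where the docker command above expects it.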
To test this, either change the code that defines the base URL (for a Python source) or update url_base (for a low-code connector). With the Connector Builder running in Docker, use the host.docker.internal domain instead of localhost, because the requests are executed within Docker.
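As a quick check that the mock server answers before pointing a connector at it, something like the following sketch could be used. The function names are hypothetical; the port and the /data path come from the example above, and the host depends on whether the caller runs inside Docker.

```python
import json
import urllib.request

MOCK_PORT = 8113  # port published by the docker command above


def mock_base_url(inside_docker: bool) -> str:
    """Pick the host the mock server is reachable on from the caller's side."""
    host = "host.docker.internal" if inside_docker else "localhost"
    return f"http://{host}:{MOCK_PORT}"


def fetch_records(base_url: str) -> list:
    """GET <base_url>/data and return the list under the "data" key."""
    with urllib.request.urlopen(f"{base_url}/data", timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))["data"]


print(mock_base_url(inside_docker=False))
```

With the mock server running, fetch_records(mock_base_url(inside_docker=False)) should return the records defined in the expectation file. The same base URL is what you would set as the connector's base URL or url_base while testing.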
Python CDK has a GitHub workflow that manages the CDK changelog, makes a new release of airbyte_cdk, publishes it to PyPI, and then makes a commit to update (and subsequently auto-release) source-declarative-manifest and Connector Builder (in the platform repository).
[!Note]: The workflow will handle the CHANGELOG.md entry for you. You should not add changelog lines in your PRs to the CDK itself.
[!Warning]: The workflow bumps the version on its own; please don't change the CDK version in pyproject.toml manually.
Make sure your changes are merged into the master branch.
Run the Publish CDK Manually workflow from master using release-type=major|minor|patch and setting the changelog message.
The workflow bumps the version of source-declarative-manifest according to the release-type of the CDK, then commits these changes back to master. The commit to master will kick off a publish of the new version of source-declarative-manifest.
The workflow also updates the airbyte-platform-internal repo to bump the dependency in Connector Builder.
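To illustrate what the release-type choice means for the resulting version number, here is a minimal sketch that assumes conventional semantic versioning. It is not the workflow's actual code; bump_version is a hypothetical helper and the version numbers are made up.

```python
def bump_version(version: str, release_type: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given release type."""
    major, minor, patch = (int(part) for part in version.split("."))
    if release_type == "major":
        return f"{major + 1}.0.0"
    if release_type == "minor":
        return f"{major}.{minor + 1}.0"
    if release_type == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release type: {release_type!r}")


for release_type in ("major", "minor", "patch"):
    print(release_type, bump_version("1.2.3", release_type))
```

Under this convention, a patch release changes only the last component, while a major release resets the minor and patch components to zero.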