airbytehq / PyAirbyte

PyAirbyte brings the power of Airbyte to every Python developer.
https://docs.airbyte.com/pyairbyte
Other

Feat: Add experimental support for low-code source execution via manifest YAML #175

Closed aaronsteers closed 1 month ago

aaronsteers commented 3 months ago

This adds the ability to run (in theory) 130 declarative YAML sources in PyAirbyte, without any need for additional virtual environment isolation. The manifest.yml file content can be provided by the user or auto-downloaded from the master branch of airbytehq/airbyte.

Thanks to @bnchrch and @lmossman for helping figure out the logic.

The get_source() implementation in airbyte.experimental includes a new source_manifest input argument.

The argument can be any of these types:

The YAML-runnable connectors can be found using ab.get_available_connectors(install_type="yaml") or ab.get_available_connectors(InstallType.YAML).
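
As a rough illustration of why both call styles can work, an enum that subclasses str lets an API accept either the plain string or the enum member. This is a stand-in sketch, not PyAirbyte's actual code: the enum members, the miniature `_REGISTRY`, and the function body are all assumptions made for the example.

```python
from enum import Enum


class InstallType(str, Enum):
    """Stand-in enum; the member names here are assumptions."""

    YAML = "yaml"
    PYTHON = "python"


# Hypothetical miniature registry: connector name -> supported install types.
_REGISTRY = {
    "source-pokeapi": {InstallType.YAML, InstallType.PYTHON},
    "source-faker": {InstallType.PYTHON},
}


def get_available_connectors(install_type="python"):
    """Return sorted connector names that support the given install type."""
    # InstallType("yaml") and InstallType(InstallType.YAML) resolve to the
    # same member, so strings and enum members are interchangeable here.
    wanted = InstallType(install_type)
    return sorted(name for name, types in _REGISTRY.items() if wanted in types)


print(get_available_connectors("yaml"))
print(get_available_connectors(InstallType.YAML))
```

Normalizing the argument once at the top of the function keeps the rest of the lookup free of string comparisons, and an unknown value such as "docker" fails early with a ValueError.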

This PR also adds hard-coded exclusions for connectors in three categories:

Usage example

See the 2 new scripts in the examples directory for more examples, but the simplest usage is just:

from airbyte.experimental import get_source

source = get_source(
    "source-pokeapi",
    config={
        "pokemon_name": "ditto",
    },
    source_manifest=True,
)
source.check()
source.select_all_streams()

result = source.read()

In the above example, the source's manifest.yml is automatically located on the master branch of airbytehq/airbyte, and the only change from the user's perspective is adding the argument source_manifest=True.
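
For illustration only, locating a manifest on the master branch can come down to building a raw GitHub URL from the connector name. The directory layout and the `manifest.yaml` file name below are assumptions about how connectors are organized in airbytehq/airbyte, and `manifest_url` is a hypothetical helper, not a PyAirbyte function.

```python
def manifest_url(connector_name: str, branch: str = "master") -> str:
    """Build a raw-content URL for a connector's manifest (assumed layout)."""
    if not connector_name.startswith("source-"):
        raise ValueError(f"Expected a 'source-*' name, got {connector_name!r}")
    # Assumption: the manifest lives in a package directory named after the
    # connector, with dashes replaced by underscores.
    package_dir = connector_name.replace("-", "_")
    return (
        f"https://raw.githubusercontent.com/airbytehq/airbyte/{branch}/"
        f"airbyte-integrations/connectors/{connector_name}/"
        f"{package_dir}/manifest.yaml"
    )


print(manifest_url("source-pokeapi"))
```

Because the URL is derived from the name alone, a connector whose files do not follow this layout would simply not be found, which is one reason a connector might need to be excluded or given an explicit manifest.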

Note

Included Connectors

This is the result of calling get_available_connectors("yaml"):


```
- source-activecampaign
- source-aha
- source-aircall
- source-appfollow
- source-apple-search-ads
- source-ashby
- source-auth0
- source-babelforce
- source-breezometer
- source-callrail
- source-captain-data
- source-chargify
- source-chartmogul
- source-clickup-api
- source-clockify
- source-coda
- source-coin-api
- source-coingecko-coins
- source-coinmarketcap
- source-configcat
- source-confluence
- source-convertkit
- source-copper
- source-datadog
- source-datascope
- source-delighted
- source-dixa
- source-dockerhub
- source-dremio
- source-drift
- source-emailoctopus
- source-exchange-rates
- source-flexport
- source-freshcaller
- source-freshsales
- source-freshservice
- source-fullstory
- source-gainsight-px
- source-getlago
- source-glassfrog
- source-gocardless
- source-gong
- source-google-pagespeed-insights
- source-google-webfonts
- source-gutendex
- source-harvest
- source-hellobaton
- source-hubplanner
- source-insightly
- source-intruder
- source-ip2whois
- source-k6-cloud
- source-klarna
- source-klaus-api
- source-launchdarkly
- source-lemlist
- source-lever-hiring
- source-lokalise
- source-mailerlite
- source-mailersend
- source-mailgun
- source-mailjet-mail
- source-mailjet-sms
- source-marketo
- source-merge
- source-metabase
- source-microsoft-teams
- source-n8n
- source-nasa
- source-news-api
- source-newsdata
- source-nytimes
- source-omnisend
- source-onesignal
- source-open-exchange-rates
- source-openweather
- source-opsgenie
- source-orbit
- source-oura
- source-pendo
- source-persistiq
- source-pexels-api
- source-pivotal-tracker
- source-plaid
- source-plausible
- source-pokeapi
- source-polygon-stock-api
- source-postmarkapp
- source-primetric
- source-punk-api
- source-pypi
- source-recreation
- source-recruitee
- source-reply-io
- source-ringcentral
- source-rocket-chat
- source-sap-fieldglass
- source-secoda
- source-sendgrid
- source-sendinblue
- source-sentry
- source-serpstat
- source-smartengage
- source-sonar-cloud
- source-spacex-api
- source-square
- source-statuspage
- source-strava
- source-survey-sparrow
- source-tempo
- source-timely
- source-tmdb
- source-todoist
- source-toggl
- source-tvmaze-schedule
- source-twilio-taskrouter
- source-twitter
- source-tyntec-sms
- source-visma-economic
- source-vitally
- source-waiteraid
- source-whisky-hunter
- source-wikipedia-pageviews
- source-workable
- source-workramp
- source-wrike
- source-yahoo-finance-price
- source-yotpo
- source-zapier-supported-storage
- source-zenefits
```

Connectors covered by the hard-coded exclusions are omitted from this list, for instance low-code connectors that require one or more Python code files.
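
A minimal sketch of how hard-coded exclusions could be applied when building such a list. The set names and the `source-example-*` entries are placeholders invented for this example; they are not the PR's actual exclusion lists or categories.

```python
# Hypothetical exclusion sets; the entries are placeholders, not real data.
NEEDS_PYTHON_FILES = {"source-example-custom-code"}
OTHER_EXCLUSIONS = {"source-example-other"}


def filter_yaml_connectors(candidates):
    """Drop excluded connectors and return the remaining names sorted."""
    excluded = NEEDS_PYTHON_FILES | OTHER_EXCLUSIONS
    return sorted(name for name in candidates if name not in excluded)


candidates = ["source-pokeapi", "source-example-custom-code", "source-nasa"]
print(filter_yaml_connectors(candidates))
```

Keeping each exclusion category in its own named set makes it easy to see why a given connector was dropped and to remove the entry once the underlying limitation is lifted.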


aaronsteers commented 1 month ago

Some tests are failing because we had to remove source-faker as a dev dependency in order to get newer versions of the CDK to work. PR here bumps the CDK version in source-faker so we can bring it back as a dev dependency:

aaronsteers commented 1 month ago

@natikgadzhi, @erohmensing, @bindipankhudi, @alafanechere, @bnchrch - This is ready for your review.

Tests are all passing except the Python 3.11 tests, which will be resolved soon via @natikgadzhi's CDK update here (just merged, pending release to PyPI): https://github.com/airbytehq/airbyte/pull/38846

aaronsteers commented 1 month ago

/fix-pr

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

coderabbitai[bot] commented 1 month ago

Walkthrough

The recent updates to the Airbyte module introduce new entities and functionalities, enhance existing modules, and add support for declarative YAML source testing. Key changes include adding the records entity, importing snowflakecortex, and integrating a base module in the caches. The sources have been significantly updated with new classes and methods for handling declarative sources. Additionally, new example files and updated tests ensure robust handling of connectors and sources.

Changes

| Files | Change Summaries |
|---|---|
| `airbyte/__init__.py` | Added records entity; replaced experimental with records. |
| `airbyte/_processors/sql/__init__.py` | Added import statement for snowflakecortex. |
| `airbyte/caches/__init__.py` | Added import statement for base module. |
| `airbyte/sources/declarative.py` | Introduced classes DeclarativeExecutor and DeclarativeSource for YAML sources. |
| `airbyte/sources/registry.py` | Added imports, constants, Enums, attributes, and updated functions for connectors. |
| `airbyte/sources/util.py` | Added imports, parameters, and logic for handling YAML manifest in _get_source. |
| `examples/...` | Added run_declarative_manifest_source.py and run_downloadable_yaml_source.py. |
| `pyproject.toml` | Updated dependency versions for airbyte-cdk and airbyte-source-faker. |
| `tests/conftest.py` | Added imports, modified fixtures, and mocked registry behavior for testing. |
| `tests/integration_tests/...` | Added and modified test functions for connectors and sources. |
| `tests/unit_tests/...` | Added and modified test functions and fixtures for unit testing connectors. |

Sequence Diagram(s) (Beta)

sequenceDiagram
    participant User
    participant Airbyte
    participant DeclarativeExecutor
    participant Source

    User->>Airbyte: Run declarative manifest source
    Airbyte->>DeclarativeExecutor: Initialize with manifest
    DeclarativeExecutor->>Source: Execute source with manifest
    Source-->>DeclarativeExecutor: Return data
    DeclarativeExecutor-->>Airbyte: Processed data
    Airbyte-->>User: Display data

Poem

In the realm of Airbyte's code,
New records and sources showed,
YAML manifests now take flight,
Bringing data to the light.
With tests and imports all aligned,
A seamless flow you'll surely find.
🐇✨

