airbytehq / PyAirbyte-Hackathon

Tasks for PyAirbyte Hackathon June 2024

New Source Connector: DuckDB 🦆 #31

Open aaronsteers opened 5 months ago

aaronsteers commented 5 months ago

Overview

We do not yet have a DuckDB source connector. Normally, DuckDB databases are local files and not very useful as sources, but now they can also be remote (e.g. MotherDuck) and they can be a pass-through for other data sources (e.g. #30 and the Hugging Face Datasets).

Technical spec

You would write a new source connector which can connect to a (remote) DuckDB dataset or database, and emit records from DuckDB, allowing Airbyte users to send these to any Airbyte destination.
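As a rough illustration of the "connect and emit records" part of the spec, here is a minimal sketch, not the connector itself. It assumes the `duckdb` Python package is installed and that MotherDuck is reached with an `md:<database>` connection string plus a configured token. The helper names `connect`, `read_records`, and `to_airbyte_record` are hypothetical; `read_records` relies only on the DB-API style `execute`, `description`, and `fetchmany` calls that a DuckDB connection provides.

```python
import time
from typing import Any, Dict, Iterator


def connect(database: str):
    """Open a DuckDB connection (hypothetical helper).

    `database` may be a local file path, ":memory:", or, assuming
    MotherDuck credentials are configured, a string such as "md:my_db".
    """
    import duckdb  # third-party dependency: pip install duckdb

    return duckdb.connect(database)


def read_records(con, query: str, batch_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Run `query` and yield each row as a dict keyed by column name.

    Rows are fetched in batches so large tables are not loaded at once.
    """
    cursor = con.execute(query)
    columns = [col[0] for col in cursor.description]
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield dict(zip(columns, row))


def to_airbyte_record(stream: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap one row in the shape of an Airbyte RECORD message."""
    return {
        "type": "RECORD",
        "record": {
            "stream": stream,
            "data": data,
            "emitted_at": int(time.time() * 1000),  # epoch milliseconds
        },
    }
```

A source built on this would call `read_records(connect("md:my_db"), "SELECT * FROM some_table")` for each stream and print every wrapped record as one JSON line; the database and table names here are placeholders.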

Notes:

Definition of Done

aaronsteers commented 5 months ago

Assigning myself in order to reserve/hold for @ombhardwajj, who has the related #30.

ombhardwajj commented 5 months ago

Hey @aaronsteers, you can assign it to me now!

marcosmarxm commented 4 months ago

@ombhardwajj what is the status of this issue?

ombhardwajj commented 4 months ago

@marcosmarxm It's already been a week that I've been working on the Hugging Face Datasets connector. Given the time constraint of this hackathon, I don't think I'll be able to build this DuckDB connector. Hence I am un-assigning myself.

bala-ceg commented 4 months ago

Issue #30 is related to issue #31. Can you please assign this to me as well?

bala-ceg commented 4 months ago

@marcosmarxm @aaronsteers Can you please let me know which connector development method I should follow: the Python CDK or the low-code CDK?

marcosmarxm commented 4 months ago

@bala-ceg You'll probably need to use the Python CDK, as the streams are going to be dynamically created.
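To make "dynamically created" concrete: the set of streams is not known in advance, so it has to be discovered from the database at runtime. The sketch below is one possible approach, not the connector's actual design. It assumes the rows come from running `COLUMNS_QUERY` against DuckDB's `information_schema.columns` view; the helper names `json_type` and `build_streams` and the type mapping are hypothetical simplifications.

```python
from typing import Any, Dict, Iterable, Tuple

# Query to run against DuckDB; one row per column of every table.
COLUMNS_QUERY = """
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
ORDER BY table_schema, table_name, ordinal_position
"""

# Simplified mapping (an assumption): anything not listed becomes a string.
INTEGER_TYPES = {
    "TINYINT", "SMALLINT", "INTEGER", "BIGINT", "HUGEINT",
    "UTINYINT", "USMALLINT", "UINTEGER", "UBIGINT",
}


def json_type(duckdb_type: str) -> str:
    """Map a DuckDB type name to a JSON Schema type name."""
    name = duckdb_type.upper()
    if name == "BOOLEAN":
        return "boolean"
    if name in INTEGER_TYPES:
        return "integer"
    if name in ("FLOAT", "DOUBLE") or name.startswith("DECIMAL"):
        return "number"
    return "string"


def build_streams(
    column_rows: Iterable[Tuple[str, str, str, str]]
) -> Dict[str, Dict[str, Any]]:
    """Group (schema, table, column, type) rows into one JSON schema per table."""
    streams: Dict[str, Dict[str, Any]] = {}
    for schema, table, column, data_type in column_rows:
        stream_name = f"{schema}.{table}"
        json_schema = streams.setdefault(
            stream_name, {"type": "object", "properties": {}}
        )
        # Every column is treated as nullable in this sketch.
        json_schema["properties"][column] = {"type": ["null", json_type(data_type)]}
    return streams
```

Each entry returned by `build_streams` could then back one stream in the Python CDK, with the key as the stream name and the value as its JSON schema.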

bala-ceg commented 4 months ago

@marcosmarxm Is there an existing database-based connector written with the Python CDK? I would like to use it as a reference.