airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

NiFi airbyte integration #3963

Closed dmdmishra closed 2 years ago

dmdmishra commented 3 years ago

Is it possible to implement an integration with Apache NiFi to call Airbyte jobs, just like the existing Airflow connector?

jrhizor commented 3 years ago

Are you trying to run a sync on Airbyte that you're only launching from NiFi when some condition is met? That should be possible today using our API, although we don't have something like a custom processor for this.

How were you hoping to trigger a sync from NiFi? What NiFi sources/destinations would you be using, and how would you want to connect those to Airbyte sources/destinations?
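To be concrete about the API route, something along these lines should work. This is a rough, untested sketch in Python; it assumes a local deployment exposing the Config API at http://localhost:8000 and an existing connection ID, and the exact endpoint paths and response shapes should be double-checked against the API reference:

```python
# Minimal sketch: trigger an Airbyte sync and wait for the job to finish.
# Assumes an Airbyte deployment at localhost:8000 and an existing connection;
# endpoint paths and response shapes should be verified against the API docs.
import time
import requests

AIRBYTE_API = "http://localhost:8000/api/v1"
CONNECTION_ID = "your-connection-id"  # hypothetical placeholder

# Kick off a sync for the connection.
resp = requests.post(f"{AIRBYTE_API}/connections/sync",
                     json={"connectionId": CONNECTION_ID})
resp.raise_for_status()
job_id = resp.json()["job"]["id"]

# Poll until the job reaches a terminal state, then act on the result
# (e.g. route a NiFi FlowFile to success/failure for downstream validation).
while True:
    job = requests.post(f"{AIRBYTE_API}/jobs/get", json={"id": job_id}).json()
    status = job["job"]["status"]
    if status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(10)

print(f"Sync job {job_id} finished with status: {status}")
```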

dmdmishra commented 3 years ago

Hi, I was looking for a custom processor or an Airbyte API which can trigger the Airbyte job and, on completion, return a response based upon which I can do validations in the target using my NiFi flow.

A custom Airbyte processor would be useful.

dmdmishra commented 3 years ago

Any thoughts? Is this something that could be delivered, or at least considered?

jrhizor commented 3 years ago

It sounds like it could be possible to do, but I think it's unlikely we'll prioritize work on this internally until we see more people asking for it. Also none of us on the Airbyte team have worked with NiFi before afaik.

If a community member familiar with NiFi wanted to work on this we could definitely help them out and point them in the right direction.

dmdmishra commented 3 years ago

I am not a developer, but I would like to hop in and try to do it, as I understand NiFi very well. Let me know how I can help.

jrhizor commented 3 years ago

I imagine this would be a fairly complex task requiring knowledge of NiFi internals / creating custom processors, plus how Airbyte works under the hood (especially the configuration model for Airbyte sources and the protocol for passing messages to/from the source via STDIN/STDOUT).

At the end of the day, I would expect this to work by having the custom processor shell out to a `docker run` of a specific connector image.

https://docs.airbyte.io/understanding-airbyte/airbyte-specification would be a great place to start.
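To make the "shell out to docker run" idea concrete, this is very roughly what the processor would end up doing under the hood. It's only a sketch: the connector image is just an example, and the protocol doc linked above is the source of truth for the message format:

```python
# Rough sketch: shell out to `docker run` for a connector image and read
# Airbyte protocol messages (JSON lines) from STDOUT.
# The image name and the stream handling here are illustrative only.
import json
import subprocess

IMAGE = "airbyte/source-postgres:latest"  # example connector image

# `spec` prints the connector's configuration spec as an AirbyteMessage.
spec_out = subprocess.run(
    ["docker", "run", "--rm", IMAGE, "spec"],
    capture_output=True, text=True, check=True,
).stdout

for line in spec_out.splitlines():
    if not line.strip().startswith("{"):
        continue  # skip any non-protocol output
    msg = json.loads(line)
    if msg.get("type") == "SPEC":
        print(json.dumps(msg["spec"]["connectionSpecification"], indent=2))

# A real `read` would additionally mount a config file and a configured catalog:
#   docker run --rm -v /local/config:/config airbyte/source-postgres:latest read \
#       --config /config/config.json --catalog /config/catalog.json
# and stream RECORD / STATE messages from STDOUT line by line.
```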

I imagine this is more reasonable if you're only trying to run some Airbyte sources. It'd be much harder with destinations since you'd have to generate catalogs to decide how to consume data.
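To illustrate what that would involve, you'd have to hand-build a configured catalog and feed records in on STDIN. Very rough sketch; the stream name, schema, field values and docker command are made up for illustration, and again the protocol doc above is the source of truth:

```python
# Very rough sketch of what driving a destination would involve:
# build a ConfiguredAirbyteCatalog, then pipe AirbyteRecordMessage JSON lines
# into `docker run -i <destination-image> write ...` on STDIN.
# Stream name, schema and values are illustrative only.
import json

configured_catalog = {
    "streams": [
        {
            "stream": {
                "name": "users",
                "json_schema": {
                    "type": "object",
                    "properties": {"id": {"type": "integer"}, "email": {"type": "string"}},
                },
                "supported_sync_modes": ["full_refresh"],
            },
            "sync_mode": "full_refresh",
            "destination_sync_mode": "overwrite",
        }
    ]
}

record = {
    "type": "RECORD",
    "record": {
        "stream": "users",
        "data": {"id": 1, "email": "a@example.com"},
        "emitted_at": 1628000000000,
    },
}

# The catalog and destination config would be written to files mounted into the
# container, and each record line written to the container's STDIN, e.g.:
#   docker run --rm -i -v /local/config:/config <destination-image> write \
#       --config /config/destination.json --catalog /config/catalog.json
print(json.dumps(configured_catalog, indent=2))
print(json.dumps(record))
```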

I'm also not familiar with NiFi, so I don't know how buffering works or how integrating batch-style operations (like airbyte sources) works either.

davinchia commented 2 years ago

We aren't going to support this for now. Please feel free to re-open this issue if it becomes a burning question.