apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[C++] Implement ODBC driver "wrapper" using FlightSQL #30622

Open asfimport opened 2 years ago

asfimport commented 2 years ago

The ODBC analogue to ARROW-7744

Reporter: Wes McKinney / @wesm

Related issues:

Note: This issue was originally created as ARROW-15111. Please see the migration documentation for further details.

asfimport commented 2 years ago

David Li / @lidavidm: For reference see https://github.com/dremio/warpdrive which was recently released (GPL2, though)

spencerwilson commented 8 months ago

Possibly related to dremio/warpdrive:

alinaliBQ commented 7 months ago

https://github.com/dremio/flightsql-odbc should have the Apache 2.0 license. My understanding is that it is a data source client that works with Arrow Flight SQL. One should be able to develop an ODBC driver using this data source client.

Would cpp/src/arrow/flight/sql be an appropriate subdirectory for the ODBC driver? James and I think this looks like the right place, but please let us know whether the proposal makes sense. cc @jduo

alinaliBQ commented 7 months ago

Hello @wesm and @lidavidm, would either of you mind taking a look and letting me know if cpp/src/arrow/flight/sql is an appropriate subdirectory to put the ODBC driver? Thank you

lidavidm commented 7 months ago

@alinaliBQ how about cpp/src/flightsql_odbc or similar?

alinaliBQ commented 7 months ago

@lidavidm Sure, cpp/src/flightsql_odbc makes sense to me. Ty. I have another question. We are looking to build a new ODBC driver for Flight SQL that can be part of the Arrow project. It would utilize parts of the Amazon Timestream ODBC driver and the Flight SQL ODBC driver (flightsql-odbc) (written by Dremio), which are Open Source and Apache 2.0-licensed. Are there any questions/concerns regarding using those drivers?

lidavidm commented 7 months ago

Do you mean that you plan to import or copy large chunks of one or both projects, or do you mean that you plan to use them as dependencies? If the former, I think depending on how much is copied we may have to think about IP clearance, but it's not clear to me what the threshold is.

alinaliBQ commented 7 months ago

We're planning the former, so large chunks from both projects will be used. We're in the design stage of developing the driver, so things may change later. The plan is that flightsql-odbc will be used mostly as-is, other than changes to conform to Arrow coding guidelines. For the Amazon Timestream driver, only its ODBC function entry code will be used, adapted to call into flightsql-odbc classes.
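The layering described in this comment (ODBC function entry code that calls into driver classes) can be pictured with a minimal sketch. Every name below (`Status`, `Connection`, `FakeFlightSqlConnection`, `EntryAllocConnection`, `EntryConnect`, `EntryFreeConnection`) is hypothetical and invented for illustration; none of it is taken from flightsql-odbc, the Timestream driver, or the real ODBC headers, and the "connection" only checks a URI prefix rather than opening a Flight SQL client.

```cpp
#include <string>

// Hypothetical stand-in for ODBC return codes (not the real SQLRETURN values).
enum class Status { kSuccess = 0, kError = -1 };

// Driver-side abstraction: plays the role the thread assigns to the
// flightsql-odbc classes. The interface is an assumption for illustration.
class Connection {
 public:
  virtual ~Connection() = default;
  virtual bool Connect(const std::string& uri) = 0;
  virtual bool IsConnected() const = 0;
};

// Toy implementation: a real driver would create a Flight SQL client here.
class FakeFlightSqlConnection : public Connection {
 public:
  bool Connect(const std::string& uri) override {
    // Accept only URIs that start with "grpc://".
    connected_ = uri.rfind("grpc://", 0) == 0;
    return connected_;
  }
  bool IsConnected() const override { return connected_; }

 private:
  bool connected_ = false;
};

// Opaque handle handed to the application, in the spirit of ODBC handles.
using Handle = void*;

// Entry-point layer: thin C-style functions that validate arguments and
// delegate to the driver classes. This is the part the comment describes
// as being adapted from existing ODBC function entry code.
Status EntryAllocConnection(Handle* out) {
  if (out == nullptr) return Status::kError;
  // Store the pointer as Connection* so the later cast back is well defined.
  *out = static_cast<Connection*>(new FakeFlightSqlConnection());
  return Status::kSuccess;
}

Status EntryConnect(Handle handle, const char* uri) {
  if (handle == nullptr || uri == nullptr) return Status::kError;
  auto* conn = static_cast<Connection*>(handle);
  return conn->Connect(uri) ? Status::kSuccess : Status::kError;
}

Status EntryFreeConnection(Handle handle) {
  if (handle == nullptr) return Status::kError;
  delete static_cast<Connection*>(handle);
  return Status::kSuccess;
}
```

In this sketch the entry functions hold no driver logic of their own: they check their arguments, recover the object behind the handle, and forward the call. That separation is what would let the entry layer and the driver classes come from different code bases.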

lidavidm commented 7 months ago

Ok. I would encourage you to submit a PR ASAP even if it is not complete so we can do as much development as possible in Apache repos. The guidelines state IP clearance is needed when most of the development is done outside of Apache repos, so submitting a single large PR (as with the JDBC driver) might mean we want to do an IP clearance to be safe; submitting PRs here early and often would help avoid that.

At the least, flightsql-odbc is probably enough code that we might want to do IP clearance anyway. However, please discuss this on dev@ so others can chime in.

alinaliBQ commented 7 months ago

We were thinking of submitting the PR early as well. Our initial plan is to submit the PR when the driver is able to connect the Flight SQL ODBC driver, with irrelevant code pruned. James has let me know that the community has indicated that the PR for Timestream ODBC + flightsql-odbc can be sent even if the driver doesn't compile, since it's for starting the IP scanning process, so we can go with that. I have also written an email to dev@ for others' opinions on this matter: https://lists.apache.org/thread/t1r3pntpzoxdncgoj5f581hxyyl19bkl.

laurentgo commented 6 months ago

> Possibly related to dremio/warpdrive:
>
> * https://docs.dremio.com/current/sonar/client-applications/drivers/arrow-flight-sql-odbc-driver
>   * the linked [driver download page](https://www.dremio.com/drivers/odbc/) indicates that its license is some version of LGPL, but I can't find a link to its source code

The driver is a combination of ASL 2.0 and LGPL. The LGPL license is available at https://github.com/dremio/warpdrive/blob/master/license.txt

alinaliBQ commented 6 months ago

> The driver is a combination of ASL 2.0 and LGPL. The LGPL license is available at https://github.com/dremio/warpdrive/blob/master/license.txt

Thank you @laurentgo for explaining.

Just for clarification, we will not be using any LGPL code from warpdrive (https://github.com/dremio/warpdrive) for the driver development. We are planning to make a Flight SQL ODBC driver that is fully ASL 2.0 so we can contribute it back to the Apache community.

alinaliBQ commented 6 months ago

Hi @lidavidm, currently our team's implementation is being done inside our own arrow fork. I was wondering if you know any Arrow community members who would be interested in taking a look at our incremental PRs for the ODBC driver? If so, we could assign folks as the reviewers to let them know which PRs to take a look at. We would appreciate additional pairs of eyes from the community. Please let me know if you have any questions.

kou commented 6 months ago

Could you open a PR to apache/arrow instead of your fork?

wesm commented 6 months ago

We could create a branch on apache/arrow so that PRs do not have to go into the main branch (in case things are unstable)?

kou commented 6 months ago

It seems that you already have many changes. Can we break them down into small pieces and proceed step by step, as we did when implementing the Google Cloud Storage file system and the Azure Blob Storage file system?

kou commented 6 months ago

> We could create a branch on apache/arrow so that PRs do not have to go into the main branch (in case things are unstable)?

We can do it, but we may not need to, because we will have a build option for this module, such as ARROW_FLIGHT_SQL_ODBC, that is OFF by default. If this module isn't built by default, we don't need to care about stability.

We require IP clearance for this, right? We can't merge the first PR to apache/arrow before the clearance is completed. I think that we should avoid developing outside of apache/arrow as much as possible, so we should focus on the IP clearance instead of development for now.

lidavidm commented 6 months ago

I agree with Kou. We can consider a branch if needed but since this should be reasonably fenced off from the rest of the codebase, it should be OK to just have it on main.

@alinaliBQ for the original question please tag me to start with and we can pull in more people as needed.

devozerov commented 5 months ago

Our team has some experience working with Dremio's ODBC driver when connecting to a custom Arrow Flight endpoint (a Trino fork with Arrow Flight SQL support). We were also considering taking Dremio's Apache 2.0 code as a base and creating a fully-fledged driver. I am very happy that some work is already being done in the Arrow community. If needed, we can help with the review.