asfimport opened this issue 5 years ago
Chen / @dota17: Is it necessary to do this? Arrow is designed for in-memory data analysis, and the data won't be modified.
Recently I have been paying close attention to Arrow, and I want to make a contribution to the project.
If this issue is necessary, I am happy to resolve it.
Michael Pigott / @mikepigott: Hi, and thanks for taking a look at this! I saw Arrow as a tool that could be used to transfer data between systems, and not just for in-memory analysis. However, I don't need this today, and if nobody else has asked for it, then there might be more valuable JIRA tickets than mine for you to dedicate your time to?
Thanks again! Mike
Micah Kornfield / @emkornfield: @xhochy have you come across a use-case for writing to JDBC sources?
Uwe Korn / @xhochy: Yes, the use case would be to write large pandas.DataFrames to a database layer that only has performant JDBC drivers. Personally, all my JDBC sources are read-only, so I never wrote a WriteToJDBC function, but other people will use these technologies with more access rights. I have used the pyarrow -> Arrow Java -> JDBC path successfully with Apache Drill and Denodo. I have also heard that some people use it together with Amazon Athena, where a performant INSERT (https://docs.aws.amazon.com/athena/latest/ug/insert-into.html) might be interesting, as the JDBC driver currently seems to be the most performant option.
Gleb: Our team is also in need of this feature.
Currently we are implementing cross-language data processing inside k8s, and the DB component is responsible for pushing/pulling DB data. Arrow was chosen as the data format due to its efficiency and out-of-the-box gRPC support. The only thing we lack when working with JDBC sources is the ability to convert data from Arrow back to JDBC.
Todd Farmer / @toddfarmer: This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.
I am looking for a similar feature and came across this issue. I know we support JDBC-to-Arrow, but not the inverse direction. Are there any plans to support it? @lidavidm
https://github.com/apache/arrow-adbc/tree/main/java/driver/jdbc can do it but needs assistance generating the SQL
https://github.com/apache/arrow/blob/main/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcParameterBinder.java requires you to set up the PreparedStatement and everything else yourself, and solely automates converting Arrow data to JDBC statement parameters (see the sketch below)
If one of those works, I think we should document the possibilities and close this issue
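For reference, a minimal sketch of that second path, assuming an existing Connection conn, a populated VectorSchemaRoot root, and a hypothetical target table my_table(col_a, col_b):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import org.apache.arrow.adapter.jdbc.JdbcParameterBinder;
import org.apache.arrow.vector.VectorSchemaRoot;

public class ArrowToJdbcSketch {
  // Writes one Arrow batch to an existing table; the table and column
  // names are assumptions for illustration.
  static void insertBatch(Connection conn, VectorSchemaRoot root) throws Exception {
    String sql = "INSERT INTO my_table (col_a, col_b) VALUES (?, ?)";
    try (PreparedStatement stmt = conn.prepareStatement(sql)) {
      JdbcParameterBinder binder =
          JdbcParameterBinder.builder(stmt, root).bindAll().build();
      // next() binds the parameters for the next Arrow row; batch each row
      // and execute the whole batch at the end.
      while (binder.next()) {
        stmt.addBatch();
      }
      stmt.executeBatch();
    }
  }
}
```

The builder also exposes a per-column bind(parameterIndex, columnIndex) if the default column-to-parameter ordering doesn't fit.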
None of them work for my case. My goal is to wrap a set of VectorSchemaRoot into a JDBC ResultSet in the read path.
Ah. Interesting. Sorry, I misunderstood.
There is stuff like that in the JDBC driver but not exposed for public use. What is the use case?
To provide standard JDBC access to a database or engine that speaks Arrow vectors natively.
Ok, cool. (So this is like an embeddable engine? I assume otherwise you could use Flight SQL.)
Your best bet is to copy or refactor the code that is part of the JDBC driver. There isn't a separate adapter for this but it would make sense to refactor things into arrow-jdbc if they are useful more widely.
This is also different from this original issue, since this issue was about writing Arrow data to a database via JDBC
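For illustration, the heart of such a wrapper is a row cursor over the VectorSchemaRoot. This is only a sketch (the class and method names are hypothetical); a real wrapper would still need to implement the full java.sql.ResultSet interface, which is what the Avatica-based driver code handles:

```java
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.VectorSchemaRoot;

// Hypothetical row cursor over one VectorSchemaRoot, mirroring the
// position-based access pattern a ResultSet wrapper needs.
public class RowCursorSketch {
  private final VectorSchemaRoot root;
  private int row = -1;

  RowCursorSketch(VectorSchemaRoot root) {
    this.root = root;
  }

  // Analogous to ResultSet#next(): advance to the next row if any.
  boolean next() {
    return ++row < root.getRowCount();
  }

  // Analogous to ResultSet#getObject(int); JDBC columns are 1-based.
  Object getObject(int columnIndex) {
    FieldVector vector = root.getVector(columnIndex - 1);
    return vector.isNull(row) ? null : vector.getObject(row);
  }
}
```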
Yes, code like that. (I suppose the Avatica dependency means we may not want to add it to arrow-jdbc, but it's already public - does it sort of work? We can talk about refactoring it if it meets your needs.)
Yes, it seems like what I am searching for. I need some time to read into it before reaching a concrete conclusion. Thanks for your help!
@lidavidm Can we add a separate flight-sql-jdbc-core module and move the common code there? The issue is that flight-sql-jdbc-driver relocates a lot of classes, including VectorSchemaRoot: https://github.com/apache/arrow/blob/main/java/flight/flight-sql-jdbc-driver/pom.xml#L193. This makes it difficult to develop other JDBC drivers directly on top of it.
If the ResultSet implementation (and any other functionality?) can be cleanly separated out and is useful more broadly, then that sounds reasonable to me.
I will try it once I have some time.
ARROW-1780 reads a query from a JDBC data source and converts the ResultSet to an Arrow VectorSchemaRoot. However, there is no built-in adapter for writing an Arrow VectorSchemaRoot back to the database.
ARROW-3966 adds JDBC field metadata, including the field type, to the Arrow schema.
We can use this information to ask the database for field details via the DatabaseMetaData object, and then create INSERT or UPDATE statements based on the list of primary keys in the table:
- If the value in the VectorSchemaRoot corresponding to the primary key is NULL, insert a new record into the database.
- If the value is not NULL, update the existing record.
We can also perform the same data conversion in reverse based on the field types queried from the database.
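As a sketch of how those primary keys could be discovered, standard JDBC DatabaseMetaData can be queried (the helper below is illustrative, not part of the adapter):

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class PrimaryKeySketch {
  // Fetch the primary-key column names for a table via standard JDBC
  // metadata; a writer could use these to choose INSERT vs. UPDATE.
  static List<String> primaryKeyColumns(Connection conn, String table) throws SQLException {
    List<String> keys = new ArrayList<>();
    DatabaseMetaData meta = conn.getMetaData();
    try (ResultSet rs = meta.getPrimaryKeys(null, null, table)) {
      while (rs.next()) {
        keys.add(rs.getString("COLUMN_NAME"));
      }
    }
    return keys;
  }
}
```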
Reporter: Michael Pigott / @mikepigott
Note: This issue was originally created as ARROW-4144. Please see the migration documentation for further details.