apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Are different database systems going to be supported data sources? #1048

Open Smurphy000 opened 3 years ago

Smurphy000 commented 3 years ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Are there plans to add support for connecting to and reading from a database (Postgres, MySQL, etc.)?

Describe the solution you'd like

I am currently thinking in terms of Spark, where I can specify a format, and that format can also be a package such as com.microsoft.sqlserver.jdbc.spark, which connects to the database via JDBC.

If this is already supported in some way, where can I find this?

Igosuki commented 3 years ago

Hi, disclaimer: I'm not a maintainer. No, it's not currently supported. I definitely see a use for it, namely joining database tables with columnar data, although I find it is subject to the multiple-writer database anti-pattern, so you must have strong test suites to make sure that changes to the db (which is often owned by an API) won't break the ETL process. One would have to create an experimental crate, implement a generic TableProvider for RDBMS systems, validate the SQL statements against the db, and adapt Arrow to whatever data structure is used for SQL statements. Interesting projects that come to mind for this are tokio-rs/rdbc and sqlx; there is also the Arrow JDBC adapter and ongoing tasks like https://issues.apache.org/jira/browse/ARROW-7744
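To make the steps in the comment above a bit more concrete, here is a minimal, std-only Rust sketch of two pieces such a crate would likely need: turning a scan with a column projection into a SQL statement, and transposing the row-oriented results a database driver returns into columns. Everything here is an assumption made for illustration: `Value`, `Column`, `build_scan_sql`, and `rows_to_columns` are hypothetical names, not DataFusion, Arrow, rdbc, or sqlx APIs, and the double-quote identifier quoting follows the SQL standard (as Postgres does), so other databases may need a different rule.

```rust
// Hypothetical stand-ins for illustration only. A real implementation would
// use Arrow arrays and DataFusion's TableProvider trait instead.

/// A single cell as a row-oriented database driver might return it.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Text(String),
    Null,
}

/// A named column of values, standing in for an Arrow array.
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub name: String,
    pub values: Vec<Value>,
}

/// Quote an identifier with double quotes, doubling any embedded quote.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Build the SELECT for a scan. The projection (indices into `schema`) is
/// pushed down so the database only sends the requested columns.
pub fn build_scan_sql(
    table: &str,
    schema: &[&str],
    projection: Option<&[usize]>,
) -> Result<String, String> {
    let cols: Vec<&str> = match projection {
        None => schema.to_vec(),
        Some(indices) => {
            let mut out = Vec::with_capacity(indices.len());
            for &i in indices {
                let name = schema
                    .get(i)
                    .ok_or_else(|| format!("column index {i} out of range"))?;
                out.push(*name);
            }
            out
        }
    };
    if cols.is_empty() {
        return Err("projection selects no columns".to_string());
    }
    let quoted: Vec<String> = cols.iter().map(|c| quote_ident(c)).collect();
    Ok(format!("SELECT {} FROM {}", quoted.join(", "), quote_ident(table)))
}

/// Transpose row-oriented results into one `Column` per name.
/// Fails if any row has a different number of values than `names`.
pub fn rows_to_columns(names: &[&str], rows: &[Vec<Value>]) -> Result<Vec<Column>, String> {
    let mut columns: Vec<Column> = names
        .iter()
        .map(|n| Column { name: n.to_string(), values: Vec::with_capacity(rows.len()) })
        .collect();
    for (r, row) in rows.iter().enumerate() {
        if row.len() != names.len() {
            return Err(format!("row {r} has {} values, expected {}", row.len(), names.len()));
        }
        for (column, value) in columns.iter_mut().zip(row) {
            column.values.push(value.clone());
        }
    }
    Ok(columns)
}
```

In a real provider the generated SQL would be sent through a database client, and the transposed columns would be built as typed Arrow arrays rather than vectors of an enum. The sketch only shows the shape of the translation in each direction.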

jorgecarleitao commented 3 years ago

fyi, https://github.com/sfu-db/connector-x already supports arrow, so it is "only" a matter of gluing it together.

Igosuki commented 3 years ago

@jorgecarleitao I didn't know about this, so we just need to implement a table provider and add the sources!

houqp commented 3 years ago

Yes, I would recommend implementing a table provider plugin using connector-x as a self-contained crate :)
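As a rough illustration of what a self-contained provider crate could look like from the engine's side, the std-only sketch below registers a table behind a trait object under a name and looks it up again. The names `ExternalTable`, `StaticTable`, and `Registry` are hypothetical and deliberately much simpler than DataFusion's real `TableProvider` trait. No connector-x calls are shown, since its API is not described in this thread.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical minimal interface a plugin crate would implement.
/// This is NOT DataFusion's `TableProvider` trait.
pub trait ExternalTable: Send + Sync {
    fn column_names(&self) -> Vec<String>;
    /// Return at most `limit` rows (all rows when `limit` is `None`).
    fn scan(&self, limit: Option<usize>) -> Vec<Vec<String>>;
}

/// Stand-in for a database-backed table. It holds rows in memory so the
/// sketch runs without a database.
pub struct StaticTable {
    columns: Vec<String>,
    rows: Vec<Vec<String>>,
}

impl StaticTable {
    pub fn new(columns: Vec<String>, rows: Vec<Vec<String>>) -> Self {
        Self { columns, rows }
    }
}

impl ExternalTable for StaticTable {
    fn column_names(&self) -> Vec<String> {
        self.columns.clone()
    }

    fn scan(&self, limit: Option<usize>) -> Vec<Vec<String>> {
        let n = limit.unwrap_or(self.rows.len()).min(self.rows.len());
        self.rows[..n].to_vec()
    }
}

/// Engine-side registry: maps table names to providers from any crate.
#[derive(Default)]
pub struct Registry {
    tables: HashMap<String, Arc<dyn ExternalTable>>,
}

impl Registry {
    /// Register a provider. Returns the provider previously registered
    /// under this name, if any.
    pub fn register(
        &mut self,
        name: &str,
        table: Arc<dyn ExternalTable>,
    ) -> Option<Arc<dyn ExternalTable>> {
        self.tables.insert(name.to_string(), table)
    }

    pub fn table(&self, name: &str) -> Option<Arc<dyn ExternalTable>> {
        self.tables.get(name).cloned()
    }
}
```

Keeping the database-specific code behind a trait object like this is what lets the provider live in its own crate: the engine depends only on the trait, and the plugin depends on the database client.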