apache / arrow-adbc

Database connectivity API standard and libraries for Apache Arrow
https://arrow.apache.org/adbc/
Apache License 2.0

Implementing drivers in python #2292

Open tokoko opened 1 week ago

tokoko commented 1 week ago

What would you like help with?

I suppose this is already possible by duck typing classes to look like the ones in adbc_driver_manager, but I'm curious what's the general attitude towards implementing new drivers in python. A couple of valid use cases that come to mind are:

I'm wondering if it might be a good idea to add a dummy python driver implementation to encourage such use cases.
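For illustration, here is a rough sketch of what such a dummy driver could look like, assuming a purely in-memory source. The class names (`DummyConnection`, `DummyCursor`) and the "query is just a table name" convention are hypothetical; only the method names (`cursor`, `execute`, `fetchall`, `fetch_arrow_table`, `close`) are meant to mirror the DB-API-style classes in adbc_driver_manager. The Arrow path assumes pyarrow is installed and is imported lazily.

```python
from typing import Any, Dict, List, Optional, Tuple

Columns = Dict[str, List[Any]]


class DummyCursor:
    """Duck-typed cursor: DB-API-style method names, no native driver behind it."""

    def __init__(self, tables: Dict[str, Columns]) -> None:
        self._tables = tables
        self._result: Optional[Columns] = None

    def execute(self, operation: str, parameters: Any = None) -> None:
        # Hypothetical "query language": the operation is simply a table name.
        self._result = self._tables[operation.strip()]

    def _require_result(self) -> Columns:
        if self._result is None:
            raise RuntimeError("no result set; call execute() first")
        return self._result

    def fetchall(self) -> List[Tuple[Any, ...]]:
        # Row-oriented path: transpose the column lists into row tuples.
        return list(zip(*self._require_result().values()))

    def fetch_arrow_table(self):
        # Arrow path: imported lazily so the row path works without pyarrow.
        import pyarrow as pa

        return pa.table(self._require_result())

    def close(self) -> None:
        self._result = None


class DummyConnection:
    """Duck-typed connection that hands out cursors over in-memory columns."""

    def __init__(self, tables: Dict[str, Columns]) -> None:
        self._tables = tables

    def cursor(self) -> DummyCursor:
        return DummyCursor(self._tables)

    def commit(self) -> None:
        pass  # nothing to commit for an in-memory source

    def close(self) -> None:
        pass


conn = DummyConnection({"numbers": {"id": [1, 2, 3], "name": ["a", "b", "c"]}})
cur = conn.cursor()
cur.execute("numbers")
rows = cur.fetchall()
cur.close()
conn.close()
```

Code that only relies on these method names could swap this object in for a real connection, which is the whole appeal of the duck-typing route.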

lidavidm commented 1 week ago

Python wrappers are fine; SQLite already adds a couple of extra methods IIRC.

I'm not sure implementing drivers in Python makes any sense. At that point what you're actually doing is just implementing DB-API, no?

paleolimbot commented 1 week ago

For what it's worth I think it's something that is perfectly valid to enable (although there is a long list of things ahead of it for me personally). Kirill and I chatted briefly about this in R since it would enable existing DBI drivers to more easily implement an ADBC-native interface (allowing us to migrate end-user usage to ADBC). In R we are perhaps more actively trying to move on from DBI than Python users are trying to move on from dbapi.

The ability to instantly prototype a driver and test it shouldn't be undersold, either (although we could make a project with the boilerplate in Go, C++, and Rust with a few Python tests that might accomplish something similar).

tokoko commented 1 week ago

> I'm not sure implementing drivers in Python makes any sense. At that point what you're actually doing is just implementing DB-API, no?

Sure, I guess that is what I mean, but to be fair it's not just DB-API, right? It's a heavily ADBC-flavored DB-API at best. Most of the features that would draw people here are ADBC/Arrow-specific: fetch_arrow, get_objects, partitions, substrait.

> The ability to instantly prototype a driver and test it shouldn't be undersold, either (although we could make a project with the boilerplate in Go, C++, and Rust with a few Python tests that might accomplish something similar).

I know this might not be the best comparison, but I'm sort of thinking of Python drivers as analogous to the newly added Python DataSource API in PySpark. You could argue that prototyping in Java/Scala can be just as easy, but it's all about familiarity at the end of the day, right? For PySpark users, a Python API probably means fewer hurdles for a prototype. To extend the example to this discussion, if some Python system/library is directly using ADBC (meaning DB-API with ADBC extensions) as a pluggable source, it might be easier to implement some unusual cases directly in Python, most likely in the same codebase without any additional build steps.
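To make the "pluggable source" idea a bit more concrete, here is a minimal sketch of how a host library might accept any duck-typed connection. Everything named here (`ConnectionLike`, `CursorLike`, `load_rows`, `RangeConnection`) is hypothetical and not part of ADBC; it only uses `typing.Protocol` from the standard library to describe the shape that a pure-Python source would need to match.

```python
from typing import Any, List, Protocol, Tuple, runtime_checkable


@runtime_checkable
class CursorLike(Protocol):
    """Hypothetical minimal cursor shape a host library might rely on."""

    def execute(self, operation: str, parameters: Any = None) -> Any: ...
    def fetchall(self) -> List[Tuple[Any, ...]]: ...
    def close(self) -> None: ...


@runtime_checkable
class ConnectionLike(Protocol):
    """Hypothetical minimal connection shape."""

    def cursor(self) -> CursorLike: ...
    def close(self) -> None: ...


def load_rows(conn: ConnectionLike, query: str) -> List[Tuple[Any, ...]]:
    """Host-side entry point: works with any object matching the protocol."""
    # runtime_checkable only verifies that the methods exist, not their signatures.
    if not isinstance(conn, ConnectionLike):
        raise TypeError("object does not look like a connection")
    cur = conn.cursor()
    try:
        cur.execute(query)
        return cur.fetchall()
    finally:
        cur.close()


class RangeCursor:
    """Toy pure-Python source: the 'query' is an integer n, yielding n rows."""

    def __init__(self) -> None:
        self._rows: List[Tuple[Any, ...]] = []

    def execute(self, operation: str, parameters: Any = None) -> None:
        self._rows = [(i,) for i in range(int(operation))]

    def fetchall(self) -> List[Tuple[Any, ...]]:
        return list(self._rows)

    def close(self) -> None:
        self._rows = []


class RangeConnection:
    def cursor(self) -> RangeCursor:
        return RangeCursor()

    def close(self) -> None:
        pass


result = load_rows(RangeConnection(), "3")
```

The unusual source lives in the same codebase as the consumer and needs no build step, at the cost of the protocol being an informal convention rather than a specification.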

lidavidm commented 2 days ago

I suppose anyone is free to duck-type themselves as an ADBC driver; I'm mostly just reluctant to expand the scope to include a formal Python API specification. But maybe we should try to intentionally compete with DB-API and/or formalize some of the extensions that we (and others) make to the API.