eakmanrq / sqlframe

Turning PySpark Into a Universal DataFrame API
https://sqlframe.readthedocs.io/en/stable/
MIT License
270 stars, 8 forks

Is the Pandas dependency necessary for all engines? #12

Closed: datajoely closed this issue 3 months ago

datajoely commented 3 months ago

Super cool work, and a very exciting project. Between this and Ibis, I'm so excited about the future of composable, flexible systems where syntax is decoupled from execution.

I noticed in your setup.py that pandas is a required dependency for every back-end. Thinking out loud - I get why you need this to match the Spark API 1:1.

It's not a massive footprint these days; I think the risk is more for older brownfield projects that have complicated existing dependencies. Part of the appeal of this project is that existing Spark codebases can migrate to, say, DuckDB without any pain, so this may crop up over time.

Compiling the requirements for pandas with no extras (here via uv pip compile) returns the following. The footprint doesn't look super risky, but conflicts may crop up:

❯ uv pip compile requirements.txt
Resolved 6 packages in 9ms
# This file was autogenerated by uv via the following command:
#    uv pip compile requirements.txt
numpy==1.26.4
    # via pandas
pandas==2.2.2
python-dateutil==2.9.0.post0
    # via pandas
pytz==2024.1
    # via pandas
six==1.16.0
    # via python-dateutil
tzdata==2024.1
    # via pandas

eakmanrq commented 3 months ago

Yeah, only some operations actually require pandas, so I could localize those imports and tell users to install pandas when they want to use those functions. That obviously introduces some friction, though, so there is a tradeoff between minimizing the required dependencies and lowering friction for users.
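For readers unfamiliar with the "localized import" idea, here is a minimal sketch of how it could look. It is not sqlframe's actual code: the helper `require_optional` and the class `DataFrameSketch` are hypothetical names made up for illustration, and only the method name `toPandas` is borrowed from the PySpark API. The point is that pandas is imported only inside the method that needs it, so everything else works without it.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency on demand (hypothetical helper).

    Raises ImportError with an actionable message if the module is missing.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'. "
            f"Install it with: pip install {module_name}"
        ) from exc


class DataFrameSketch:
    """Hypothetical stand-in for a DataFrame class, for illustration only."""

    def __init__(self, rows: list[dict]):
        self._rows = rows

    def collect(self) -> list[dict]:
        # No pandas needed here, so this works in a pandas-free environment.
        return list(self._rows)

    def toPandas(self):
        # pandas is imported only when this method is actually called.
        pd = require_optional("pandas", "toPandas()")
        return pd.DataFrame(self._rows)
```

With this shape, a user without pandas can still call `collect()`, and only hits the install hint when calling `toPandas()`. That install hint is the friction mentioned above: the failure moves from install time to the first call of a pandas-backed function.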

datajoely commented 3 months ago

Yeah, it's not something that you need to worry about today, but it would be great later down the line. Thanks again for the fantastic work. I'm already 90% of the way done building a Kedro dataset for sqlframe 💪

datajoely commented 3 months ago

In case you're interested, here is my Kedro implementation: https://github.com/kedro-org/kedro-plugins/pull/694

eakmanrq commented 3 months ago

Very cool, thanks for sharing! 🙌