chdb-io / chdb

chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse
https://clickhouse.com/docs/en/chdb
Apache License 2.0

Pandas dataframe output introduced pyarrow and pandas dependency #11

Closed auxten closed 1 year ago

auxten commented 1 year ago

Pandas dataframe output (#6) introduces pyarrow and pandas dependencies and will be released in v0.5.0. I'm wondering whether it's worth it; after all, pyarrow and pandas are not small packages. I'm starting a vote here:

  1. If you think it is worth adding pyarrow and pandas as dependencies to support DataFrame output, please click 👍
  2. If you think it is not worth it, please click 👎

auxten commented 1 year ago

Here is a possible implementation for discussion: don't add pyarrow or pandas to chdb's install_requires. Only when the related functions are used, for example when DataFrame is requested as the output format, check whether pyarrow and pandas are installed in versions that meet the requirements. If not, raise an error that prompts the user to install them. I don't know whether there is a precedent for this practice in Python libraries. We can discuss it here. @nmreadelf @lmangani
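To make the proposal concrete, here is a minimal sketch of the lazy check. The names `require_optional` and `to_dataframe` are hypothetical and not part of chdb's API, and the assumption that the engine hands back Arrow IPC stream bytes is only for illustration. Version checks are left out to keep the sketch short.

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency on first use, or fail with an install hint.

    Hypothetical helper for illustration; not part of chdb's API.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a message that tells the user exactly what to install.
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'. "
            f"Install it with: pip install {module_name}"
        ) from exc


def to_dataframe(arrow_bytes):
    """Hypothetical DataFrame output path; dependencies are loaded only here."""
    pa = require_optional("pyarrow", "DataFrame output")
    require_optional("pandas", "DataFrame output")
    # Assumption: arrow_bytes holds an Arrow IPC stream produced by the engine.
    return pa.ipc.open_stream(arrow_bytes).read_all().to_pandas()
```

With this shape, users who never ask for DataFrame output never import pyarrow or pandas, and those who do get a clear message instead of a bare `ModuleNotFoundError`.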

lmangani commented 1 year ago

> error will be reported and the user will be prompted to install them.

This sounds like the best route, personally speaking: a clear error and manual installation for those who want or need them. This opinion is mostly due to the library already being 500+ MB uncompressed (400 when stripped), so size does matter, for now at least.