databricks / databricks-sql-python

Databricks SQL Connector for Python
Apache License 2.0

Remove requirement: openpyxl #316

Open davebelais opened 8 months ago

davebelais commented 8 months ago

You include openpyxl as a requirement for this package; however, openpyxl is not used by this library, as you can see from this search. Please remove this requirement to reduce bloat in applications and libraries that depend on this package. Thanks!

susodapop commented 8 months ago

This is a good catch. openpyxl is the seventh-largest dependency of databricks-sql-connector, weighing in at 1.98 MB. We have a big effort underway to reduce the overall installation size, and pull requests for this will be incoming in the next week or so.

openpyxl isn't used by the connector, but it is used as part of our e2e test suite. The solution is simply to move it within pyproject.toml so that it's only installed in development mode.
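
One way to confirm that such a move took effect is to inspect the runtime requirements an installed distribution declares. The sketch below is a minimal illustration, not part of the connector: the helper names (`requirement_name`, `declares_requirement`, `installed_package_requires`) are hypothetical, and the version specifiers in the usage note are made up. It relies only on the standard-library `importlib.metadata.requires` call.

```python
import re
from importlib import metadata


def _normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 does (case and separators)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the normalized project name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return _normalize(match.group(1))


def declares_requirement(requirements, name: str) -> bool:
    """Return True if any requirement string in the iterable names `name`."""
    target = _normalize(name)
    return any(requirement_name(req) == target for req in requirements)


def installed_package_requires(dist_name: str, name: str) -> bool:
    """Check the declared requirements of an installed distribution.

    Raises importlib.metadata.PackageNotFoundError if `dist_name` is not
    installed in the current environment.
    """
    return declares_requirement(metadata.requires(dist_name) or [], name)
```

For example, `declares_requirement(["openpyxl (>=3.0.10,<4.0.0)", "pyarrow>=10.0.1"], "openpyxl")` returns `True`. Once the dependency is no longer declared as a runtime requirement, `installed_package_requires("databricks-sql-connector", "openpyxl")` would be expected to return `False` in an environment where the connector is installed.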

MichaelAnckaert commented 6 months ago

+1 for this idea. The total install size is very large IMO.

FYI: I'm currently trying to work around the issue where adding databricks-sql-python to a Lambda function causes the function size to balloon over the 250 MB limit.

joeraver commented 4 months ago

+1 Same issue

susodapop commented 4 months ago

@MichaelAnckaert the biggest culprits for install size are pyarrow and numpy. Removing openpyxl makes sense towards the same goal, but it comprises only a small fraction of the total install size.
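
To see how these packages compare in a given environment, the on-disk size of each installed distribution can be summed from its recorded file list. This is a rough sketch under stated assumptions, not a tool from this project: the function names are hypothetical, and the result only covers files listed in each distribution's metadata, so numbers will vary by platform and version.

```python
from importlib import metadata
from pathlib import Path


def installed_size_bytes(dist_name: str) -> int:
    """Sum the size of every file recorded for an installed distribution.

    Raises importlib.metadata.PackageNotFoundError if it is not installed.
    """
    total = 0
    # metadata.files() returns None when no file list was recorded.
    for entry in metadata.files(dist_name) or []:
        path = Path(entry.locate())
        if path.is_file():
            total += path.stat().st_size
    return total


def largest_distributions(names):
    """Return (name, size_in_bytes) pairs, largest first, skipping missing ones."""
    sizes = {}
    for name in names:
        try:
            sizes[name] = installed_size_bytes(name)
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)
```

Calling `largest_distributions(["pyarrow", "numpy", "openpyxl"])` in an environment with the connector installed gives a quick ranking of the three packages mentioned in this thread.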

henryhueske commented 2 months ago

+1 same issue