janhicken closed this 3 weeks ago
@simonharrer I think it's probably worth splitting many of these packages into optional dependencies, aka extras.
I think this is a fair point now, and we should add extras.
I'm drafting the changes here: #234
@jochenchrist, a follow-up on this: moving deltalake into an extra cut 200 MB of the larger dependencies I mentioned in the previous PR. I posted a PR here: ~#240~ #242
(I remade the PR because rebasing is hard haha)
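For anyone following along, here is a minimal sketch of the optional-import pattern that usually goes with extras. The helper name `require_extra` and the extra name `deltalake` are hypothetical; the actual extra names chosen in #242 aren't stated in this thread. The idea is that a heavy package is declared only under an extra, and the code path that needs it imports it lazily, failing with an install hint if the extra is missing.

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str, package: str = "datacontract-cli") -> ModuleType:
    """Import an optional dependency, or explain which extra provides it.

    Hypothetical helper: `extra` is an assumed extra name, not confirmed by the thread.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so a missing package lands here.
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install '{package}[{extra}]'"
        ) from exc
```

Calling something like `require_extra("deltalake", "deltalake")` only inside the Delta-specific code path means a user who only runs SodaCL against PostgreSQL never has to install or import it.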
I think going from 1.5 GB to 300 MB ought to be enough to close this issue out.
I agree: at one fifth of the dependency size, the library is much leaner now :)
When adding the `datacontract-cli` package as a dependency to a Python project, a lot of transitive dependencies get added. After adding the dependency, my application's Docker image grew from 330 MiB to 1.2 GiB.

My application only uses SodaCL with a PostgreSQL database; however, other frameworks such as `pyspark` (340 MB), `pyarrow` (123 MB), and `deltalake` (75 MB) are pulled in as well.

Would it be possible to split the packages per target technology, like Soda does? Alternatively, extras could be used for this.
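To check per-package sizes like the ones above in your own environment, a rough sketch using only the standard library's `importlib.metadata` could look like this. It is an assumption that this roughly matches how the figures above were measured; it sums the file sizes recorded in each distribution's `RECORD` metadata, skips entries without a recorded size (typically compiled `.pyc` files), and does not follow transitive dependencies, so treat the result as a lower bound.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, distribution


def installed_size_mb(dist_name: str) -> float | None:
    """Approximate installed size (in MB, 10**6 bytes) of one distribution.

    Returns None if the distribution is not installed.
    """
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return None
    files = dist.files or []  # None when no RECORD-style file list is available
    # Only count files whose size is recorded in the metadata.
    total_bytes = sum(f.size for f in files if f.size is not None)
    return total_bytes / 1_000_000


if __name__ == "__main__":
    for name in ("pyspark", "pyarrow", "deltalake"):
        size = installed_size_mb(name)
        label = "not installed" if size is None else f"{size:.0f} MB"
        print(f"{name}: {label}")
```

Comparing the output before and after installing with or without a given extra shows how much each optional dependency contributes.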