Open gs11 opened 4 years ago
Sounds reasonable. Would you like to make this change?
Sure! I can do that
I started looking into it and realized that my experience with distributing Python packages is virtually non-existent. When building dists locally, I can't reproduce the issue: the test folder with the large blobs (e.g. bitbpack, rle etc.) doesn't end up in them.
It actually doesn't seem that easy to do if we want to maintain a fastparquet[tests] install. I guess we could instead tell people that, to run the tests, they need to clone the repo.
While I'm not a heavy fastparquet user myself, I'd say that'd be a fair tradeoff.
One of the points in favour of fastparquet over pyarrow has been the install size, so maybe it's worth someone's time to do this (would involve messing with MANIFEST, I believe). I don't imagine getting to it soon, though.
Regarding size, I found that, in addition to the fastparquet package itself being larger, the combined size of fastparquet's dependencies was substantially larger than that of pyarrow's.
Either way, I can't seem to replicate creating a distribution that has the same contents as that on pypi. What does the release process look like today?
Releases are packaged using python setup.py sdist bdist_wheel, so what gets included depends on the contents of setup.py and MANIFEST.in
The release I generate with the above command is less than half a MB, and it does not contain those test blobs. I can't find any way to create a dist that includes those binary files.
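To compare a locally built wheel against the one downloaded from PyPI, it may help to total the uncompressed size per directory inside each archive. The following is only a sketch using the standard library; the function name `size_by_top_dirs` and the `depth` parameter are made up for illustration and are not part of fastparquet or any packaging tool.

```python
import sys
import zipfile
from collections import Counter


def size_by_top_dirs(archive, depth=2):
    """Return uncompressed bytes per leading directory prefix in a wheel/zip.

    `archive` may be a path or a file-like object; a wheel is a plain zip file.
    """
    totals = Counter()
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Drop the file name, keep at most `depth` directory components.
            dirs = info.filename.split("/")[:-1][:depth]
            totals["/".join(dirs) or "."] += info.file_size
    return totals


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the ten largest prefixes of the archive given on the command line.
    for prefix, size in size_by_top_dirs(sys.argv[1]).most_common(10):
        print(f"{size / 1e6:8.2f} MB  {prefix}")
```

Running this on both wheels should show whether a test directory accounts for the difference in size.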
I can exclude the test folder altogether in MANIFEST.in, but I'm not sure that'll have any effect, as the release might be generated differently?
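Besides MANIFEST.in, the list of packages passed to setup() also decides what ends up in a wheel. As a rough sketch, and assuming the test folder is an importable subpackage named test (the thread does not confirm how it is laid out), setuptools' `find_packages` can leave it out through its `exclude` patterns. The helper name `packages_without_tests` is hypothetical.

```python
from setuptools import find_packages


def packages_without_tests(where="."):
    """List packages under `where`, skipping any package named `test`.

    The patterns are matched against full dotted package names, so both the
    test package itself and everything nested below it are excluded.
    """
    return find_packages(
        where=where,
        exclude=["test", "test.*", "*.test", "*.test.*"],
    )
```

In a setup.py, the result could then be passed as `packages=packages_without_tests()`. Whether this alone removes the large blobs depends on how the data files are declared, which would need checking against the actual setup.py and MANIFEST.in.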
When trying to optimize the speed of a serverless/lambda deployment, I found that the fastparquet wheel contains a test folder of ~80 MB.
Could this be excluded from the distribution as I presume it's not needed for that?