andrewtavis / causeinfer

Machine learning based causal inference/uplift in Python
BSD 3-Clause "New" or "Revised" License
56 stars 11 forks source link

Your wheel is a mess #8

Closed jwodder closed 3 years ago

jwodder commented 3 years ago

The v0.1.1.2 wheel of this project released on PyPI yesterday contains numerous copies of temporary files that shouldn't be a in a wheel; you can see the wheel's file listing here or by running zipinfo or a similar program on the .whl file. I believe that this happened to due to the use of find_namespace_packages(); because your setup.py calls this function without an exclude or include argument, it picks up the directories containing .py files in the temporary build/ directory, and so each build of the wheel ends up including build/ while also adding to build/, leading to the problem at hand.

My recommendation for fixing this is:

  1. Delete the build/ directory from your local repository, perhaps by running git clean -dXf.
  2. As your Python packages all have __init__.py files, there is no reason to use find_namespace_packages() over find_packages(), so I would recommend changing what function you use in setup.py. If you are set on using find_namespace_packages(), you should add include=["causeinfer", "causeinfer.*] to the function's arguments in order to not capture anything outside your code directory. (You should also set include or exclude even if you do use find_packages() in order to exclude your tests/ directory from the wheel; see here for more information.)
andrewtavis commented 3 years ago

Hi John. Thank you for sending this along and for the suggestions. I'll get to this later today or tomorrow, and will reference the blogpost in your second link. I'll further fix this in the other packages. Very much appreciate having this pointed out!

andrewtavis commented 3 years ago

@jwodder, this took an extra bit because I wanted to rearrange the package to an src structure as suggested by you and the others linked in the blogpost (was reading up on that and figuring out testing). I ran check-wheel-contents on the v0.1.1.3 wheel and got the following:

dist/causeinfer-0.1.1.3-py3-none-any.whl: OK

I'll check wheelodex later for an update on this as a final check. Please let me know if there's anything else I'm missing here, and again thank you for taking the time to bring this to my attention!

jwodder commented 3 years ago

@andrewtavis While your wheel no longer contains any superfluous files, I think it may be missing some files. Specifically, I assume you want to include src/causeinfer/data/datasets/* in your wheel, correct? There are two different ways to configure setuptools to include such package data in your wheels, described here, and check-wheel-contents can be told to check that everything in a local directory ended up in your wheel with the --package or --src-dir option.

andrewtavis commented 3 years ago

@jwodder, yes, most definitely those should be included. Thank you further for the secondary check on all this! Will read the linked blogpost and get to this.

andrewtavis commented 3 years ago

@jwodder, as the PR says, issue was that I didn't have include_package_data=True in my setup.py to match the arguments in the new MANIFEST.in. Most recent run of check-wheel-contents dist/causeinfer-0.1.1.4-py3-none-any.whl --src-dir=src gave the following:

dist/causeinfer-0.1.1.4-py3-none-any.whl: OK

A similar run on causeinfer-0.1.1.3 generated a list of all the data files. I'll again check wheelodex when it's updated as a final check.

This is the only package of mine that has necessary data that should go in the sdist, and I've updated all the others with an src structure and the needed setup.py alterations except for kwx (will be a part of the next PR that has other fixes).

Let me know if there are any other issues, and thanks for such a useful package regarding check-wheel-contents!