EducationalTestingService / skll

SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.
http://skll.readthedocs.org
Other
550 stars 69 forks source link

Replace MANIFEST.in with `package_data` in setup.py #679

Closed RobertImbrie closed 3 years ago

RobertImbrie commented 3 years ago

This PR closes #674.

Note: Tested by running python setup.py sdist build once for each setup and running a diff on the generated folders. They were identical!

desilinguist commented 3 years ago

Thanks for your continuing contributions @RobertImbrie! They are greatly appreciated.

I am also pleasantly surprised to see that Gitlab CI was triggered for this external PR when it wasn't for the last one. 🤔

RobertImbrie commented 3 years ago

Thank you, Nitin! I'm hoping to keep them going. :)

Also, glad to hear it went through easily this time!

desilinguist commented 3 years ago

So, I have been playing around with this and I don't think these changes are giving us the desired result. Here are the things I tried:

  1. If I remove MANIFEST.in and run python setup.py sdist build, tests and examples are correctly excluded but LICENSE.TXT and requirements.txt are not included.

  2. If I modify the find_packages() call in setup.py to not mention exclude=['tests', 'examples'] and run python setup.py sdist build, tests and examples are not excluded this time around and LICENSE.txt and requirements.txt are still not included.

  3. If I modify setup.py to now add back exclude=['tests', 'examples'] and then also add the following two lines to it, tests and example are now back to being correctly excluded but LICENSE.txt and requirements.txt are still not included:

    package_data={'': ['*.rst', 'LICENSE.txt', 'requirements.txt']},
    include_package_data=True,

So, I referred to the relevant documentation for setuptools and found this parenthetical nugget:

Note: although the package_data argument was previously only available in setuptools, it was also added to the Python distutils package as of Python 2.4; there is some documentation for the feature available on the python.org website. If using the setuptools-specific include_package_data argument, files specified by package_data will not be automatically added to the manifest unless they are listed in the MANIFEST.in file.

So, what I tested next was to keep MANIFEST.in but to edit it to only keep the following lines:

include LICENSE.txt
include requirements.txt

And this yielded exactly the same results as what we have right now (README.rst is already included so we don't need the include *.rst line either)!

@RobertImbrie can you please edit this PR to make the following changes:

NOTE: If you are going to try these experiments, make sure to delete the skll.egg-info and build directories before every invocation of python setup.py sdist build because as the documentation says:

When building an sdist, the datafiles are also drawn from the package_name.egg-info/SOURCES.txt file, so make sure that this is removed if the setup.py package_data list is updated before calling setup.py.

RobertImbrie commented 3 years ago

Hi Nitin! I made the requested changes to the pull request.

Also, my sincere apologies for the issues! I had deleted the build directory but not skll.egg-info, which should account for why my output was different. Next time, I will read the documentation more closely.

desilinguist commented 3 years ago

Looks great! Thanks @RobertImbrie!