Making minimal sized `sdist` and `whl` using pyproject.toml

SamuelLarkin commented 1 week ago

PR Goal?

It turns out that our understanding of how pyproject.toml works with hatch was insufficient. This PR aims to reduce what is included in the sdist and whl. We decided to remove all tests/ directories because keeping them would require us to also include the wav files and more, which bloats the whl and sdist.

Fixes?

fixes: #586

Feedback sought?

This PR has a long and explicit list of inclusions and exclusions. Is this the best way to achieve what we want?

Priority?

Needed before our next release

Tests added?

None

How to test?

Get yourself some reference prebuilt whl and sdist files.

wget 'https://files.pythonhosted.org/packages/1c/c4/99440963fb2562fcf019d8949400c7467a6c10a524515e81f054cbbcb32a/everyvoice-0.1.0a3-py3-none-any.whl'
wget 'https://files.pythonhosted.org/packages/14/a4/898c980f6ee34ce240efb9f6d3ffbf24ed755497dc52751953ca5a2228e4/everyvoice-0.1.0a3.tar.gz'

hatch build

Compare the contents of each file with its reference.

sdist

vimdiff \
  +':windo %sort' \
  <(tar tf dist/everyvoice-0.2.0a0.tar.gz | sed 's|^[^/]\+/||') \
  <(tar tf everyvoice-0.1.0a3.tar.gz | sed 's|^[^/]\+/||;  /.*\/$/d')

diff \
  <(tar tf everyvoice-0.1.0a3.tar.gz | sed 's|^[^/]\+/||;  /.*\/$/d' | sort) \
  <(tar tf dist/everyvoice-0.2.0a0.tar.gz | sed 's|^[^/]\+/||' | sort)

1d0
<
19,25c18
< everyvoice.egg-info/dependency_links.txt
< everyvoice.egg-info/entry_points.txt
< everyvoice.egg-info/not-zip-safe
< everyvoice.egg-info/PKG-INFO
< everyvoice.egg-info/requires.txt
< everyvoice.egg-info/SOURCES.txt
< everyvoice.egg-info/top_level.txt
---
> everyvoice/evaluation.py
36a30
> everyvoice/model/aligner/DeepForcedAligner/.gitignore
42a37
> everyvoice/model/aligner/wav2vec2aligner/.gitignore
54d48
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/cli/audit.py
56d49
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/cli/check_data.py
71,73d63
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/__init__.py
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_cli.py
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_writing_callbacks.py
78a69
> everyvoice/model/feature_prediction/FastSpeech2_lightning/.gitignore
82a74
> everyvoice/model/vocoder/HiFiGAN_iSTFT_lightning/.gitignore
95,106d86
< everyvoice/tests/basic_test_case.py
< everyvoice/tests/__init__.py
< everyvoice/tests/preprocessed_audio_fixture.py
< everyvoice/tests/stubs.py
< everyvoice/tests/test_cli.py
< everyvoice/tests/test_configs.py
< everyvoice/tests/test_dataloader.py
< everyvoice/tests/test_model.py
< everyvoice/tests/test_preprocessing.py
< everyvoice/tests/test_text.py
< everyvoice/tests/test_utils.py
< everyvoice/tests/test_wizard.py
121a102
> everyvoice/wizard/simple_term_menu_win_stub.py
123a105
> .gitignore
125d106
< MANIFEST.in
126a108
> pyproject.toml
128,133d109
< requirements.dev.txt
< requirements.test.txt
< requirements.torch.txt
< requirements.txt
< setup.cfg
< setup.py

Wheel

vimdiff \
  <(unzip -l dist/everyvoice-0.2.0a0-py3-none-any.whl | sed 's|.* ||') \
  <(unzip -l everyvoice-0.1.0a3-py3-none-any.whl | sed 's|.* ||')

diff \
  <(unzip -l everyvoice-0.1.0a3-py3-none-any.whl | sed 's|.* ||' | sort) \
  <(unzip -l dist/everyvoice-0.2.0a0-py3-none-any.whl | sed 's|.* ||' | sort)

3,9c3,8
< everyvoice-0.1.0a3.dist-info/entry_points.txt
< everyvoice-0.1.0a3.dist-info/LICENSE
< everyvoice-0.1.0a3.dist-info/METADATA
< everyvoice-0.1.0a3.dist-info/RECORD
< everyvoice-0.1.0a3.dist-info/top_level.txt
< everyvoice-0.1.0a3.dist-info/WHEEL
< everyvoice-0.1.0a3-py3-none-any.whl
---
> dist/everyvoice-0.2.0a0-py3-none-any.whl
> everyvoice-0.2.0a0.dist-info/entry_points.txt
> everyvoice-0.2.0a0.dist-info/licenses/LICENSE
> everyvoice-0.2.0a0.dist-info/METADATA
> everyvoice-0.2.0a0.dist-info/RECORD
> everyvoice-0.2.0a0.dist-info/WHEEL
26a26
> everyvoice/evaluation.py
55d54
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/cli/audit.py
57d55
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/cli/check_data.py
72,74d69
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/__init__.py
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_cli.py
< everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_writing_callbacks.py
96,107d90
< everyvoice/tests/basic_test_case.py
< everyvoice/tests/__init__.py
< everyvoice/tests/preprocessed_audio_fixture.py
< everyvoice/tests/stubs.py
< everyvoice/tests/test_cli.py
< everyvoice/tests/test_configs.py
< everyvoice/tests/test_dataloader.py
< everyvoice/tests/test_model.py
< everyvoice/tests/test_preprocessing.py
< everyvoice/tests/test_text.py
< everyvoice/tests/test_utils.py
< everyvoice/tests/test_wizard.py
122a106
> everyvoice/wizard/simple_term_menu_win_stub.py
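
The two shell comparisons above can also be sketched in Python using only the standard library. This is a minimal sketch, not part of the PR: the function names `archive_members` and `compare_archives` are hypothetical, and it assumes the sdist is a `.tar.gz` with a single top-level `name-version/` directory and the wheel is a regular zip file.

```python
import tarfile
import zipfile
from pathlib import Path


def archive_members(path):
    """Return the set of file paths stored in a wheel (.whl) or an sdist tarball."""
    path = Path(path)
    if path.suffix == ".whl":
        with zipfile.ZipFile(path) as zf:
            # Skip directory entries, which end with a slash.
            return {name for name in zf.namelist() if not name.endswith("/")}
    with tarfile.open(path) as tf:
        # Strip the leading "name-version/" component, like the sed expression does,
        # and keep regular files only.
        return {
            member.name.split("/", 1)[1]
            for member in tf.getmembers()
            if member.isfile() and "/" in member.name
        }


def compare_archives(reference, candidate):
    """Return (only_in_reference, only_in_candidate) as sorted lists of paths."""
    ref = archive_members(reference)
    new = archive_members(candidate)
    return sorted(ref - new), sorted(new - ref)


# Example call, using the file names from this PR description:
# removed, added = compare_archives(
#     "everyvoice-0.1.0a3.tar.gz", "dist/everyvoice-0.2.0a0.tar.gz"
# )
```

For wheels, the `*.dist-info/` entries will always show up on both sides because the directory name contains the version, just as in the diff output above.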

Access to the unittest

After making a new environment with make-everyvoice-env, we can still run python -m unittest everyvoice.tests.test_dataloader even though the tests/ directory isn't included in the whl.

Confidence?

fair

Version change?

no

Related PRs?

None

codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 76.58%. Comparing base (dbc2282) to head (aca24a3). Report is 9 commits behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #587   +/-   ##
=======================================
  Coverage   76.58%   76.58%
=======================================
  Files          46       46
  Lines        3451     3451
  Branches      470      470
=======================================
  Hits         2643     2643
  Misses        706      706
  Partials      102      102
```


github-actions[bot] commented 1 week ago

CLI load time: 0:00.31
Pull Request HEAD: aca24a35bddeb49a8c542faf85e69fd3bb501526
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package
import time:       262 |     106944 |   typer
import time:      6358 |     225062 | everyvoice.cli

joanise commented 1 week ago

This PR has a long and explicit list of inclusions and exclusions. Is this the best way to achieve what we want?

For ReadAlongs/Studio, the equivalent code (scroll to pyproject.toml lines 94-97) is much shorter, so I was surprised you had to have all these exclusions here, but EV has all these files deeper, so we probably can't remove this list.

You can shorten it a tiny bit by finding patterns that express the same things in a shorter way, but that's of little value.

One detail: include: "requirements.torch.txt" is negated by exclude: "requirement*txt", but I don't see any value in including this file in the wheel or tarball anyway; we only use it when installing from the sandbox.
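
To illustrate why the include entry has no effect, here is a minimal sketch of "exclude wins over include" selection. It is an approximation only: `fnmatchcase` from the standard library stands in for the Git-style pattern matching that hatch performs, the helper `is_shipped` is hypothetical, and the `everyvoice/*` include pattern is made up for the example.

```python
from fnmatch import fnmatchcase


def is_shipped(path, include, exclude):
    """Return True if path matches an include pattern and no exclude pattern."""
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False  # an exclude match overrides any include match
    return any(fnmatchcase(path, pattern) for pattern in include)


include = ["requirements.torch.txt", "everyvoice/*"]  # second pattern is illustrative
exclude = ["requirement*txt"]  # pattern quoted in the comment above

print(is_shipped("requirements.torch.txt", include, exclude))  # False
print(is_shipped("everyvoice/cli.py", include, exclude))  # True
```

Because the exclude pattern already matches the file, dropping "requirements.torch.txt" from include should not change what gets built.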

TLDR: the changes I suggest are:

factor out the two lists into tool.hatch.build since they can be made identical easily enough,
remove "requirements.torch.txt" from include.
