MitchellAcoustics / Soundscapy

A python library for soundscape assessments
http://soundscapy.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Reduce installation size #99

Open MitchellAcoustics opened 20 hours ago

MitchellAcoustics commented 20 hours ago

Looking at the installation size of soundscapy, we end up quite large compared to similar packages. We should try to reduce this where possible, either by removing unnecessary dependencies or by moving them into optional dependencies (extras).

For instance:

The largest dependencies in soundscapy[audio] are:

Obviously some are unavoidable - SciPy, pandas, matplotlib, seaborn, etc. And soundscapy on its own isn't too bad. The issue is with a few of the dependencies, especially plotly, llvmlite, and scikit-image (skimage).

Plotly should now be straightforward to make an optional dependency - ship seaborn as standard, and users who want the plotly backend can install it themselves.
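A minimal sketch of how an optional plotly backend could be guarded at import time. The helper name `require_optional` and the extra name `plotly` are hypothetical, not part of the current soundscapy API:

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency, or fail with an install hint.

    `extra` is the (hypothetical) name of the soundscapy extra that
    would provide `module_name`.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install 'soundscapy[{extra}]'"
        ) from exc


# Usage inside a plotting function, so the import only happens
# when the plotly backend is actually requested:
#     plotly = require_optional("plotly", "plotly")
```

Deferring the import to the call site means the core install never touches plotly unless a user explicitly selects that backend.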

In the core install, llvmlite is only pulled in (via numba) by pandas[performance], and I was planning to take the performance extra out anyway, so that's easy.

scikit-maad is the source of numba (via resampy) and scikit-image. I don't want to make the optional dependencies so granular that psychoacoustics, ecoacoustics, etc. are split out separately, so unless scikit-maad drops these there's not much we can do. But at least without the [audio] extra, these shouldn't be installed.
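To check which requirements sit behind which extra, something like the following could help. It is only a sketch: `group_by_extra` is a hypothetical helper, and it uses a simple regex on the marker rather than a full PEP 508 parser, so complex markers may be grouped imperfectly.

```python
import re
from collections import defaultdict

# Matches markers such as: extra == "performance" or extra == 'audio'
_EXTRA_RE = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def group_by_extra(requirements):
    """Group requirement strings by the extra named in their marker.

    Requirements without an `extra == ...` marker go under "core".
    """
    groups = defaultdict(list)
    for req in requirements:
        spec, _, marker = req.partition(";")
        match = _EXTRA_RE.search(marker)
        key = match.group(1) if match else "core"
        groups[key].append(spec.strip())
    return dict(groups)


# Usage against an installed distribution (requires() may return None):
#     from importlib.metadata import requires
#     print(group_by_extra(requires("soundscapy") or []))
```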

uv tree
sspy-install v0.1.0
└── soundscapy[audio] v0.7.5
    ├── loguru v0.7.2
    ├── pandas[excel, performance] v2.2.3
    │   ├── numpy v2.0.2
    │   ├── python-dateutil v2.9.0.post0
    │   │   └── six v1.16.0
    │   ├── pytz v2024.2
    │   ├── tzdata v2024.2
    │   ├── odfpy v1.4.1 (extra: excel)
    │   │   └── defusedxml v0.7.1
    │   ├── openpyxl v3.1.5 (extra: excel)
    │   │   └── et-xmlfile v2.0.0
    │   ├── python-calamine v0.3.1 (extra: excel)
    │   │   └── packaging v24.1
    │   ├── pyxlsb v1.0.10 (extra: excel)
    │   ├── xlrd v2.0.1 (extra: excel)
    │   ├── xlsxwriter v3.2.0 (extra: excel)
    │   ├── bottleneck v1.4.2 (extra: performance)
    │   │   └── numpy v2.0.2
    │   ├── numba v0.60.0 (extra: performance)
    │   │   ├── llvmlite v0.43.0
    │   │   └── numpy v2.0.2
    │   └── numexpr v2.10.1 (extra: performance)
    │       └── numpy v2.0.2
    ├── plotly v5.24.1
    │   ├── packaging v24.1
    │   └── tenacity v9.0.0
    ├── pydantic v2.9.2
    │   ├── annotated-types v0.7.0
    │   ├── pydantic-core v2.23.4
    │   │   └── typing-extensions v4.12.2
    │   └── typing-extensions v4.12.2
    ├── pyyaml v6.0.2
    ├── schema v0.7.7
    ├── scipy v1.14.1
    │   └── numpy v2.0.2
    ├── seaborn v0.13.2
    │   ├── matplotlib v3.9.2
    │   │   ├── contourpy v1.3.0
    │   │   │   └── numpy v2.0.2
    │   │   ├── cycler v0.12.1
    │   │   ├── fonttools v4.54.1
    │   │   ├── kiwisolver v1.4.7
    │   │   ├── numpy v2.0.2
    │   │   ├── packaging v24.1
    │   │   ├── pillow v11.0.0
    │   │   ├── pyparsing v3.2.0
    │   │   └── python-dateutil v2.9.0.post0 (*)
    │   ├── numpy v2.0.2
    │   └── pandas v2.2.3 (*)
    ├── acoustics v0.2.6 (extra: audio)
    │   ├── matplotlib v3.9.2 (*)
    │   ├── numpy v2.0.2
    │   ├── pandas v2.2.3 (*)
    │   ├── scipy v1.14.1 (*)
    │   ├── six v1.16.0
    │   └── tabulate v0.9.0
    ├── mosqito v1.2.1 (extra: audio)
    │   ├── numpy v2.0.2
    │   ├── pyuff v2.4.3
    │   │   └── numpy v2.0.2
    │   └── scipy v1.14.1 (*)
    ├── scikit-maad v1.4.3 (extra: audio)
    │   ├── matplotlib v3.9.2 (*)
    │   ├── numpy v2.0.2
    │   ├── pandas v2.2.3 (*)
    │   ├── resampy v0.4.3
    │   │   ├── numba v0.60.0 (*)
    │   │   └── numpy v2.0.2
    │   ├── scikit-image v0.24.0
    │   │   ├── imageio v2.36.0
    │   │   │   ├── numpy v2.0.2
    │   │   │   └── pillow v11.0.0
    │   │   ├── lazy-loader v0.4
    │   │   │   └── packaging v24.1
    │   │   ├── networkx v3.4.2
    │   │   ├── numpy v2.0.2
    │   │   ├── packaging v24.1
    │   │   ├── pillow v11.0.0
    │   │   ├── scipy v1.14.1 (*)
    │   │   └── tifffile v2024.9.20
    │   │       └── numpy v2.0.2
    │   └── scipy v1.14.1 (*)
    └── tqdm v4.66.6 (extra: audio)
(*) Package tree already displayed

From testing, it looks like just removing pandas[performance] would reduce the core install to ~300M. Also removing plotly brings it down to ~200M. This still feels quite big though...

Removing SciPy would reduce it further to ~123M. It isn't required by the other core dependencies, although it is required by acoustics, mosqito, and scikit-maad. It doesn't look feasible to remove it from the code though - we use it for SSM curve fit optimization and for stats.kurtosis. The stats could probably be done with a smaller dependency, but the optimization code would be difficult to change.
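For the stats part, a NumPy-only kurtosis is short enough to write by hand. The sketch below assumes the code relies on the default behaviour of scipy.stats.kurtosis (Fisher definition, biased estimator) for 1-D input; `excess_kurtosis` is a hypothetical name, and NaN handling or axis arguments are not covered:

```python
import numpy as np


def excess_kurtosis(values):
    """Biased excess (Fisher) kurtosis of a 1-D sample.

    Computed as m4 / m2**2 - 3, where m2 and m4 are the second and
    fourth central moments.
    """
    x = np.asarray(values, dtype=float)
    deviations = x - x.mean()
    m2 = np.mean(deviations**2)  # second central moment (biased variance)
    m4 = np.mean(deviations**4)  # fourth central moment
    return m4 / m2**2 - 3.0
```

This would only remove the stats usage; the curve fitting would still need SciPy.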

MitchellAcoustics commented 19 hours ago

Note: running sudo du -hs * | sort -h -r inside the .venv site-packages directory lists the size of every installed package, largest first.
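A rough cross-platform equivalent in Python, as a sketch. `package_sizes` is a hypothetical helper, and it sums apparent file sizes rather than disk blocks, so the numbers will differ slightly from du:

```python
from pathlib import Path


def package_sizes(site_packages):
    """Return (name, size_in_bytes) for each top-level entry, largest first."""
    sizes = {}
    for entry in Path(site_packages).iterdir():
        if entry.is_dir():
            # Sum the sizes of all regular files below this directory.
            total = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
        else:
            total = entry.stat().st_size
        sizes[entry.name] = total
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


# Usage for the active environment:
#     import sysconfig
#     for name, size in package_sizes(sysconfig.get_paths()["purelib"])[:10]:
#         print(f"{size / 1e6:8.1f} MB  {name}")
```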

MitchellAcoustics commented 12 hours ago

#101 removed pandas[performance] and reduced the size to 300M. The heaviest dependency is plotly.