CouncilDataProject / speakerbox

Speakerbox: Fine-tune Audio Transformers for speaker identification.
https://councildataproject.org/speakerbox
MIT License

[Enhancement] Number of dependencies of dependencies #12

Closed: NicolasMICAUX closed this issue 1 year ago

NicolasMICAUX commented 1 year ago

(First, let me tell you that I found your library superb! Very nice work, thanks for sharing!)

Feature Description

Decrease the number of dependencies of dependencies: while the number of dependencies in pyproject.toml is quite low, some of them probably have a lot of (unusual) dependencies of their own, which means that pip ends up installing a lot of packages.

Dependencies bring issues over time, such as compatibility problems.

Solution

If it's possible for you, I think it would be a good idea to identify the dependencies that have the most dependencies themselves and see whether you can avoid using them.
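
One way to find the heaviest dependencies is to walk the metadata of the packages already installed in the environment. The sketch below is only an illustration and not part of speakerbox: it uses the standard-library `importlib.metadata`, the helper names (`direct_requirements`, `transitive_requirements`, `rank_by_transitive_count`) are made up for this example, and it simplifies by ignoring version constraints and environment markers other than extras. Packages that are not installed are reported with a count of 0.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a distribution name so different spellings compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def direct_requirements(dist_name):
    """Names of the direct requirements of an installed distribution."""
    try:
        requires = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return set()  # not installed here, so nothing can be inspected
    names = set()
    for req in requires:
        spec, _, marker = req.partition(";")
        if "extra" in marker:
            continue  # only pulled in by an optional extra
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", spec)
        if match:
            names.add(normalize(match.group(1)))
    return names


def transitive_requirements(dist_name, get_direct=direct_requirements):
    """All requirements reachable from dist_name (direct and indirect)."""
    seen = set()
    stack = [dist_name]
    while stack:
        current = stack.pop()
        for dep in get_direct(current):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    seen.discard(normalize(dist_name))  # a cycle could lead back to the start
    return seen


def rank_by_transitive_count(dist_names, get_direct=direct_requirements):
    """Sort distributions by how many packages each one pulls in, heaviest first."""
    counts = {
        name: len(transitive_requirements(name, get_direct)) for name in dist_names
    }
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # A few of the direct dependencies named in this issue, used as sample input.
    for name, count in rank_by_transitive_count(
        ["pandas", "transformers", "matplotlib", "pyannote.audio"]
    ):
        print(f"{name}: {count} transitive dependencies")
```

The counts overlap between packages (for example, several of them depend on numpy), so removing the top entry will usually shrink the environment by less than its count suggests.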


The list of dependencies that were installed with speakerbox:

pandas~=1.0
speechbrain~=0.5.11
pyannote.audio~=2.1
torchaudio~=0.10
matplotlib~=3.5
torch~=1.10
datasets[audio]~=1.18
dataclasses-json~=0.5
scikit-learn~=1.0
transformers~=4.16
pydub~=0.25
librosa~=0.8
marshmallow-enum<2.0.0,>=1.5.1
marshmallow<4.0.0,>=3.3.0
typing-inspect>=0.4.0
pyarrow!=4.0.0,>=3.0.0
multiprocess
aiohttp
requests>=2.19.0
huggingface-hub<1.0.0,>=0.1.0
responses<0.19
numpy>=1.17
dill
xxhash
fsspec[http]>=2021.05.0
packaging
tqdm>=4.62.1
joblib>=0.14
numba>=0.45.1
resampy>=0.2.2
pooch>=1.0
scipy>=1.2.0
soundfile>=0.10.2
audioread>=2.1.9
decorator>=4.0.10
python-dateutil>=2.8.1
pytz>=2020.1
torchmetrics<1.0,>=0.6
singledispatchmethod
backports.cached-property
pyannote.database<5.0,>=4.1.1
pyannote.pipeline<3.0,>=2.3
pytorch-lightning<1.7,>=1.5.4
omegaconf<3.0,>=2.1
asteroid-filterbanks<0.5,>=0.4
torch-audiomentations>=0.11.0
semver<3.0,>=2.10.2
hmmlearn<0.3,>=0.2.7
pytorch-metric-learning<2.0,>=1.0.0
typing-extensions
networkx<3.0,>=2.6
einops<0.4.0,>=0.3
pyannote.metrics<4.0,>=3.2
pyannote.core<5.0,>=4.4
threadpoolctl>=2.0.0
hyperpyyaml
sentencepiece
regex!=2019.12.17
tokenizers!=0.11.3,<0.14,>=0.11.1
filelock
pyyaml>=5.1
frozenlist>=1.1.1
multidict<7.0,>=4.5
attrs>=17.3.0
async-timeout<5.0,>=4.0.0a3
charset-normalizer<3.0,>=2.0
aiosignal>=1.1.2
yarl<2.0,>=1.0
llvmlite<0.40,>=0.39.0dev0
setuptools
antlr4-python3-runtime==4.9.*
pyparsing!=3.0.5,>=2.0.2
appdirs>=1.3.0
simplejson>=3.8.1
sortedcontainers>=2.0.4
typer[all]>=0.2.1
tabulate>=0.7.7
docopt>=0.6.2
sympy>=1.1
optuna>=1.4
six>=1.5
protobuf<=3.20.1
pyDeprecate>=0.3.1
tensorboard>=2.2.0
torchvision
urllib3>=1.25.10
cffi>=1.0
torch-pitch-shift>=1.2.2
julius<0.3,>=0.2.3
mypy-extensions>=0.3.0
ruamel.yaml>=0.17.8
pycparser
alembic>=1.5.0
sqlalchemy>=1.3.0
colorlog
cmaes>=0.8.2
importlib-metadata<5.0.0
cliff
ruamel.yaml.clib>=0.2.6
wheel>=0.26
werkzeug>=1.0.1
tensorboard-plugin-wit>=1.6.0
google-auth<3,>=1.6.3
google-auth-oauthlib<0.5,>=0.4.1
grpcio>=1.24.3
markdown>=2.6.8
absl-py>=0.4
tensorboard-data-server<0.7.0,>=0.6.0
primePy>=1.3
click<9.0.0,>=7.1.1
colorama<0.5.0,>=0.4.3
shellingham<2.0.0,>=1.3.0
rich<13.0.0,>=10.11.0
idna>=2.0
pillow!=8.3.*,>=5.3.0
Mako
pyasn1-modules>=0.2.1
cachetools<6.0,>=2.0.0
rsa<5,>=3.1.4
requests-oauthlib>=0.7.0
pygments<3.0.0,>=2.6.0
commonmark<0.10.0,>=0.9.0
greenlet!=0.4.17
MarkupSafe>=2.1.1
PrettyTable>=0.7.2
stevedore>=2.0.1
autopage>=0.4.0
cmd2>=1.0.0
wcwidth>=0.1.7
pyperclip>=1.6
pyasn1<0.5.0,>=0.4.6
oauthlib>=3.0.0
pbr!=2.1.0,>=2.0.0

evamaxfield commented 1 year ago

Hey @NicolasMICAUX! Thanks for giving this little library a go. I think outside of myself, you are the first person to have done so.

On to the question: I have tried to be really strict with dependencies already. I have thought about it a bit, and if anything can be dropped it might be matplotlib, since it is only used in eval_model iirc. Compared to everything else, however, matplotlib isn't a really heavy dependency.
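
If matplotlib were ever made optional, one common pattern is to import it only inside the function that needs it and to fail with an installation hint otherwise. The following is a minimal sketch of that pattern, not how speakerbox is actually written: the helper `require_optional` and the extra name `plot` are hypothetical, and no such extra is known to exist in speakerbox's pyproject.toml.

```python
import importlib


def require_optional(module_name, extra_name):
    """Import an optional dependency, or raise an error explaining how to get it.

    `extra_name` is a hypothetical extra, used here only for illustration.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install 'speakerbox[{extra_name}]'"
        ) from exc


def save_example_plot(values, path):
    """Hypothetical plotting helper: matplotlib is imported only when called."""
    plt = require_optional("matplotlib.pyplot", "plot")
    fig, ax = plt.subplots()
    ax.plot(values)
    fig.savefig(path)
    plt.close(fig)
```

With this layout, importing the package never touches matplotlib; only a call to the plotting helper does, so users who never plot would not need it installed.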

I completely understand the desire to have fewer dependencies; installing this package alone inflates an environment quite a bit.

What I have thought about previously is splitting it into different portions:

Even with those splits however, a lot of dependencies are shared between those three categories.

NicolasMICAUX commented 1 year ago

Ok, nice that you've already thought about it.

NicolasMICAUX commented 1 year ago

> Thanks for giving this little library a go. I think outside of myself, you are the first person to have done so.

Side question, but do you think you're going to maintain it to some extent in the near future (1-2 years)? I really want to use speakerbox, as I found it well made and very useful, but I don't want to open issues and PRs often if you don't plan to maintain this lib beyond your own use.

evamaxfield commented 1 year ago

> Side question, but do you think you're going to maintain it to some extent in the near future (1-2 years)? I really want to use speakerbox, as I found it well made and very useful, but I don't want to open issues and PRs often if you don't plan to maintain this lib beyond your own use.

That is generally the plan! It depends on how much time it takes to support it. Council Data Project (the org this project was developed for) is relying on it for both future development and research, so it will be maintained at least for our own use cases. As usual with an open-source project, if others want to help maintain it or add features, I'm happy to accept PRs.