Open jsstevenson opened 7 months ago
@ga4gh/vrs-python-maintainers @ga4gh/vr-maintainers should provide thoughts
- Rename "extras" to something more expressive ("tools" or something?)
I am pro more meaningful name. Tools is fine to me
- Merge the extra dependencies into the main dependencies
- Separate the models/serialization business and the extra tooling into separately installable packages ("core" vs "tools" ?)
I would prefer to keep them separate since in gene-normalizer, we only use the Pydantic models and would like to keep deps lightweight
I would prefer to keep them separate since in gene-normalizer, we only use the Pydantic models and would like to keep deps lightweight
Yeah I agree that simply merging the dependencies would cause some unnecessary difficulties for some niche but not infrequent use cases. I guess the question is
1) if we keep in same installable, is it weird to have the requirements for the most common use cases live under an optional dependency group?
2) if we separate out into different packages, will it suck to maintain a models/core/serialization package separately from a tools (translator, annotator) package?
final note: I noticed a few days ago that NetworkX is doing something like this -- you can theoretically install it without Numpy, etc but will hit import errors very quickly, so they recommend you just install with the optional [default]
dependency group: https://networkx.org/documentation/stable/install.html#install-the-released-version
I think it's okay to have a frequently used optional dependencies target. The core library functionality of the models and serialization stuff should be usable without those. Not being able to normalize alleles without it is a bit of a limitation.
If there wasn't such a size difference in the installed size between them I'd say it wouldn't be that big of a deal to merge them. Adding the extras
adds a lot (though I would expect most users to add them).
docker run -it --rm python:3.11 bash -c 'du -hs /usr/local/lib/python3.11/site-packages'
13M /usr/local/lib/python3.11/site-packages
docker run -it --rm python:3.11 bash -c 'pip install "ga4gh.vrs @ git+https://github.com/ga4gh/vrs-python" && du -hs /usr/local/lib/python3.11/site-packages'
34M /usr/local/lib/python3.11/site-packages
docker run -it --rm python:3.11 bash -c 'pip install "ga4gh.vrs[extras] @ git+https://github.com/ga4gh/vrs-python" && du -hs /usr/local/lib/python3.11/site-packages'
182M /usr/local/lib/python3.11/site-packages
@theferrit32 ~40MB of that is PySam (by way of HGVS -> SeqRepo -> PySam), which maybe explains why the DataProxy
stuff is reimplemented here instead of being imported from SeqRepo...
I'm not sure if a firm decision has been made on this but I've heard a few thoughts expressed on the matter:
edit: had another thought to toss out: move
extras
to a different module level, a la