Closed MridulS closed 1 year ago
I went through all the imports in the codebase, and bigbang seems to depend on these libraries:
numpy
networkx
requests
pandas
bs4
chardet
python-dateutil
html2text
pytz
tqdm
pyyaml
certifi
levenshtein
ietfdata
validator-collection
There are other dependencies:
python-docx
GitPython
nbformat
nbconvert
which could be turned into optional dependencies, as they are used only in very specific places. This would make the initial install quicker for users. I think the docx and git ones are pretty bulky installs.
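As a rough sketch of that split, the two lists above could be declared as a core list plus setuptools-style extras. The extra names ("docx", "git", "notebooks", "all") are hypothetical and not something the project has agreed on; the package names are the ones listed above.

```python
# Sketch only: core vs. optional dependencies for bigbang-py.
# These would be passed to setuptools.setup(install_requires=..., extras_require=...)
# or mirrored in pyproject.toml. The extra names are hypothetical.

INSTALL_REQUIRES = [
    "numpy",
    "networkx",
    "requests",
    "pandas",
    "bs4",
    "chardet",
    "python-dateutil",
    "html2text",
    "pytz",
    "tqdm",
    "pyyaml",
    "certifi",
    "levenshtein",
    "ietfdata",
    "validator-collection",
]

EXTRAS_REQUIRE = {
    "docx": ["python-docx"],
    "git": ["GitPython"],
    "notebooks": ["nbformat", "nbconvert"],
}

# Convenience extra that pulls in every optional package.
EXTRAS_REQUIRE["all"] = sorted(
    {pkg for group in EXTRAS_REQUIRE.values() for pkg in group}
)
```

With something like this in place, a user who needs the docx features could run `pip install "bigbang-py[docx]"`, while a plain `pip install bigbang-py` would stay lean.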
Hi @MridulS Thanks so much for all of this....
One question. Historically, we've supported both pip and conda installation paths: https://bigbang-py.readthedocs.io/en/latest/installation.html
I think the main reason why we have both an environment.yml and requirements.txt file is because we wanted to support both installation paths. At the time when this was set up (five years ago or so), there wasn't a way to declare requirements for both systems simultaneously (or else I didn't know it).
Is there a way to consolidate the requirements into one configuration file that would work with both conda and pip? If so, we should switch to whatever that is.
I agree with what you are saying about trimming the dependencies for the python package to what is necessary for the library.
Things have been awkwardly designed because for 90% of users thus far, the repository has been used like this:
So the pypi package has been secondary to this hackier use case.
I'm quite glad we are moving to a different, more standard paradigm!
I do think we may need to have some documentation that supports the older use case. Which maybe just means loading optional dependencies so that people have jupyter installed.
What do you think?
> One question. Historically, we've supported both pip and conda installation paths: https://bigbang-py.readthedocs.io/en/latest/installation.html

We can have a conda installation path available just by having a clean pip install. I will submit https://github.com/MridulS/staged-recipes/tree/bigbangpy_on_conda_forge to conda-forge once we make a new release on PyPI. After that, there is a bot set up that will automatically release new conda packages built from the pip package. So we only need to worry about the pip packages.

> I do think we may need to have some documentation that supports the older use case. Which maybe just means loading optional dependencies so that people have jupyter installed. What do you think?
Yes, definitely. With a lean, pip/conda-installable bigbang-py package we should be able to create good documentation for different workflows. The optional dependencies can be documented (we can also just add them to the normal dependency list if required).
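One way the optional dependencies could be kept out of the default install while still giving users a clear message is a small lazy-import helper. This is a minimal sketch, not existing bigbang code: the helper name `require_optional` and the extra names are hypothetical.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional module, or raise an ImportError naming the extra to install.

    `extra` is a hypothetical extras group name, e.g. "docx" or "git".
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is an optional dependency; "
            f"install it with: pip install 'bigbang-py[{extra}]'"
        ) from exc
```

A function that needs GitPython would then call `git = require_optional("git", "git")` inside its body rather than importing at module level, so `import bigbang` keeps working without the bulky packages installed.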
Oh, I see. We wouldn't be using conda for the development installation, and we would have clean development and 'user' use cases. That's much better. Great. Thanks.
related to https://github.com/datactive/bigbang/issues/576
There are multiple files/places in the repository that could be treated as the requirements for the bigbang package.
Ideally there should be only one place, with the fewest possible requirements. For example, jupyter shouldn't be a requirement, as nothing in the codebase of the bigbang library requires jupyter to work. The notebooks require jupyter, but that shouldn't be part of the bigbang-as-a-package requirements.
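A claim like "nothing in the library imports jupyter" can be checked mechanically by collecting the top-level module names imported under the package directory. The following is a sketch of one way to do that with the standard library's `ast` module; the function name is hypothetical, and the path to pass in depends on the repository layout.

```python
import ast
from pathlib import Path


def top_level_imports(package_dir):
    """Return the set of top-level module names imported by .py files under package_dir."""
    found = set()
    for path in Path(package_dir).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # "import os.path" counts as a dependency on "os"
                found.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # level == 0 skips relative imports such as "from . import utils"
                found.add(node.module.split(".")[0])
    return found
```

Note that the result contains import names, which do not always match the names used on PyPI (for example, `yaml` is provided by pyyaml), and it includes standard-library modules, which would need to be filtered out (on Python 3.10+ `sys.stdlib_module_names` can help with that).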