Multiple conflicting required requirements

datactive / bigbang

Scientific analysis of collaborative communities

http://datactive.github.io/bigbang/

MIT License


Multiple conflicting required requirements #578

Closed MridulS closed 1 year ago

MridulS commented 1 year ago

There are multiple files/places in the repository that could be treated as the required requirements for the bigbang package.

Ideally there should be only one place, with as few requirements as possible. For example, jupyter shouldn't be a requirement, since nothing in the codebase of the bigbang library requires jupyter to work. The notebooks require jupyter, but that shouldn't be part of the bigbang-as-a-package requirements.

MridulS commented 1 year ago

I went through all the imports inside the codebase, and bigbang seems to depend on these libraries:

 numpy
 networkx
 requests
 pandas
 bs4
 chardet
 python-dateutil
 html2text
 pytz
 tqdm
 pyyaml
 certifi
 levenshtein
 ietfdata
 validator-collection
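
A survey like this can be reproduced by walking the package's source files and collecting the top-level module names they import. The following is only a minimal sketch, not the script used to produce the list above; the function names `collect_imports` and `third_party_only` and the `bigbang` directory path are assumptions for illustration. Note that import names do not always match distribution names on pypi (for example, `yaml` is installed as `pyyaml` and `dateutil` as `python-dateutil`), so the output still needs a manual pass.

```python
import ast
import sys
from pathlib import Path


def collect_imports(package_dir):
    """Return the sorted top-level module names imported under package_dir."""
    found = set()
    for path in Path(package_dir).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    found.add(alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom):
                # level > 0 marks a relative import inside the package itself
                if node.level == 0 and node.module:
                    found.add(node.module.split(".")[0])
    return sorted(found)


def third_party_only(modules, own_package="bigbang"):
    """Drop standard-library modules and the package's own name."""
    # sys.stdlib_module_names exists on Python 3.10+; older versions skip this filter
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return [m for m in modules if m not in stdlib and m != own_package]
```

Calling `third_party_only(collect_imports("bigbang"))` from the repository root would then give a candidate list to compare against the requirements files.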

There are other dependencies:

 python-docx
 GitPython
 nbformat
 nbconvert

which could be turned into optional dependencies as they are used at very specific places. This can make the initial install quicker for users. I think the docx and git ones are pretty bulky installs.
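
To make the proposed split concrete, here is a rough sketch of how the two lists might be declared and how code that needs an optional dependency could fail with a helpful message. The extra names (`docx`, `git`, `notebooks`) and the helper `require_optional` are hypothetical, not something the repository defines; the two containers are meant to be passed to setuptools as `install_requires` and `extras_require`.

```python
import importlib

# Core dependencies, taken from the first list in this comment.
INSTALL_REQUIRES = [
    "numpy", "networkx", "requests", "pandas", "bs4", "chardet",
    "python-dateutil", "html2text", "pytz", "tqdm", "pyyaml",
    "certifi", "levenshtein", "ietfdata", "validator-collection",
]

# Optional dependencies, grouped under hypothetical extra names.
EXTRAS_REQUIRE = {
    "docx": ["python-docx"],
    "git": ["GitPython"],
    "notebooks": ["nbformat", "nbconvert"],
}


def require_optional(module_name, extra):
    """Import an optional module, or explain which extra provides it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is needed for this feature; "
            f"install it with: pip install 'bigbang-py[{extra}]'"
        ) from exc
```

A function that reads git history could then call `git = require_optional("git", "git")` at the point of use, so a plain install never has to pull in GitPython.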

sbenthall commented 1 year ago

Hi @MridulS Thanks so much for all of this....

One question. Historically, we've supported both pip and conda installation paths: https://bigbang-py.readthedocs.io/en/latest/installation.html

I think the main reason why we have both an environment.yml and requirements.txt file is because we wanted to support both installation paths. At the time when this was set up (five years ago or so), there wasn't a way to declare requirements for both systems simultaneously (or else I didn't know it).

Is there a way to consolidate the requirements into one configuration file that would work with both conda and pip? If so, we should switch to whatever that is.

sbenthall commented 1 year ago

I agree with what you are saying about trimming the dependencies for the python package to what is necessary for the library.

Things have been awkwardly designed because for 90% of users thus far, the repository has been used like this:

 1. Clone the repo and create a local installation
 2. Run the scripts to pull data from a remote source like IETF
 3. Open the notebooks, load the data into memory, and play around with scipy commands to make nice plots

So the pypi package has been secondary to this hackier use case.

I'm quite glad we are moving to a different, more standard paradigm!

I do think we may need to have some documentation that supports the older use case. Which maybe just means loading optional dependencies so that people have jupyter installed.

What do you think?

MridulS commented 1 year ago

> One question. Historically, we've supported both pip and conda installation paths: https://bigbang-py.readthedocs.io/en/latest/installation.html

We can make a conda installation path available just by having a clean pip install. I will submit https://github.com/MridulS/staged-recipes/tree/bigbangpy_on_conda_forge to conda-forge once we make a new release on pypi. A bot is then set up to automatically release new conda packages built from the pip package, so we only need to worry about the pip package.

> I do think we may need to have some documentation that supports the older use case. Which maybe just means loading optional dependencies so that people have jupyter installed. What do you think?

Yes, definitely. With a lean, pip/conda-installable bigbang-py package, we should be able to write good documentation for different workflows. The optional dependencies can be documented (we can also just add them to the normal deps list if required).

sbenthall commented 1 year ago

Oh, I see. We wouldn't be using conda for the development installation, and we would have clean development and 'user' use cases. That's much better. Great. Thanks.