allenai / dolma

Data and tools for generating and inspecting OLMo pre-training data.
https://allenai.github.io/dolma/
Apache License 2.0

Add requirements.txt? #211

Closed: revbucket closed this issue 1 month ago

revbucket commented 1 month ago

Would it be possible to include a requirements.txt to speed up pip install dolma?

When I spin up a new machine and try to install Dolma in a clean Python environment, pip has to download multiple versions of many packages to find a set that is compatible with the other requirements.

For example, I get many messages from pip like INFO: pip is looking at multiple versions of <PACKAGE> to determine which version is compatible with other requirements. This could take a while.

And even the dreaded:

INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking

I'm pretty sure all of this could be ameliorated with a requirements.txt that pins versions, so the resolver has far fewer candidates to try.
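
As a rough illustration of the kind of file being asked for, here is a minimal sketch that writes exact pins for whatever is installed in a working environment, roughly what pip freeze reports. The helper names format_pins and installed_pins are made up for this example and are not part of dolma; the assumption is that you run it in an environment where pip install dolma has already resolved successfully.

```python
from importlib.metadata import distributions


def format_pins(pairs):
    """Turn (name, version) pairs into sorted 'name==version' lines.

    Hypothetical helper for illustration; entries without a name are
    skipped and duplicates (case-insensitive) are collapsed.
    """
    unique = {name.lower(): (name, version) for name, version in pairs if name}
    return [f"{name}=={version}" for _, (name, version) in sorted(unique.items())]


def installed_pins():
    """Collect exact pins for every distribution in the current environment."""
    return format_pins((d.metadata.get("Name"), d.version) for d in distributions())


if __name__ == "__main__":
    # Redirect this output to requirements.txt in an environment that
    # already has a working install.
    print("\n".join(installed_pins()))
```

The resulting file can be used with pip install -r requirements.txt, or passed as a constraints file with pip install dolma -c requirements.txt. Exact pins trade flexibility for speed, so they would need refreshing whenever the dependencies change.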

revbucket commented 1 month ago

Okay, I made a PR with a requirements.txt.

I suppose a better option would be to additionally update the pyproject.toml, but I really just wanted a requirements file.