Packaging for PyPi - Githubissues

jmilldotdev commented 2 years ago

Description of the feature request

Related to #29, it would be very nice if this were a lightweight package that were available to pip install.

Description of the solution you'd like

Proposed solution:

The core library could be shipped with just requests and tqdm as dependencies. This would mean removing dependence on the cohere and openai packages, and replacing their clients' get_request methods with requests HTTP calls, similar to ai21_client
The caches could either be included, or shipped separately (as manifest-caches or something) to avoid dependence on sqllitedict and redis, as well as allowing for additional backends
The API package could be released separately for those who want to use it (as I feel the most common production usecase by far will be using HTTP), this removes dependence on flask, torch, transformers, etc

I am happy to work on this if you think this is a good/useful architecture. Perhaps it's worth a fork of the library at that point since I'm sure a good amount of research code now depends on this package as-is.

jmilldotdev commented 2 years ago

https://github.com/jmilldotdev/manifesting/tree/refactor-for-pypi

i did a quick prototype of what that would look like here - it knocks the production dependencies down to just a handful. Removed the API, but left the caches in.

lorr1 commented 2 years ago

Hey! I'm so sorry for the delayed response. I didn't realize this issue was here (I forgot to watch my own repository 🤦 - I just fixed it).

This has been on my road map, but you're ideas are better than mine.

I really like moving to all requests and adding a standard timeout argument for all models.

I've starting working on a PR for this (taking a lot from your prototype - THANK YOU!!!). Need a bit more time to better handle request statuses and error. I'll remove dependencies on any of the API packages. I'll leave in sqlite and redis for now. I'll also add run_id flag to run so that a user can specify the run_id - thereby allowing for multiple cached reruns and the ability for the user to retrieve that rerun (if the run_id is random - it can't be retrieved)

The one thing I need to figure out is how to do optional pip installs. So what I'd like is for a user to say pip install manifest[api] and then get the transformers etc. But otherwise, the only dependencies are requests, tqdm, sqlite, and redis.

lorr1 commented 2 years ago

manifest is on pip

pip install manifest-ml

HazyResearch / manifest

Packaging for PyPi #34

Description of the feature request

Description of the solution you'd like