c0sogi / llama-api

An OpenAI-like LLaMA inference API
MIT License

Dependency solution #1

Closed · c0sogi closed 1 year ago

c0sogi commented 1 year ago

This pull request (PR) brings several convenience improvements and code refactorings. The main changes are as follows:

  1. Dependencies are now installed automatically. Passing the --install-pkgs option when starting the server installs not only this project's packages but also those of all related repositories. The process detects the installed CUDA version and installs the matching PyTorch build, and installs TensorFlow as well. Please refer to the README for details; a sketch of the CUDA detection step follows this list.

  2. The pytest package no longer needs to be installed: tests are run with the standard-library unittest module instead (a minimal example follows this list).

  3. The docker-compose file is configured to pull the prebuilt Docker image from Docker Hub instead of building it locally.

  4. Dependencies are declared in the Poetry section of pyproject.toml, but installing them directly with Poetry is not recommended. When the server runs, the TOML file is converted into a requirements.txt file and the necessary packages are installed via pip install (a conversion sketch follows this list).

  5. Strictly speaking, semaphores are unnecessary because concurrent use of the model is already limited by the scheduling of the process pool's workers. However, a semaphore forms a queue that lets the worker scheduler reuse the model still cached in an existing worker, so the feature has been retained (see the final sketch after this list).
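As a rough illustration of the --install-pkgs flow in item 1, the sketch below detects the CUDA version by parsing `nvidia-smi` output and picks a matching PyTorch wheel index. The function names and the version-to-index mapping are assumptions for illustration, not the repository's actual code:

```python
import re
import subprocess
import sys

# Hypothetical mapping from CUDA major version to a PyTorch wheel index.
TORCH_INDEX = {
    "12": "https://download.pytorch.org/whl/cu121",
    "11": "https://download.pytorch.org/whl/cu118",
}


def detect_cuda_major() -> str | None:
    """Parse `nvidia-smi` output for a line like `CUDA Version: 12.2`."""
    try:
        out = subprocess.check_output(["nvidia-smi"], text=True)
    except (OSError, subprocess.CalledProcessError):
        return None  # no NVIDIA driver / GPU available
    match = re.search(r"CUDA Version:\s*(\d+)\.\d+", out)
    return match.group(1) if match else None


def install_torch() -> None:
    """Install a PyTorch build that matches the detected CUDA version."""
    index = TORCH_INDEX.get(detect_cuda_major() or "")
    cmd = [sys.executable, "-m", "pip", "install", "torch"]
    if index:
        cmd += ["--index-url", index]  # CUDA-specific wheel index
    subprocess.check_call(cmd)  # falls back to the default (CPU) build


if __name__ == "__main__":
    install_torch()
```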
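Item 2 means the suite runs on the standard library alone. A minimal, hypothetical test case (not one of the project's real tests) that runs under unittest:

```python
import unittest


class TestCompletionRequest(unittest.TestCase):
    """Illustrative test case; not one of the project's real tests."""

    def test_empty_prompt_is_rejected(self):
        payload = {"prompt": "", "max_tokens": 16}
        self.assertFalse(payload["prompt"])  # an empty prompt should fail validation


if __name__ == "__main__":
    unittest.main()
```

Running `python -m unittest discover` from the repository root picks up such tests without any third-party test runner.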
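For item 4, the conversion from Poetry metadata to a pip requirements file could look roughly like the sketch below. It uses the standard-library tomllib (Python 3.11+); the caret handling is deliberately simplified and the helper names are hypothetical, so the project's real converter may differ:

```python
import tomllib  # standard library since Python 3.11


def caret_to_range(spec: str) -> str:
    """Translate a Poetry caret spec such as ^1.2.3 into a pip range.

    Simplified: Poetry's caret semantics for 0.x versions are stricter.
    """
    version = spec.lstrip("^")
    major = int(version.split(".")[0])
    return f">={version},<{major + 1}"


def pyproject_to_requirements(
    pyproject_path: str = "pyproject.toml",
    requirements_path: str = "requirements.txt",
) -> None:
    with open(pyproject_path, "rb") as f:
        deps = tomllib.load(f)["tool"]["poetry"]["dependencies"]
    lines = []
    for name, spec in deps.items():
        if name == "python" or not isinstance(spec, str):
            continue  # skip the interpreter pin and non-string (git/extras) specs
        if spec == "*":
            lines.append(name)
        elif spec.startswith("^"):
            lines.append(name + caret_to_range(spec))
        else:
            lines.append(name + spec)  # pass exact or range specs through
    with open(requirements_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```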
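Finally, a minimal sketch of the retained semaphore pattern from item 5: the process pool already caps concurrency at its worker count, but an asyncio.Semaphore in front of it queues excess requests explicitly, which is what gives the scheduler room to route a request back to a worker whose cache still holds the model. All names here are illustrative, not the project's actual scheduler:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

MAX_WORKERS = 2  # one semaphore slot per pool worker


def run_inference(prompt: str) -> str:
    """Placeholder for a worker that loads (or reuses) a cached model."""
    return f"completion for {prompt!r}"


async def main() -> None:
    pool = ProcessPoolExecutor(max_workers=MAX_WORKERS)
    semaphore = asyncio.Semaphore(MAX_WORKERS)
    loop = asyncio.get_running_loop()

    async def generate(prompt: str) -> str:
        # The semaphore queues requests instead of flooding the pool,
        # so a request can be handed to a worker with a warm cache.
        async with semaphore:
            return await loop.run_in_executor(pool, run_inference, prompt)

    print(await asyncio.gather(*(generate(f"q{i}") for i in range(4))))
    pool.shutdown()


if __name__ == "__main__":
    asyncio.run(main())
```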

This PR has already passed the full test suite on Python 3.11, Windows 11, and CUDA 12.2, and will be merged automatically after an appropriate code review.