Nothing beats starting with the Docker version:
docker run -d -v weights:/usr/src/app/weights -v datadb:/data/db/ -p 8008:8008 ghcr.io/nsarrazin/serge:latest
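For reference, the flags in that command break down as follows. Note that the `/data/db` path is MongoDB's default data directory, so a MongoDB backend is my assumption here, not something the command itself confirms:

```shell
# -d: run detached (in the background)
# -v weights:/usr/src/app/weights: named volume holding the downloaded model weights
# -v datadb:/data/db/: named volume for the database (/data/db is MongoDB's default data dir)
# -p 8008:8008: publish the web UI and API on port 8008
docker run -d \
  -v weights:/usr/src/app/weights \
  -v datadb:/data/db/ \
  -p 8008:8008 \
  ghcr.io/nsarrazin/serge:latest
```

Because the volumes are named rather than anonymous, the weights and database survive container restarts.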
It seems we now need to download the models, so the Docker image ships only the client and the server (FastAPI), which uses the LLM for the conversations.
The first try was disappointing. It is also a bit slow (it runs on the CPU, taking 10-30 s to answer a question), and there is no GPU support.
Let's try the 13B model!
Let's restart the container:
docker stop 47e1e4ca3cda
docker run -d -v weights:/usr/src/app/weights -v datadb:/data/db/ -p 8008:8008 ghcr.io/nsarrazin/serge:latest
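As a side note, instead of copying container IDs by hand, the running container can be looked up and restarted in one step. This is a generic Docker sketch, not anything Serge-specific:

```shell
# Find the container running the serge image and restart it in place.
# `docker restart` keeps the volumes and port mappings from the original `docker run`.
docker restart "$(docker ps -q --filter ancestor=ghcr.io/nsarrazin/serge:latest)"
```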
Now it is working.
As for the API, FastAPI serves its documentation at http://localhost:8008/api/docs
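Assuming Serge follows the standard FastAPI conventions, the raw OpenAPI schema should be available next to the docs UI. The `/api/openapi.json` path is my guess based on where the docs are mounted, not something I have verified:

```shell
# interactive docs (Swagger UI)
curl -s http://localhost:8008/api/docs
# machine-readable OpenAPI schema (path assumed from FastAPI defaults)
curl -s http://localhost:8008/api/openapi.json
```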
The downloaded models are stored in the weights volume:
root@alicita:/var/lib/docker/volumes# du -sh *
0 backingFsBlockDev
301M datadb
24K metadata.db
12G weights
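Those named volumes can also be inspected with Docker itself, without digging around in /var/lib/docker by hand. The container ID below is a placeholder:

```shell
# show where Docker mounted the volume on the host
docker volume inspect weights --format '{{ .Mountpoint }}'
# list the downloaded model files from inside the running container
docker exec <container_id> ls -lh /usr/src/app/weights
```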
And now, let's compare with the real ChatGPT:
My impression is that the LLaMA 7B and 13B models are not comparable with ChatGPT, and judging by their answers they are also more dangerous.
Trying the 30B model ... it does not work!
So it is time to play with other alternatives!
Let's follow https://github.com/nsarrazin/serge to run a local service based on LLaMA (an optimized LLM). The goals are: