camenduru / text-generation-webui-colab

A colab gradio web UI for running Large Language Models
The Unlicense

Unable to run the API extension #34

Closed: avinashkr29 closed this issue 1 year ago

avinashkr29 commented 1 year ago

After checking the "api" option under the Session tab, I clicked the "Apply flags/extension and Restart" button as shown below:

[Screenshot 2023-09-07 at 1 53 03]

This generated the following logs in the Colab console:

```
2023-09-06 16:30:28 WARNING:skip module injection for FusedLlamaMLPForQuantizedModel not support integrate without triton yet.
2023-09-06 16:30:28 INFO:Loaded the model in 51.97 seconds.

2023-09-06 16:30:28 INFO:Loading the extension "gallery"...
Running on local URL:  http://127.0.0.1:7860/
Running on public URL: https://<my_old_live_link>/

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
```

Below is the log after I restarted the server with the api option:

```
ERROR:    Exception in ASGI application

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/websockets/websockets_impl.py", line 247, in run_asgi
    result = await self.app(self.scope, self.asgi_receive, self.asgi_send)
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 149, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 75, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 341, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 82, in app
    await func(session)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 289, in app
    await dependant.call(**values)
  File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 536, in join_queue
    session_info = await asyncio.wait_for(
  File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.10/dist-packages/starlette/websockets.py", line 133, in receive_json
    self._raise_on_disconnect(message)
  File "/usr/local/lib/python3.10/dist-packages/starlette/websockets.py", line 105, in _raise_on_disconnect
    raise WebSocketDisconnect(message["code"])
starlette.websockets.WebSocketDisconnect: 1012
Closing server running on port: 7860
2023-09-06 16:31:32 INFO:Loading the extension "gallery"...
2023-09-06 16:31:32 ERROR:Failed to load the extension "api".
Traceback (most recent call last):
  File "/content/text-generation-webui/modules/extensions.py", line 40, in load_extensions
    extension.setup()
  File "/content/text-generation-webui/extensions/api/script.py", line 10, in setup
    if shared.public_api:
AttributeError: module 'modules.shared' has no attribute 'public_api'
Starting API at http://127.0.0.1:5000/api
Running on local URL:  http://127.0.0.1:7860/
Running on public URL: https://<my_new_live_link>/

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
Output generated in 7.95 seconds (4.90 tokens/s, 39 tokens, context 45, seed 932200172)
```

After that, I tried the following code to get a response, but I get a 404 error. Could you please tell me how to start the API correctly and get responses?



```python
import requests

# HOST is the public Gradio share link printed in the log (served over https)
HOST = '<my_new_live_link>'
URI = f'https://{HOST}/api/v1/generate'

# For reverse-proxied streaming, the remote will likely host with ssl - https://
# URI = 'https://your-uri-here.trycloudflare.com/api/v1/generate'

def run(prompt):
    request = {
        'prompt': prompt,
        'max_new_tokens': 250,
        'auto_max_new_tokens': False,
        'max_tokens_second': 0,

        # Generation params. If 'preset' is set to different than 'None', the values
        # in presets/preset-name.yaml are used instead of the individual numbers.
        'preset': 'None',
        'do_sample': True,
        'temperature': 0.7,
        'top_p': 0.1,
        'typical_p': 1,
        'epsilon_cutoff': 0,  # In units of 1e-4
        'eta_cutoff': 0,  # In units of 1e-4
        'tfs': 1,
        'top_a': 0,
        'repetition_penalty': 1.18,
        'repetition_penalty_range': 0,
        'top_k': 40,
        'min_length': 0,
        'no_repeat_ngram_size': 0,
        'num_beams': 1,
        'penalty_alpha': 0,
        'length_penalty': 1,
        'early_stopping': False,
        'mirostat_mode': 0,
        'mirostat_tau': 5,
        'mirostat_eta': 0.1,
        'guidance_scale': 1,
        'negative_prompt': '',

        'seed': -1,
        'add_bos_token': True,
        'truncation_length': 2048,
        'ban_eos_token': False,
        'skip_special_tokens': True,
        'stopping_strings': []
    }
    print(URI)
    response = requests.post(URI, json=request)
    print(response)  # prints the status, e.g. <Response [404]>

    if response.status_code == 200:
        result = response.json()['results'][0]['text']
        print(prompt + result)

if __name__ == '__main__':
    prompt = "In order to make homemade bread, follow these steps:\n1)"
    run(prompt)
```
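One likely reason for the 404, judging from the log: the Gradio share link forwards to the web UI on port 7860, while the API reports `Starting API at http://127.0.0.1:5000/api`, a loopback address that only processes inside the Colab runtime can reach. The sketch below derives the generate endpoint from the API's own log line instead of from the Gradio link. The function name `api_endpoint_from_log` is hypothetical, and the `/v1/generate` suffix is assumed from the URI used in the code above.

```python
import re

# Assumption: the blocking endpoint lives under the API prefix at /v1/generate,
# as in the URI from the question; the log itself prints only the '/api' prefix.
GENERATE_SUFFIX = "/v1/generate"


def api_endpoint_from_log(log_text: str) -> str:
    """Return the generate URL built from the 'Starting API at ...' log line.

    The Gradio lines ('Running on local/public URL') point at the web UI on
    port 7860, so they are deliberately ignored.
    """
    match = re.search(r"Starting API at (\S+)", log_text)
    if match is None:
        raise ValueError("no 'Starting API at' line found; was the API started?")
    return match.group(1).rstrip("/") + GENERATE_SUFFIX


if __name__ == "__main__":
    log = """Starting API at http://127.0.0.1:5000/api
Running on local URL:  http://127.0.0.1:7860/
Running on public URL: https://<my_new_live_link>/"""
    print(api_endpoint_from_log(log))
    # prints http://127.0.0.1:5000/api/v1/generate
```

Because the resulting address is `127.0.0.1`, posting to it works only from code running in the same Colab runtime; a client on another machine needs a separate public address for the API itself.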
avinashkr29 commented 1 year ago

I was able to solve the issue by changing the final command in Colab to the following:

```
!pip install flask-cloudflared
!python server.py --api --public-api --share --settings /content/settings.yaml --wbits 4 --groupsize 128 --loader AutoGPTQ --model /content/text-generation-webui/models/vicuna-13b-GPTQ-4bit-128g
```
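With `--public-api`, the API is expected to get its own public address through a Cloudflare tunnel (the `trycloudflare.com` form shown in the commented-out URI in the question, which is why `flask-cloudflared` is installed), and that address, not the Gradio share link, is what a client should target. Below is a minimal client sketch using only the standard library. It assumes the blocking endpoint accepts the JSON body from the question and returns `{'results': [{'text': ...}]}`, as the question's code expects; `PUBLIC_API_BASE` is a placeholder for the URL printed at startup, and the helper names are hypothetical.

```python
import json
import urllib.request

# Placeholder: replace with the public API URL printed at startup when the
# server runs with --public-api (assumed here to end in /api).
PUBLIC_API_BASE = "https://your-uri-here.trycloudflare.com/api"


def build_request(base_url: str, prompt: str, max_new_tokens: int = 250) -> urllib.request.Request:
    """Build a POST request for the blocking /v1/generate endpoint.

    Only a subset of the question's generation parameters is sent; the server
    is assumed to use its defaults for everything else.
    """
    payload = {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": 0.7,
        "top_p": 0.1,
        "top_k": 40,
        "repetition_penalty": 1.18,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_text(body: bytes) -> str:
    """Extract the generated text, like response.json()['results'][0]['text']."""
    return json.loads(body)["results"][0]["text"]


def generate(base_url: str, prompt: str) -> str:
    """Send the prompt and return the continuation.

    urlopen raises urllib.error.HTTPError for a 404 (e.g. a wrong base URL).
    """
    with urllib.request.urlopen(build_request(base_url, prompt), timeout=120) as resp:
        return parse_text(resp.read())
```

Usage, once `PUBLIC_API_BASE` holds the real tunnel URL: `print(prompt + generate(PUBLIC_API_BASE, prompt))`. Keeping request building and response parsing separate from the network call makes it easy to check the URL and payload before sending anything.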