deep-diver / LLM-As-Chatbot

LLM as a Chatbot Service
Apache License 2.0
3.29k stars 381 forks

bug in chatbot UI #32

Closed GeorvityLabs closed 1 year ago

GeorvityLabs commented 1 year ago

When I say "hello", after the first sentence the text input box gets blocked by the loading animation, and I'm not able to enter anything. I attached a screenshot below; any idea why this issue occurs in the chatbot UI?

Also, can you add a "reset" button to reset the chat?

Screenshot from 2023-03-28 22-50-28

GeorvityLabs commented 1 year ago

After five or six questions it bugs out. Any idea why this happens?

Screenshot from 2023-03-28 22-55-12

GeorvityLabs commented 1 year ago

@deep-diver any idea why these bugs occur in the chatbot UI?

deep-diver commented 1 year ago

Sorry about that. I probably need to understand Gradio better!

Will have a look into this case. Thanks for letting me know!

By the way, I am hosting this on 3×A6000 now. Please try it if you are interested:

https://notebooksf.jarvislabs.ai/BuOu_VbEuUHb09VEVHhfnFq4-PMhBRVCcfHBRCOrq7c4O9GI4dIGoidvNf76UsRL/

GeorvityLabs commented 1 year ago

@deep-diver is the current Colab example in batch mode or streaming mode? Is there a difference in the code? If the Colab example is in batch mode, how can it be converted to streaming mode?

deep-diver commented 1 year ago

The default mode is streaming. If you want batch mode, set `--batch_size` higher than 1.
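A minimal launch sketch of the two modes. The entrypoint name `app.py` is an assumption here; check the repo's README for the actual script:

```shell
# Streaming mode (default): tokens are streamed to the UI as they are generated.
python app.py

# Batch mode: a --batch_size above 1 switches to batched generation.
# (Entrypoint name is an assumption; only the --batch_size flag is from the thread.)
python app.py --batch_size 4
```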

GeorvityLabs commented 1 year ago

https://user-images.githubusercontent.com/58555432/228458518-16555b18-39b9-4449-bad0-5d44fc6701ef.mp4

As you can see in the video above, the input box is blocked by the "loading animation". Any idea why this happens?

GeorvityLabs commented 1 year ago

https://user-images.githubusercontent.com/58555432/228459600-2a1f9d85-f19c-4cc2-a7ec-e53751cb9253.mp4

This is another UI bug where the context box is also filled with the loading animation. Any way to disable the loading animation, @deep-diver? Also, no response is obtained from the model, even after waiting for minutes.

deep-diver commented 1 year ago

Are you using Colab? That happens there; it looks like the connection is not stable in Colab.

GeorvityLabs commented 1 year ago

> Are you using Colab? That happens. It looks like connection is not stable in Colab

No, not Colab; I am using it on my local machine. Is there any way to fix this, if it is a connection issue?

GeorvityLabs commented 1 year ago

But I did see the same errors while using Colab as well, @deep-diver.

GeorvityLabs commented 1 year ago

And I also sometimes saw the same issues on the Jarvis Labs AI page as well, @deep-diver.

GeorvityLabs commented 1 year ago

@deep-diver any idea on how to achieve proper functioning inside Colab? Were you able to run tests and check whether the results were stable inside Colab? I think the instabilities in the UI carry over to running on a local machine too; there is some instability in the chatbot UI.

deep-diver commented 1 year ago

I just added a cancel button.

GeorvityLabs commented 1 year ago

OK, that is great @deep-diver. I will run some tests and check how it functions now.

GeorvityLabs commented 1 year ago

> I just added cancel button

@deep-diver can you also add a reset button, so that we can reset the chat, similar to how Bing Chat has a reset option?

deep-diver commented 1 year ago

Sounds good! Will try.

GeorvityLabs commented 1 year ago

@deep-diver any update on the reset button? Similar to ChatGPT, to reset the chat / start a new conversation.

deep-diver commented 1 year ago

Not yet.

Sorry about that. I am currently busy experimenting with the 65B model.

deep-diver commented 1 year ago

I am thinking about having a history tab instead of a reset, like logging in with a Google account.

GeorvityLabs commented 1 year ago

@deep-diver is there a way to measure inference speed, like how many tokens/sec on a given GPU for a given model?

deep-diver commented 1 year ago

You can write your own code to simply count how many tokens are yielded within a certain time window.
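A minimal sketch of that measurement. The `dummy_stream` generator and its timings below are illustrative stand-ins for the model's real streaming generator, not part of the repo:

```python
import time

def measure_tokens_per_sec(token_stream):
    """Consume a token stream and return tokens yielded per second of wall time."""
    start = time.perf_counter()
    count = 0
    for _ in token_stream:
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")

def dummy_stream(n_tokens=50, delay=0.01):
    """Stand-in for the model's streaming generator: one yield per decoded token."""
    for _ in range(n_tokens):
        time.sleep(delay)  # simulates per-token generation latency
        yield "tok"

rate = measure_tokens_per_sec(dummy_stream())
print(f"~{rate:.1f} tokens/sec")
```

With a real model you would pass the generator that yields decoded tokens (e.g. a streaming `generate` wrapper) instead of `dummy_stream`.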

deep-diver commented 1 year ago

Just added a reset button.

GeorvityLabs commented 1 year ago

@deep-diver which dataset did you use to train this model: chansung/gpt4-alpaca-lora-13b? Is the dataset available on Hugging Face?

deep-diver commented 1 year ago

I used the GPT-4-generated dataset introduced in the "Instruction Tuning with GPT-4" paper. You can find the dataset in the official repo.

GeorvityLabs commented 1 year ago

@deep-diver, I have a GPU with 40GB of VRAM. If I run LLM-As-Chatbot with LLaMA 7B and the AlpacaGPT4 LoRA, how many instances can I run in parallel on a single GPU?

I don't want any queues; I want to check how many instances I can run in parallel on a single GPU at a time.

Hope you could shed some light on this.

GeorvityLabs commented 1 year ago

Also @deep-diver, I'm currently running multiple instances in parallel on a single GPU by creating separate Docker containers for each instance of the chatbot (that way I get a unique Gradio link for each instance). Is there a better way to run multiple LLM-As-Chatbot instances in parallel on a single GPU?
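For reference, the per-container setup described above might look like this. The image name `llm-as-chatbot` and the port mapping are assumptions, and note that both containers still contend for the same physical GPU:

```shell
# Two chatbot containers sharing one GPU; each exposes its own Gradio port,
# so each gets its own link. Image name "llm-as-chatbot" is hypothetical.
docker run -d --gpus all -p 7860:7860 --name chat-1 llm-as-chatbot
docker run -d --gpus all -p 7861:7860 --name chat-2 llm-as-chatbot
```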

deep-diver commented 1 year ago

@GeorvityLabs

Sorry, I am not sure about this question. Even if you containerize, the I/O blocking will still be there because there is a single GPU globally available. A better solution would logically isolate a single GPU as if there were multiple physical GPUs.
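One existing mechanism for that kind of logical isolation is NVIDIA MIG, which partitions a supported GPU into hardware-isolated instances. It only works on A100/H100-class GPUs (a 40GB A100 qualifies; an A6000 does not), and the exact profile IDs vary by model, so treat this as a rough sketch:

```shell
# Enable MIG mode on GPU 0 (requires root; the GPU may need a reset afterwards).
sudo nvidia-smi -i 0 -mig 1

# List the instance profiles supported by this GPU, then create instances,
# e.g. two 3g.20gb slices on a 40GB A100 (profile ID 9 is model-specific).
nvidia-smi mig -lgip
sudo nvidia-smi mig -i 0 -cgi 9,9 -C

# Each MIG device then shows up with its own UUID and can be pinned per process:
#   CUDA_VISIBLE_DEVICES=MIG-<uuid> python app.py
nvidia-smi -L
```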

I am going to close this issue for now. Please put anything you are wondering about in the Discussions menu; I think that is a better place :)