taoari opened this issue 1 year ago
Hi @taoari, I've started work on this here, but haven't yet added it to the documentation. This hasn't been tested yet—there is a chance it already works, but might require a bit of debugging.
Usage would look like this:
from continuedev.src.continuedev.libs.llm.ht_tgi import HuggingFaceTGI
...
config=ContinueConfig(
    ...
    models=Models(
        default=HuggingFaceTGI(server_url="<SERVER_URL>")
    )
)
I encountered friction installing TGI on my Mac, which is why I haven't fully tested it yet, so it would be super helpful for me if you wanted to give it a try.
@sestinj I got the following error
ModuleNotFoundError: No module named 'continuedev.src.continuedev.libs.llm.ht_tgi'
Just a typo, it should be hf_tgi
You can check the file here to be sure: https://github.com/continuedev/continue/blob/main/continuedev/src/continuedev/libs/llm/hf_tgi.py
@sestinj No errors this time, but it still does not work. The "Play" button blinks all the time, and I get no response.
@sestinj I think it crashed on my computer.
I did an uninstall, ran `lsof -i :65432 | grep "(LISTEN)" | awk '{print $2}' | xargs kill -9`, deleted ~/.continue, and reinstalled.
It still does not work; I always get "Continue Server Starting".
Ok, I might just need to go back and test this myself then. I'll update you when it's ready.
Is Continue completely unable to start up again? In the worst case, I think uninstalling Continue and restarting VS Code should solve things.
Another way to make sure that no servers are running is just lsof -i :65432
You can check the logs with cmd+shift+p "View Continue Server Logs"
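If lsof isn't convenient, a quick Python check of that same port (65432, as above) is another option. This is only a sketch; it tells you whether something is listening on the port, not whether it is the Continue server:

```python
import socket

def port_in_use(host: str = "127.0.0.1", port: int = 65432) -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1)
        return sock.connect_ex((host, port)) == 0

if __name__ == "__main__":
    print("something is listening on 65432" if port_in_use() else "port 65432 is free")
```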
I set up a local instance of TGI and added it in config.py as follows:
from continuedev.src.continuedev.libs.llm.hf_tgi import HuggingFaceTGI
...
config=ContinueConfig(
    ...
    models=Models(
        default=HuggingFaceTGI(server_url="http://localhost:8080")
    )
)
Please note, I am able to successfully obtain responses from the /info, /generate, and /generate_stream endpoints of TGI.
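For anyone reproducing this, a minimal Python check of those endpoints might look like the sketch below (it assumes the local TGI server from the config above at localhost:8080 and uses the requests library; the payload mirrors TGI's /generate API):

```python
import requests

BASE = "http://localhost:8080"  # TGI server from the config above

# /info reports which model the server is running
print(requests.get(f"{BASE}/info", timeout=10).json())

# /generate returns the whole completion in a single JSON response
resp = requests.post(
    f"{BASE}/generate",
    json={"inputs": "Write a hello world Python program",
          "parameters": {"max_new_tokens": 128}},
    timeout=60,
)
print(resp.json()["generated_text"])
```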
If I type a simple prompt in the Continue box, I get the following error:
Traceback (most recent call last):
File "continuedev/src/continuedev/libs/util/create_async_task.py", line 21, in callback
future.result()
File "asyncio/futures.py", line 203, in result
File "asyncio/tasks.py", line 267, in __step
File "continuedev/src/continuedev/core/autopilot.py", line 543, in create_title
title = await self.continue_sdk.models.medium.complete(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "continuedev/src/continuedev/libs/llm/__init__.py", line 258, in complete
completion = await self._complete(prompt=prompt, options=options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "continuedev/src/continuedev/libs/llm/__init__.py", line 334, in _complete
async for chunk in self._stream_complete(prompt=prompt, options=options):
File "/var/folders/nw/hfwjfm7n6h13ybsw6kxqh08w0000gn/T/_MEI1quhS4/continuedev/src/continuedev/libs/llm/hf_tgi.py", line 55, in _stream_complete
json_chunk = json.loads(chunk)
^^^^^^^^^^^^^^^^^
File "json/__init__.py", line 346, in loads
File "json/decoder.py", line 337, in decode
File "json/decoder.py", line 355, in raw_decode
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
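One common cause of that JSONDecodeError with TGI streaming: /generate_stream returns server-sent events, so each chunk arrives prefixed with `data:` rather than as bare JSON, and json.loads on the raw chunk fails exactly like this. A minimal sketch of the difference (the chunk contents below are illustrative, not copied from the logs):

```python
import json

# A streaming chunk from TGI's /generate_stream roughly looks like this
# (server-sent event framing, illustrative payload):
chunk = b'data:{"token":{"id":13,"text":"\\n","special":false},"generated_text":null}\n\n'

# json.loads(chunk) would raise "Expecting value: line 1 column 1 (char 0)"
# because of the "data:" prefix; strip it before parsing.
line = chunk.decode("utf-8").strip()
if line.startswith("data:"):
    payload = json.loads(line[len("data:"):])
    print(repr(payload["token"]["text"]))
```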
@abhinavkulkarni I've just released a new version that I think will fix this. It was a very obvious mistake on our end
Thanks @sestinj, I now get a new error:
Traceback (most recent call last):
File "continuedev/src/continuedev/libs/util/create_async_task.py", line 21, in callback
future.result()
File "asyncio/futures.py", line 203, in result
File "asyncio/tasks.py", line 267, in __step
File "continuedev/src/continuedev/core/autopilot.py", line 543, in create_title
title = await self.continue_sdk.models.medium.complete(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "continuedev/src/continuedev/libs/llm/__init__.py", line 258, in complete
completion = await self._complete(prompt=prompt, options=options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "continuedev/src/continuedev/libs/llm/__init__.py", line 334, in _complete
async for chunk in self._stream_complete(prompt=prompt, options=options):
File "/var/folders/nw/hfwjfm7n6h13ybsw6kxqh08w0000gn/T/_MEISzqrqf/continuedev/src/continuedev/libs/llm/hf_tgi.py", line 41, in _stream_complete
args = self.collect_args(options)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/folders/nw/hfwjfm7n6h13ybsw6kxqh08w0000gn/T/_MEISzqrqf/continuedev/src/continuedev/libs/llm/hf_tgi.py", line 37, in collect_args
args.pop("functions")
KeyError: 'functions'
If I comment this line out, I get an "Error parsing JSON: Expecting value: line 1 column 1 (char 0)" error.
Please note, I get a successful response from my local TGI setup:
$ curl http://localhost:8080/generate -X POST -d '{"inputs":"Write a hello world Python program","parameters":{"max_new_tokens":512}}' -H 'Content-Type: application/json' | jq ".generated_text" -rc | cat
def main():
    print("Hello World")

if __name__ == "__main__":
    main()
The "functions" error is an easy one. Let me give the other a deeper look and set TGI up on my own machine (embarrassing, but I haven't gotten to this yet, I was just following the API documentation). I think it might be something about how I'm calling the streaming endpoint.
The request I'm making right now is the equivalent of
curl -X POST -H "Content-Type: application/json" -d '{"inputs": "<prompt_value>", "parameters": {"max_new_tokens": 1024}}' http://localhost:8080/generate_stream
Resuming work in the morning; it has been a slight pain to set up TGI on Mac.
If there's any chance you've seen this error, I'd be curious how you solved it. Otherwise I'm sure I'll get it tomorrow.
RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
@abhinavkulkarni finally got it, and successfully tested on my own TGI setup. Let me know if you still run into any problems.
and @taoari this should solve your error as well
Thanks @sestinj, local TGI setup works and I can generate responses from it.
However, I am not able to feed it context by selecting code; please see the attached video. You can see responses being generated from TGI in the integrated terminal window. Please note, when I switch to the OpenAI maybe-proxy model, it does work and is able to answer questions based on the highlighted context.
Also, for Llama 2 models, </s> is a special token that indicates the end of text/sequence and should not be displayed. You can see in the attached image below that it is shown in the title.
@abhinavkulkarni I have a suspicion that the code is in the prompt, but the model is ignoring it. If you try this again and hover over the response, a magnifying glass button will show up. Clicking that shows the full prompts/completions as sent to the LLM. Could you share what that looks like?
We have a stop parameter that can be set for the model, but since CodeLlama/Llama is usually the model people use, I think it would be sensible to have </s> as the default there. I'm also noticing the [PYTHON] tags are probably a bit annoying; I'll make a change so they are converted to triple backticks.
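In the meantime, TGI itself accepts stop sequences in the request parameters, which at least stops generation at the end-of-sequence marker. A sketch of such a request (the payload shape follows TGI's /generate API; the prompt and values are illustrative):

```python
import requests

payload = {
    "inputs": "[INST] Tell me what this code is doing. [/INST]",
    "parameters": {
        "max_new_tokens": 512,
        "stop": ["</s>"],  # ask TGI to stop once this sequence is generated
    },
}
resp = requests.post("http://localhost:8080/generate", json=payload, timeout=60)
print(resp.json()["generated_text"])
```

Whether the marker is also stripped from generated_text may depend on the server version, so trimming it client-side as well is a reasonable precaution.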
Thanks, @sestinj, here's a video screengrab for a simple prompt. This is the full prompt and the response:
This is a log of the prompt/completion pairs sent/received from the LLM during this step
############################################
Prompt:
[INST] Tell me what this code is doing.
[/INST]
############################################
Completion:
This code is using the `requests` library to make a GET request to the URL `https://api.github.com/users/octocat/repos`. The `json()` method is used to parse the response as JSON data, and the `for` loop is used to iterate over the list of repositories returned in the response.
For each repository, the code is printing the repository name and the number of stars it has. The `print()` function is used to display the output.
This code is using the GitHub API to retrieve a list of repositories for the user "octocat" and then printing the name and number of stars for each repository..</s>
############################################
Prompt:
[INST] " This code is using the `requests` library to make a GET request to the URL `https://api.github.com/users/octocat/repos`. The `json()` method is used to parse the response as JSON data, and the `for` loop is used to iterate over the list of repositories returned in the response.
For each repository, the code is printing the repository name and the number of stars it has. The `print()` function is used to display the output.
This code is using the GitHub API to retrieve a list of repositories for the user "octocat" and then printing the name and number of stars for each repository..</s>"
Please write a short title summarizing the message quoted above. Use no more than 10 words:
[/INST]
############################################
Completion:
""The only way to do great work is to love what you do." ― Steve Job
Thanks. My suspicion was wrong... but I see the problem! This is actually fixable through the config file, but I'll change the default to be the correct thing and push a new version soon.
There is a template_messages property on all LLM classes that converts chat history into a templated prompt, and the function I have as the default for HuggingFaceTGI is cutting out the chat history. The correct thing would look like this:
from continuedev.src.continuedev.libs.llm.prompts.chat import llama2_template_messages
...
...
default=HuggingFaceTGI(..., template_messages=llama2_template_messages)
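For the curious, a template function of that kind roughly folds the chat history into Llama 2's [INST] format. The following is a simplified illustration of the idea, not the actual implementation in continuedev:

```python
def llama2_style_template_messages(messages: list) -> str:
    """Fold a chat history of {"role": ..., "content": ...} dicts into one [INST]-tagged prompt."""
    prompt = ""
    for msg in messages:
        if msg["role"] in ("system", "user"):
            prompt += f"[INST] {msg['content'].strip()} [/INST]"
        else:  # assistant turns are appended between the instruction blocks
            prompt += f" {msg['content'].strip()} "
    return prompt

# e.g. the highlighted code plus the question arrive as prior "user" messages,
# so they end up inside an [INST] ... [/INST] block instead of being dropped.
```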
@abhinavkulkarni just released a new version; this is now the default, so highlighted code will be included.
Thanks, @sestinj, things work perfectly now, except for one small detail. The title generated seems to be random and has nothing to do with the prompt. I am attaching an example screengrab here.
Also attaching all the prompt/completion pairs.
Which model are you using? I can then just test out the exact prompt here until I find something more reliable
The prompt looks OK, other than the end token. I'm adding a stop_tokens option to the LLM class; there's a small chance that fixes it, but likely not.
Also relevant for now might be the "disable_summaries" option in config.py depending on how bad it is: https://continue.dev/docs/reference/config#:~:text=token%20is%20provided.-,disable_summaries,-(boolean)%20%3D%20False
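In config.py that option would look roughly like the sketch below (based on the linked reference; it assumes the usual config.py imports for ContinueConfig, Models, and HuggingFaceTGI):

```python
config=ContinueConfig(
    # ... the rest of your existing config ...
    disable_summaries=True,  # skip automatic title/summary generation
    models=Models(
        default=HuggingFaceTGI(server_url="http://localhost:8080")
    )
)
```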
Hey @sestinj,
> Also relevant for now might be the "disable_summaries" option in config.py
Thanks, that works.
> Which model are you using?
I am using a 4-bit AWQ quantized version of codellama/CodeLlama-7b-Instruct-hf, but you won't be able to run it on CPU (I read in one of your previous replies that you were running these on a Mac). If so, you may want to test it with a 4-bit GGML/GGUF version of this model to see if you too are getting random quotes as titles.
Another problem I have observed is that the last character in the completion tends to be repeated - so if it is a period or an exclamation mark, it is repeated. If I feed the same prompt to my local TGI using curl, I don't get this repetition.
Here's the screengrab attached:
Ok, cool. I'll see what I can find. Seems like Continue is just extra excited lol !!
@abhinavkulkarni just wanted to update you on this since I know it's been a while - I've been planning on potentially using LiteLLM to make API calls to different providers, such as HuggingFace TGI, and this would solve the above problem, so I've decided to postpone digging into it myself. I'll let you know as soon as there's an update here!
Also thought you might want to know this since I talked to them and they mentioned that you were a contributor : )
👋 @abhinavkulkarni
Is your feature request related to a problem? Please describe.
HuggingFace TGI is a standard way to serve LLMs. Is it possible to add support for HuggingFace TGI served codellama models?