DevToys-app / DevToys

A Swiss Army knife for developers.
https://devtoys.app/
MIT License

A local ChatGPT #825

Open veler opened 1 year ago

veler commented 1 year ago

What feature or new tool do you think should be added to DevToys?

Tools like Copilot, ChatGPT or BingChat are truly helpful from a developer perspective.

Why do you think this is needed?

Having an optional, local and offline ChatGPT-like experience in DevToys could be nice given privacy concerns.

Solution/Idea

It actually already exists: https://faraday.dev/

But perhaps we could make a DevToys plugin that enables this.

Comments

No response

lucgagan commented 1 year ago

This is cool!

Regenhardt commented 1 year ago

This would be amazing to have, though I'd make it optional somehow, so DevToys doesn't immediately install a GPT model if it's never used and, even more importantly, doesn't always load the model into memory; that keeps DevToys usable on non-high-end machines when the feature is off. Something like the lazy-loading sketch below.
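
For the lazy part, a minimal sketch, assuming a hypothetical LlmChat type with a LoadAsync method (none of these names exist yet; this is not a real API):

using System;
using System.Threading.Tasks;

public sealed class OptionalModelProvider
{
    private readonly Lazy<Task<LlmChat>> _chat;

    public OptionalModelProvider(string modelDirectory)
    {
        // Nothing is downloaded or loaded into memory until GetChatAsync is first awaited,
        // so DevToys stays lightweight if the AI tool is never opened.
        _chat = new Lazy<Task<LlmChat>>(() => LlmChat.LoadAsync(modelDirectory));
    }

    public Task<LlmChat> GetChatAsync() => _chat.Value;
}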

Gotta decide which model to use, there are quite a few available now.

Then maybe find out if there's a .NET project that makes integrating a GPT model easier, unless someone with ML.NET experience can tell us it's not complicated to integrate directly?

Possible libraries:

Then, after that, maybe even integrate with huggingface and enable choosing your own model, but we've got to get it to work at all first.

Possible models:

If we add a backend to self-host on a server we can of course use bigger models too like:

Feel free to suggest different ones or tell me why some of these are bad, and I'll try to keep this list up to date (Edit: keeping this up to date feels like chasing a bunny; every time I think I'm there, it goes in another direction).

If we build a separate backend, we might as well check whether something already exists to host huggingface models, such as the backend behind their VSCode extension, so we could offload the backend/web service implementation to that.

Regenhardt commented 1 year ago

~Also I'm not entirely sure how well this feature would fit into the current DevToys context.~ Forget this, why am I even arguing with the actual creator of DevToys.

It might be worth developing this as its own library first (minus UI stuff of course, as an actual library) and, as a next step, thinking about how to integrate that into DevToys, so it can be integrated into other applications too.

Regenhardt commented 1 year ago

API thoughts:

- Build it on the whole Chat context, i.e. Chat.NET, GptChat.NET or something?
- Transformers can do more than text generation, so maybe not that. Or could the library become more than text generation later? Once implemented, it might be quite easy to add text-to-image generation to such a library.
- Microsoft recently published RetNet, so maybe avoid Transformers/GPT in the general naming?
- If it gets the feature of choosing a model from huggingface, maybe name it after that, like Huggingface.NET, although it would be nice to ask huggingface.co before using their name that broadly.

We could try to keep it way above the 🤗Transformers abstraction level and just go:

var chat = new LlmChat(dataDirectory); // check for existing models in the directory?
chat.LoadModel(chosenModel); // Load from directory or download from huggingface? Or put the download as an explicit step between these?

// This is prefixed in front of conversations. I've seen that chat works by prefixing text with who said it (see the flattening sketch below); not sure if this is used for all models.
chat.SetSystemPrompt("You're a helpful, creative-thinking, precise AI coding assistant running on a local machine.");

// Single conversation only by design?
var output = chat.UserSays("Please build a leftPad function in typescript, make sure a string is passed");

// Or multiple conversations?
var conversation = chat.NewConversation(); // Creates a new context where the conversation is a long string that starts with the system prompt
var output2 = conversation.UserSays("Please build a leftPad function in typescript, make sure a string is passed");

var gui = SendAnswer(output); // push the answer to the UI
await saveConversation(); // the single one or the one we just created?
await gui;
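
On the "prefixing text with who said it" point, a minimal sketch of how a conversation could be flattened into one prompt string (the role names and separators here are assumptions; the exact template is model-specific):

using System.Collections.Generic;
using System.Text;

static string BuildPrompt(string systemPrompt, IEnumerable<(string Role, string Text)> turns)
{
    var sb = new StringBuilder();
    sb.AppendLine($"System: {systemPrompt}");
    foreach (var (role, text) in turns)
        sb.AppendLine($"{role}: {text}"); // e.g. "User: ..." / "Assistant: ..."
    sb.Append("Assistant:"); // leave the last turn open for the model to complete
    return sb.ToString();
}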

Edit: What if I chose a small language model? Maybe not LlmChat. GenerativeModel? TextGeneration? AiChat?

veler commented 1 year ago

Love this! Thanks for all these ideas! I haven't looked much into the technical approach yet, as this is not super high priority at the moment, but I definitely appreciate what you're doing here 😁

Ultimately the goal isn't to compete with Faraday or FreedomGPT. The goal is to provide a fairly minimal experience on top of the regular DevToys, through an extension (DevToys 2.0 is extensible). Potentially, there could be some AI application to find in every tool, and this extension would add these AI features to each of them. For example, it could suggest an ideal color that meets the contrast requirements in the Color analyzer tool. Additionally, it could indeed provide its own tool with a chat-like experience, running locally.

Performance is something that worries me a bit. I haven't tried FreedomGPT yet. Faraday uses the CPU by default; GPU support is a work in progress. Using the CPU to run the AI is much slower (not surprising). I don't expect most users to have a high-end GPU in their machine, though. So even if the DevToys approach supports GPU, I expect most customers will face a very slow AI. What do you think?

Edit: I just tried FreedomGPT with LLAMA 7B FULL. Interestingly, it failed to explain a RegEx I gave it. I admit my prompt could have been better, but Llama 2 - Hermes 7B (Q6_K) on Faraday succeeded (although it took longer to answer).

image

image

Obviously there will be some work needed to find the best model that balances speed and accuracy of answers.

Regenhardt commented 1 year ago

It seems I broke LLAMA 7B FULL on FreedomGPT; I just gave it this prompt:

format a double value `val` with two digits after the decimal, left padded to 10 characters, using string interpolation in C#.

And it just writes `string.Format("0:0.000` and then zeros forever. It's been writing for a few minutes and has way too many zeros already; I'm actually not sure whether I broke the model or the application:

image

It's at ~1800 zeroes now. We should definitely try to avoid this case, whatever it is.
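
Whatever runtime we end up with, the consuming loop could defend against this with a hard length cap plus a naive repetition check. A sketch (StreamTokens and prompt are made-up stand-ins, not a real streaming API):

using System.Linq;
using System.Text;

var output = new StringBuilder();
const int maxChars = 4000; // hard cap so the model can't write 1800 zeroes

foreach (var token in StreamTokens(prompt)) // hypothetical streaming call
{
    output.Append(token);
    Console.Write(token);

    if (output.Length >= maxChars)
        break; // length cap reached

    // Naive degeneration guard: stop if the last 64 chars are all the same character.
    if (output.Length >= 64 && output.ToString(output.Length - 64, 64).Distinct().Count() == 1)
        break;
}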

Regenhardt commented 1 year ago

If similar models are at a similar level of performance, this seems perfectly fine for smaller tasks that don't require a bigger conversation.
I tested on a Ryzen 2700 and it uses up to 5 GB of RAM, which would be totally acceptable for me. Even more so if it could use the GPU, although I have a GTX 660 Ti from 2012, so I'm not sure that could even work.

Regenhardt commented 1 year ago

Maybe we can use the lib from https://github.com/dranger003/llama.cpp-dotnet, provided we actually get it to work; for some reason I can't build it and I'm not sure why.
I opened an issue hoping to get help; I've never really used C++, so I don't know where to start: https://github.com/dranger003/llama.cpp-dotnet/issues/8

veler commented 1 year ago

Hi there,

I did some experimentation today. I used https://github.com/SciSharp/LLamaSharp (have you played with it? Any opinion on this one?)

I used the model https://huggingface.co/TheBloke/OpenOrca-Preview1-13B-GGML. The reason is that I played with Faraday.dev, which has a coding assistant that can answer some programming questions. I admit I don't know much about this particular model and haven't tried any others so far.

image

I configured the model to use the same parameters as this coding assistant, except that I changed the temperature.

I made a console app whose only goal is to explain a RegEx. Here is the result:

image

And here is the code:

using LLama;
using LLama.Common;

Console.WriteLine("THIS APP IS SPECIALIZED IN EXPLAINING REGEX FOR YOU");
Console.WriteLine();
Console.WriteLine("Type the path of to the GPT model:");
Console.ForegroundColor = ConsoleColor.Green;
string modelPath = Console.ReadLine();
Console.ForegroundColor = ConsoleColor.White;

// Initialize a chat session
Console.ForegroundColor = ConsoleColor.DarkBlue;
var ex = new InteractiveExecutor(new LLamaModel(new ModelParams(modelPath, contextSize: 1024, seed: 1337, gpuLayerCount: 5)));
ChatSession session = new ChatSession(ex);
Console.ForegroundColor = ConsoleColor.White;

Console.WriteLine();
Console.WriteLine("Type the regex you want to get explained:");
Console.ForegroundColor = ConsoleColor.Green;
string regexToExplain = Console.ReadLine();
Console.ForegroundColor = ConsoleColor.White;
Console.WriteLine();
Console.WriteLine("-----------------------------------------------------");
Console.WriteLine();

var chatHistory = new ChatHistory();
chatHistory.AddMessage(AuthorRole.System, "Explain this regular expression by breaking it down. Include a snippet of the original regex in your answer. Be concise. Do NOT repeat yourself");
chatHistory.AddMessage(AuthorRole.User, "^[a-zA-Z0-9_.+-][a-zA-Z0-9-]+[a-zA-Z0-9-.]+$");
chatHistory.AddMessage(AuthorRole.Assistant,
    "- `^`: Matches the start of the string.\r\n" +
    "- `[a-zA-Z0-9_.+-]+`: Matches one or more characters that are either a letter (lowercase or uppercase), a digit, an underscore, a period, a plus sign, or a hyphen.\r\n" +
    "- `[a-zA-Z0-9-]+`: Matches one or more characters that are either a letter (lowercase or uppercase), a digit, or a hyphen.\r\n" +
    "- `[a-zA-Z0-9-.]+`: Matches one or more characters that are either a letter (lowercase or uppercase), a digit, a period, or a hyphen.\r\n" +
    "- `$`: Matches the end of the string.");
chatHistory.AddMessage(AuthorRole.User, regexToExplain);

Console.ForegroundColor = ConsoleColor.Yellow;

var parameters = new InferenceParams()
{
    Temperature = 0f,
    TopP = 0.9f,
    TopK = 40,
    RepeatPenalty = 1.17647f,
    RepeatLastTokensCount = 256,
    Mirostat = MirostatType.Mirostat2,
    MirostatEta = 0.1f,
    MirostatTau = 3,
    AntiPrompts = new List<string> { "User:" }
};

foreach (var text in session.Chat(chatHistory, parameters))
{
    Console.Write(text);
}

Console.ReadLine();

It's not perfect. The AI seems to explain some things that don't even exist in the regex, like (-{4}), or assumes we're talking about a URL (the regex I asked it to explain is supposed to match a GUID), but it's still pretty useful for a RegEx beginner, I think (I find it useful for myself haha).

What do you think? Any suggestion on how to get a more accurate / precise result?


PS: I have an AMD Ryzen 7 3700X 8-Core Processor, 3593 MHz, 8 cores, 16 logical processors, and 16 GB of RAM.

The answer from this model on my machine takes ~15 seconds to start, then ~1 min to display the full answer in the console.

If we decide to develop this feature for DevToys, I'd suggest letting the user decide between a local, slow, but offline AI, or ChatGPT with their own OpenAI API token: online and faster.
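
That choice could hide behind a single abstraction so the tools don't care which backend answers. A sketch (IChatBackend is a made-up name, not an existing DevToys API):

using System.Collections.Generic;
using System.Threading;

public interface IChatBackend
{
    // Streams the assistant's reply; the caller sends the whole conversation each time.
    IAsyncEnumerable<string> StreamAnswerAsync(string conversation, CancellationToken ct = default);
}

// Two implementations: one wrapping a local GGML model (e.g. via LLamaSharp),
// one calling OpenAI's chat completions HTTP API with the user's own token.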

Regenhardt commented 1 year ago

Nice, you make it look simple to implement. I added the possible libraries to my earlier overview comment.

I would, however, use something other than a regex for evaluation, since RegEx is by definition a regular language pattern and thus very easy to explain using algorithmic solutions, but very complicated for an AI model trained on and for natural language. There are already many good tools for that.

I'd prefer to use translation of actual code, either from natural language to code or from code to natural, abstract language (not just directly translating code but understanding what it's doing).
Although I guess there are more scientific tests to evaluate a model, I just don't know what tests there are or what the results actually mean.

I agree with letting the user choose a remote AI; however, I'd rather also have the option of a self-hosted one, so we can keep the whole chain open source and let companies (or people who have a server standing around, or who play around in the cloud) know where the data goes. I bet many companies outside the USA don't allow Copilot or ChatGPT as official tools because people would send sensitive data there.

This would of course mean building a separate backend application suited for multi-user... usage. Although we could of course send the whole conversation to the server and keep the backend stateless, as sketched below; we'd just have to figure out multi-threading for LLMs then.
Maybe start with local models?
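
A stateless backend could then stay tiny, since the full conversation travels with every request. A rough sketch using ASP.NET Core minimal APIs (RunModel is a placeholder for the actual inference call):

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// No per-user state on the server: any instance can answer any request.
app.MapPost("/chat", (ChatRequest request) =>
{
    string reply = RunModel(request.SystemPrompt, request.Messages);
    return Results.Ok(new { reply });
});

app.Run();

// Placeholder; the real implementation would feed the messages to the model.
string RunModel(string systemPrompt, List<ChatMessage> messages) => "(model output)";

record ChatRequest(string SystemPrompt, List<ChatMessage> Messages);
record ChatMessage(string Role, string Content);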

Did you run your test on a GPU? It might run pretty nicely on one, and the GPU is likely free to do work anyway for people not building video games/animations/GPU stuff.

veler commented 1 year ago

I agree with everything you said. RegEx can be parsed, so AI isn't necessary there. A better scenario would be something that helps the user generate a draft RegEx.

I did not try a GPU yet, because it was a tiny bit more complex to implement, and macOS won't necessarily benefit from it (although I admit Apple Silicon is super powerful).

veler commented 1 year ago

Alright, I just tried using the GPU. It is indeed much faster (and quieter; I didn't hear the fan at all). The app starts to show output after ~5 seconds, and the output speed is faster.

image
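
For reference, in the LLamaSharp snippet above, GPU offload would presumably go through the gpuLayerCount knob that's already there (plus a GPU-enabled backend package). Assuming the same console app:

// Same ModelParams as above, but offloading more layers to the GPU.
// 32 is just an example value; the offloaded layers have to fit in the GPU's VRAM.
var ex = new InteractiveExecutor(new LLamaModel(new ModelParams(modelPath, contextSize: 1024, seed: 1337, gpuLayerCount: 32)));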

Regenhardt commented 1 year ago

I experimented a bit with Code Llama in LLamaSharp and... well, it works, but not great. I'm not sure what to do about the other parameters or the system prompt; setting the temperature to 0.2 made it just write newlines. Not sure how to improve it:


Instruction: Write a C# extension method called on an HttpClient that is passed a URI, downloads the file and saves it in the current working directory.


    public static async Task DownloadFile(this HttpClient client, Uri uri)
{
    var fileName = string.IsNullOrWhiteSpace(uri.AbsolutePath) ? uri.GetFileName() : Path.GetFileName(uri.AbsolutePath));
    var file = await client.GetByteArrayAsync(uri);

    using (var stream = new FileStream(@".\downloads\" + fileName,FileMode.Create))

        await stream.WriteAsync(file,0,file.Length));

}

Instruction: uri.GetFileName() doesn't exist, correct your code.

I apologize for the mistake in my previous response. Here is the corrected code:

using System; class Program {

static void Main(string[] args)
{

    Console.WriteLine("Enter a URL: ");

    string url = Console.ReadLine();

    Uri uri = new Uri(url);

    Console.WriteLine("The file name is: " + uri.GetFileName());

}



I hope this helps! Let me know if you have any questions or concerns about the code.

Regenhardt commented 12 months ago

Not sure what actually changed, but after using the GGUF version of CodeLlama, I get much better results:

image

veler commented 12 months ago

Wow, that looks promising! How fast was it to answer? Using CPU or GPU?

Regenhardt commented 12 months ago

I get maybe 2-3 words a second, after waiting ~7 seconds for it to start generating. It runs on my CPU, so there's probably huge potential to use even bigger models than 7B.

veler commented 12 months ago

That's fantastic!

AmiK2001 commented 8 months ago

The ability to write custom Llama-based or other text-model-based extensions for DevToys would be great. Can't wait for an API for it so I can make some.

veler commented 4 months ago

Hey there, it's been a while! I just gave Llama3-8B-Q4_K_M a try on my local Windows PC using CPU only (AMD Ryzen 7 3700X 8-Core Processor, 3593 MHz, 8 cores, 16 logical processors).

I'm impressed by how quickly it answers! About 1.5 tokens/s. I asked it to explain a RegEx and it did so accurately. Sounds like shipping a local AI tool in DevToys could become a reasonable direction 👀

image

@btiteux FYI

I used https://github.com/SciSharp/LLamaSharp with one of these models: https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/tree/main
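
For anyone wanting to reproduce this, a rough sketch of loading one of those GGUF files with the newer LLamaSharp API (following its README at the time; the exact surface may differ between versions, and the model path is a made-up example):

using LLama;
using LLama.Common;

var parameters = new ModelParams(@"models\Meta-Llama-3-8B-Instruct.Q4_K_M.gguf") // hypothetical local path
{
    ContextSize = 4096,
    GpuLayerCount = 0 // CPU only, as in the test above
};

// Load the weights once, then create an inference context and a chat session on top.
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
var session = new ChatSession(new InteractiveExecutor(context));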