VatsaDev / nanoChatGPT

nanogpt turned into a chat model
MIT License

Chat Bot Implementation Question #1

Closed BarelyFunctionalCode closed 1 month ago

BarelyFunctionalCode commented 3 months ago

Hey,

I've been playing around with the nanoGPT repo for a while now. I'm familiar with the basics of neural networks and LLMs but there's still a lot I've yet to learn.

I've been using nanoGPT as a base for a Discord bot I have on my server. The goal is to have the model periodically reply to Discord messages as if it were just another user in the server. I've had moderate success with this, but I'm unclear about a few things and am trying to hone the performance of my model.

I've created a fine-tuning dataset by scraping all of the existing messages off the Discord server (a few years' worth of messages from a dozen or so people), cleaning them up, and splitting them into logical conversation "chunks" based on relative timestamps. I then use this dataset to fine-tune the gpt2-medium model.
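For illustration, the chunking step described above could look roughly like the sketch below. It assumes messages are already parsed into `(timestamp, author, text)` tuples sorted by time; the function names, the 30-minute gap, and the exact line layout are hypothetical choices, not taken from the actual scraper.

```python
from datetime import datetime, timedelta

def chunk_by_gap(messages, max_gap=timedelta(minutes=30)):
    """Split time-sorted (timestamp, author, text) tuples into conversation chunks."""
    chunks = []
    for msg in messages:
        # Start a new chunk on the first message or after a long silence.
        if not chunks or msg[0] - chunks[-1][-1][0] > max_gap:
            chunks.append([])
        chunks[-1].append(msg)
    return chunks

def format_chunk(chunk):
    """Render one chunk as 'author: text' lines, closed by GPT-2's end-of-text token."""
    lines = [f"{author}: {text}" for _, author, text in chunk]
    return "\n".join(lines) + " <|endoftext|>"

if __name__ == "__main__":
    t0 = datetime(2023, 1, 1, 12, 0)
    demo = [
        (t0, "person 1", "a message."),
        (t0 + timedelta(minutes=5), "person 2", "another message."),
        (t0 + timedelta(hours=3), "person 1", "yet another message."),
    ]
    for chunk in chunk_by_gap(demo):
        print(format_chunk(chunk))
```

The gap threshold is the main knob here: too small and single conversations get split apart, too large and unrelated conversations get merged into one training sample.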

My questions are:

person 2: another message.

person 1: yet another message.

person 3: you guessed it, a message. <|endoftext|>


- The way I make the context for the prompt on the model inference is including up to x previous messages based on relative timestamps between messages, then appending `bot name:`. Is this the correct way to do this?
- What are things to look out for when trying to tune the hyperparameters on the config file?

Sorry for the long message, thanks for reading. Let me know what you think. 

Nick.

VatsaDev commented 1 month ago

Hi

All the changes here are fine-tuning and data based.

I hadn't thought much about multi-person chats, but the main things would be good data and getting the model to pause at the right times. For the data you could use your Discord logs: just replace a random person (maybe the one with the most messages?) with the bot tag, and train on that.
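A rough sketch of that relabeling, assuming the logs are already parsed into `(author, text)` pairs. `relabel_most_active` and the `"bot"` tag are made-up names for illustration, not something from this repo:

```python
from collections import Counter

def relabel_most_active(messages, bot_tag="bot"):
    """Return a copy of (author, text) pairs with the most frequent author renamed to bot_tag."""
    counts = Counter(author for author, _ in messages)
    if not counts:
        return []
    # most_common(1) gives [(author, count)] for the top poster.
    target = counts.most_common(1)[0][0]
    return [(bot_tag if author == target else author, text)
            for author, text in messages]

if __name__ == "__main__":
    log = [("nick", "hey"), ("sam", "hi"), ("nick", "anyone around?")]
    print(relabel_most_active(log))
```

Picking the most active person gives the bot the most training turns, but the bot will also pick up that person's style, so a random or less dominant user may be worth trying too.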

I think c.ai has this feature, so maybe look into that.

VatsaDev commented 1 month ago

> The way I make the context for the prompt on the model inference is including up to x previous messages based on relative timestamps between messages, then appending `bot name:`. Is this the correct way to do this?

That's a possible method and probably works well, though you might need to include more previous messages for large group chats (GCs).
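As a sketch of that prompt construction: take the last few messages, trim from the oldest end if they get too long, then append the bot's name so the model continues as the bot. The function name and both limits are hypothetical, and the character budget is only a crude stand-in for GPT-2's 1024-token context (a real version would count tokens with the tokenizer).

```python
def build_prompt(history, bot_name, max_messages=10, max_chars=3000):
    """Build an inference prompt from (author, text) pairs, oldest first.

    Assumes max_messages >= 1.
    """
    recent = history[-max_messages:]
    lines = [f"{author}: {text}" for author, text in recent]
    # Drop the oldest lines until the prompt fits the rough character budget.
    while len(lines) > 1 and sum(len(line) + 1 for line in lines) > max_chars:
        lines.pop(0)
    # End with the bot's name so the model writes the bot's next message.
    lines.append(f"{bot_name}:")
    return "\n".join(lines)

if __name__ == "__main__":
    chat = [("person 1", "a message."), ("person 2", "another message.")]
    print(build_prompt(chat, "bot"))
```

For a busy group chat, raising `max_messages` keeps more speakers in view, at the cost of leaving less room in the context for the reply.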

VatsaDev commented 1 month ago

> What are things to look out for when trying to tune the hyperparameters on the config file?

Just look at general hyperparameter (HP) tuning: test what seems to work for others or what you think might work. You get the hang of it over time.
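One concrete check that can help when picking values: work out how many passes over the dataset a run actually makes, since a small fine-tuning set can overfit after only a few epochs. A minimal sketch, assuming the config uses upstream nanoGPT's parameter names (`batch_size`, `gradient_accumulation_steps`, `block_size`, `max_iters`) and a single GPU; the numbers are made up:

```python
def tokens_per_iter(batch_size, gradient_accumulation_steps, block_size):
    """Tokens consumed by one optimizer step (single-GPU assumption)."""
    return batch_size * gradient_accumulation_steps * block_size

def epochs_for_run(dataset_tokens, batch_size, gradient_accumulation_steps,
                   block_size, max_iters):
    """Approximate number of passes over the dataset in a full training run."""
    per_iter = tokens_per_iter(batch_size, gradient_accumulation_steps, block_size)
    return per_iter * max_iters / dataset_tokens

if __name__ == "__main__":
    # Hypothetical values for a Discord-sized dataset.
    epochs = epochs_for_run(
        dataset_tokens=2_000_000,
        batch_size=4,
        gradient_accumulation_steps=8,
        block_size=1024,
        max_iters=200,
    )
    print(f"{epochs:.2f} epochs")
```

If the validation loss starts rising while the training loss keeps falling, that is usually the sign to lower `max_iters` or the learning rate.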