Open BoQsc opened 1 year ago
In the README it is shown that each prompt is passed as an argument to the program. Interactive mode would let you interact through prompts in a chat-like fashion.
Right, like: you get a prompt cursor and start chatting. I'd be in favor, since it would avoid reloading the model every time when testing multiple prompts interactively.
@BoQsc is this something you’d have time to contribute?
This would be really cool. I was checking termgpt and we can take inspiration from there. I'd love to work on this as a fun project. Let me know if @BoQsc or anyone from the community wants to collaborate 😄
A simple while loop reading input like this

while True:
    prompt = input("Prompt:")

could already be an acceptable minimal version. I wouldn't go much further than that for a simple demo script unless there is good value. termgpt uses rich to format the output with colors and so on.
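Fleshed out a bit, the loop above could be structured like this. `respond` here is a hypothetical stand-in for the real model call (e.g. lit-llama's `generate`), and the read/respond callables are injected so the loop itself stays testable:

```python
def chat_loop(read_line, respond):
    """Load the model once, then run one prompt/response exchange per
    iteration until the user submits an empty line."""
    replies = []
    while True:
        prompt = read_line("Prompt: ")
        if not prompt:  # empty input ends the session
            break
        replies.append(respond(prompt))
    return replies

# Interactive use would be: chat_loop(input, my_model_fn)
```

The model-loading cost is paid once, outside the loop, which is the whole point of interactive mode.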
Hi, I started playing with it yesterday. As @awaelchli mentioned, that snippet does the job once you load the model.
However, a cool step would be to move towards a chatbot assistant.
Currently the prompt does not contain the past conversation, so the model cannot answer questions like "What was the previous question I asked you?". Some way of concatenating the whole conversation context needs to be adopted. I tried the 7B version fine-tuned with the finetune_lora.py script; the problem there is that the instructions in the fine-tuning stage never contain multiple steps of dialogue. This can result in the model continuing the dialogue on its own, trying to predict the user's next prompt, and so on...
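A minimal sketch of carrying context between turns (the function name and `### Instruction:`/`### Response:` markers follow the Alpaca-style template used by the fine-tuning scripts; the helper itself is illustrative, not part of lit-llama): each completed exchange is kept in a history buffer and the whole transcript is rendered back into the next prompt.

```python
def build_prompt(history, new_user_msg):
    """Render past (user, model) exchanges plus the new message into one
    prompt string, so the model can refer back to earlier turns."""
    lines = []
    for user_msg, model_msg in history:
        lines.append(f"### Instruction:\n{user_msg}")
        lines.append(f"### Response:\n{model_msg}")
    lines.append(f"### Instruction:\n{new_user_msg}")
    lines.append("### Response:\n")  # the model completes from here
    return "\n".join(lines)
```

As noted above, a model fine-tuned only on single-turn instructions may still mishandle such multi-turn prompts, so this concatenation alone is not a full fix.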
I write this just to point out some possible scripts we could work on:
Yes it would be great
It would be cool to use Textual for the UI https://www.textualize.io/#textual
Thanks for the directions above. I guess modifying the code in generate_adapter.py like this will work for a simple one-step interactive mode?
Also, I guess we will need to leverage something like the ShareGPT data to fine-tune with multiple steps of dialogue?
generate_adapter.py
......
tokenizer = Tokenizer(tokenizer_path)
while True:
    prompt = input(">> Prompt:")
    if not prompt:
        break
    sample = {"instruction": prompt, "input": input_instruction}
    prompt = generate_prompt(sample)
    encoded = tokenizer.encode(prompt, bos=True, eos=False, device=model.device)
    print("Inferencing...")
    t0 = time.perf_counter()
    output = generate(
        model,
        idx=encoded,
        max_seq_length=max_new_tokens,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        top_k=top_k,
        eos_id=tokenizer.eos_id,
    )
    output = tokenizer.decode(output)
    output = output.split("### Response:")[1].strip()
    print(">> lit-llama: ", output)
    t = time.perf_counter() - t0
    print(f"\nTime for inference: {t:.02f} sec total, {max_new_tokens / t:.02f} tokens/sec", file=sys.stderr)
    print(f"Memory used: {torch.cuda.max_memory_reserved() / 1e9:.02f} GB", file=sys.stderr)
    print("\n")
......
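One fragile spot in the snippet above: `output.split("### Response:")[1]` raises an `IndexError` whenever the marker does not appear in the decoded text. A safer extraction could look like this (illustrative helper, not part of lit-llama):

```python
def extract_response(decoded, marker="### Response:"):
    """Return the text after `marker`, or the whole (stripped) string if
    the marker is absent, instead of raising IndexError."""
    _head, sep, tail = decoded.partition(marker)
    return tail.strip() if sep else decoded.strip()
```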
very cool @chakt, wanna open a PR?
I implemented one in https://github.com/Lightning-AI/lit-stablelm/blob/main/chat.py. It could be copied over to this repository.
Just clone the code from lit-parrot chat.py into lit-llama/generate.py ... that will give you an interactive mode.
Can we make it conversation-style, where it remembers the context from previous prompts? That would be more helpful.
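Remembering earlier prompts means the rendered history grows every turn, so it eventually has to be trimmed to fit the model's context window. One simple way, sketched under the assumption that the limit can be approximated in characters (real code would count tokens), is to drop the oldest exchanges first:

```python
def trim_history(history, render, max_chars):
    """Drop the oldest (user, model) exchanges until the rendered
    transcript fits within max_chars."""
    h = list(history)
    while h and len(render(h)) > max_chars:
        h.pop(0)  # forget the oldest turn first
    return h
```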
Hey @BoQsc, could you clarify a bit more what you mean by interactive mode? Could you give an example?