Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.

Is there an interactive mode? #79

Open BoQsc opened 1 year ago

awaelchli commented 1 year ago

Hey @BoQsc, could you clarify a bit more what you mean by interactive mode? Could you give an example?

BoQsc commented 1 year ago

The README shows each prompt being passed as an argument to the program. Interactive mode is when you interact through prompts in a chat-like manner.

The current non-interactive mode presented in the README.md:

[GIF: https://pl-public-data.s3.amazonaws.com/assets_lightning/Llama_pineapple.gif]

lantiga commented 1 year ago

Right, like: you get a prompt cursor and start chatting. I'd be in favor, since it would avoid reloading the model repeatedly when testing multiple prompts interactively.

@BoQsc is this something you’d have time to contribute?

aniketmaurya commented 1 year ago

This would be really cool. I was checking out termgpt and we can take inspiration from there. I'd love to work on this as a fun project. Let me know if @BoQsc or anyone from the community wants to collaborate 😄

awaelchli commented 1 year ago

A simple while loop that reads input, like this:

while True:
    prompt = input("Prompt:") 

could already be an acceptable minimal version. I wouldn't go much further than that for the simple demo script unless there is clear value. termgpt uses rich to format the output with colors and so on.
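
For illustration, a minimal sketch of that loop with rich-colored output might look like the following. Here generate_reply is a hypothetical stand-in for the existing generate() call, and the model is assumed to be loaded already:

from rich.console import Console

console = Console()

def generate_reply(prompt: str) -> str:
    # Placeholder: call the already-loaded model here.
    return f"(model reply to: {prompt})"

while True:
    prompt = input("Prompt: ")
    if not prompt:  # empty input exits the loop
        break
    console.print(generate_reply(prompt), style="bold green")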

nicoladainese96 commented 1 year ago

Hi, I started playing with it yesterday. As @awaelchli mentioned, that snippet does the job once the model is loaded.

However, a cool step would be to move towards a chatbot assistant.

Currently the prompt does not contain the past conversation, so the model cannot answer questions like "What was the previous question I asked you?"; some way of concatenating the full context of the conversation should be adopted. I tried the 7B version fine-tuned with the finetune_lora.py script, and the problem there is that the instructions in the fine-tuning stage never contain multiple turns of dialogue. As a result, the model may keep generating further turns of dialogue on its own, trying to predict the user's next prompt and so on...
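
As a rough sketch of the concatenation idea (not lit-llama's API — generate_reply is a hypothetical wrapper around the loaded model, and the ### User / ### Assistant markers are only illustrative; they would need to match whatever template the model was fine-tuned on):

history = ""

while True:
    user_input = input(">> You: ")
    if not user_input:
        break
    # Append the new turn so the model sees the whole conversation so far.
    history += f"### User:\n{user_input}\n### Assistant:\n"
    reply = generate_reply(history)  # hypothetical call into the loaded model
    history += reply + "\n"
    print(">> Assistant:", reply)

A real version would also need to truncate old turns once the concatenated history exceeds the model's context window.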

I write this just to say that possible scripts that we could work on are:

lantiga commented 1 year ago

Yes, it would be great.

It would be cool to use Textual for the UI: https://www.textualize.io/#textual
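
As a bare-bones sketch of what that could look like (assuming a recent Textual release that provides the Input and RichLog widgets; generate_reply is again a hypothetical hook into the loaded model):

from textual.app import App, ComposeResult
from textual.widgets import Input, RichLog

def generate_reply(prompt: str) -> str:
    # Placeholder: call the already-loaded model here.
    return f"(model reply to: {prompt})"

class ChatApp(App):
    def compose(self) -> ComposeResult:
        yield RichLog(wrap=True)  # scrollback for the conversation
        yield Input(placeholder="Type a prompt and press Enter")

    def on_input_submitted(self, event: Input.Submitted) -> None:
        log = self.query_one(RichLog)
        log.write(f">> You: {event.value}")
        log.write(f">> lit-llama: {generate_reply(event.value)}")
        self.query_one(Input).value = ""  # clear the input for the next prompt

if __name__ == "__main__":
    ChatApp().run()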

chakt commented 1 year ago

Thank you for the directions above. I guess modifying the code in generate_adapter.py like this will work for a simple one-step interactive mode?

Also, I guess we will need to leverage something like ShareGPT data to fine-tune with multiple turns of dialogue?

generate_adapter.py

......

    tokenizer = Tokenizer(tokenizer_path)

    while True:

        # An empty prompt exits the loop.
        prompt = input(">> Prompt: ")
        if not prompt:
            break

        # input_instruction holds the optional "input" field of the Alpaca-style
        # template (renamed so it does not shadow the input() builtin).
        sample = {"instruction": prompt, "input": input_instruction}
        prompt = generate_prompt(sample)
        encoded = tokenizer.encode(prompt, bos=True, eos=False, device=model.device)

        print("Inferencing...")

        t0 = time.perf_counter()
        output = generate(
            model,
            idx=encoded,
            max_seq_length=max_new_tokens,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            top_k=top_k,
            eos_id=tokenizer.eos_id,
        )

        # Decode, then keep only the text after the template's response marker.
        output = tokenizer.decode(output)
        output = output.split("### Response:")[1].strip()
        print(">> lit-llama:", output)

        t = time.perf_counter() - t0

        print(f"\nTime for inference: {t:.02f} sec total, {max_new_tokens / t:.02f} tokens/sec", file=sys.stderr)
        print(f"Memory used: {torch.cuda.max_memory_reserved() / 1e9:.02f} GB", file=sys.stderr)
        print("\n")

......

aniketmaurya commented 1 year ago

Very cool @chakt, wanna open a PR?

carmocca commented 1 year ago

I implemented one in https://github.com/Lightning-AI/lit-stablelm/blob/main/chat.py. It could be copied over to this repository.

RDouglasSharp commented 1 year ago

Just copy the code from lit-parrot's chat.py into lit-llama/generate.py ... that will give you an interactive mode.

Harsh-raj commented 11 months ago

Can we make it conversation-style, so that it remembers the context from previous prompts? That would be more helpful.