Lightning-AI / litgpt


CodeGemma #1270

Open rasbt opened 6 months ago

rasbt commented 6 months ago

There's CodeGemma now, which may be a good alternative to CodeLlama (which doesn't work so well). In case you ever feel bored, @Andrei-Aksionov :D

All the various versions as listed here: https://huggingface.co/collections/google/codegemma-release-66152ac7b683e2667abdee11

Andrei-Aksionov commented 6 months ago

Sure, I can do this. Hopefully it's not too complicated 🤞

Andrei-Aksionov commented 6 months ago

So, there are 3 models available:

[Screenshot: table of the three CodeGemma variants (codegemma-2b, codegemma-7b, codegemma-7b-it) and their intended use cases]

As can be seen from the table, the 2b and 7b models are mostly for code completion; they require a special fill-in-the-middle prompt in the following format:

```
prompt = '''\
<|fim_prefix|>import datetime
def calculate_age(birth_year):
    """Calculates a person's age based on their birth year."""
    current_year = datetime.date.today().year
    <|fim_suffix|> <-- (Note: this is where the cursor would be in the IDE)
    return age<|fim_middle|>\
'''
```
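Just to illustrate, here is a minimal sketch of how such a FIM prompt could be assembled; the helper is purely illustrative and doesn't exist in LitGPT:

```python
# Purely illustrative helper (not part of LitGPT): assemble a CodeGemma
# fill-in-the-middle prompt from the code before and after the cursor.
def build_fim_prompt(prefix: str, suffix: str) -> str:
    # The model is expected to generate the missing middle after <|fim_middle|>.
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
```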

This format is somewhat tricky to implement with the current LitGPT code. And frankly speaking, I don't see why we need it, since the output quality is so-so. If I copy-paste the code block from the model page:

```python
from transformers import GemmaTokenizer, AutoModelForCausalLM

model_id = "google/codegemma-2b"
tokenizer = GemmaTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = '''\
<|fim_prefix|>import datetime
def calculate_age(birth_year):
    """Calculates a person's age based on their birth year."""
    current_year = datetime.date.today().year
    <|fim_suffix|>
    return age<|fim_middle|>\
'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
prompt_len = inputs["input_ids"].shape[-1]
outputs = model.generate(**inputs, max_new_tokens=100)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][prompt_len:]))
```

the output is:

```
age = current_year - birth_year<|file_separator|><eos>
```
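If we ever did support the completion models, the raw completion would also need some light post-processing to strip those terminators; a rough sketch of what I have in mind (the terminator list is just my assumption based on the output above):

```python
# Rough sketch (my assumption, not existing LitGPT code): cut the raw
# completion at the first terminator string the model emits.
TERMINATORS = ["<|file_separator|>", "<eos>"]

def trim_completion(text: str) -> str:
    for stop in TERMINATORS:
        pos = text.find(stop)
        if pos != -1:
            text = text[:pos]
    return text

print(trim_completion("age = current_year - birth_year<|file_separator|><eos>"))
# -> age = current_year - birth_year
```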

Or, with another code block from the model page:

```python
from transformers import GemmaTokenizer, AutoModelForCausalLM

tokenizer = GemmaTokenizer.from_pretrained("google/codegemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/codegemma-2b")

input_text = "Write me a Python function to calculate the nth fibonacci number."
input_ids = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```

the output is:

```
<bos>Write me a Python function to calculate the nth fibonacci number.

The Fibonacci numbers are the numbers in the following integer sequence.

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ……..

In mathematical terms, the sequence Fn of Fibonacci numbers is defined by the recurrence relation

Fn = Fn-1 + Fn-2

with seed values

F0 = 0 and F1 = 1.

The first ten terms are

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ……..

<strong>Example</strong>

Input: n = 10
Output: 55

<strong>Input:</strong> n = 20
<strong>Output:</strong> 6765

<strong>Input:</strong> n = 30
<strong>Output:</strong> 832040
```

On the other hand, the 7b-it model is a much better proposition: it can chat and follow instructions.

⚡ codegemma ~/lit-gpt litgpt chat --checkpoint_dir checkpoints/$repo_id                                                       
Now chatting with CodeGemma-7b-it.
To exit, press 'Enter' on an empty prompt.

Seed set to 1234
>> Prompt: Hello
>> Reply: Hello! 👋 It's nice to hear from you. What would you like to talk about today? 😊
Time for inference: 1.70 sec total, 12.92 tokens/sec, 22 tokens

>> Prompt: Tell me a joke
>> Reply: Why did the bicycle fall over?

Because it was two tired!
Time for inference: 0.77 sec total, 18.08 tokens/sec, 14 tokens

>> Prompt: Write a softmax function in Python
>> Reply: ```python
import numpy as np

def softmax(x):
  """Compute softmax function for a given input array.

  Args:
    x: A numpy array.

  Returns:
    A numpy array of the same shape as x, containing the softmax values.
  """

  # Calculate the exponential of each element in x
  exps = np.exp(x - np.max(x, axis=1, keepdims=True))

  # Sum the exponentials of all elements in the same row
  sum_exps = np.sum(exps, axis=1, keepdims=True)

  # Divide the exponentials by the sum of exponentials in the same row
  softmax_values = exps / sum_exps

  return softmax_values
```

Time for inference: 8.96 sec total, 19.30 tokens/sec, 173 tokens
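
As far as I can tell, the 7b-it variant follows the standard Gemma turn format, so the existing Gemma prompt style should carry over; a rough sketch of the template (the function name is just illustrative):

```python
# Rough sketch of the Gemma-style turn format the instruction-tuned
# CodeGemma appears to use; LitGPT would expose this as a prompt style.
def format_turn(user_message: str) -> str:
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
```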

My recommendation is to stick to the 7b-it model. What do you think, @rasbt?