brevdev / notebooks

Collection of notebook guides created by the Brev.dev team!

Russian characters in the results of inference before finetune #1

Closed joshhu closed 2 months ago

joshhu commented 11 months ago

Hi,

I tried to run the Mistral fine-tuning with QLoRA notebook, and the first inference on the eval_prompt produced Russian characters, as follows:

Given a target sentence construct the underlying meaning representation of the input sentence as a single function with attributes and attribute values.
This function should describe the target string accurately and the function must be one of the following ['inform', 'request', 'give_opinion', 'confirm', 'verify_attribute', 'suggest', 'request_explanation', 'recommend', 'request_attribute'].
The attributes must be one of the following: ['name', 'exp_release_date', 'release_year', 'developer', 'esrb', 'rating', 'genres', 'player_perspective', 'has_multiplayer', 'platforms', 'available_on_steam', 'has_linux_release', 'has_mac_release', 'specifier']

### Target sentence:
I remember you saying you found Little Big Adventure to be average. Are you not usually that into single-player games on PlayStation?

### Meaning representation:
д

### Meaning representation:
{
  "function": "inform",
  "attributes": {
    "name": "Little Big Adventure",
    "exp_release_date": "1994-01-01",
    "release_year": 1994,
    "developer": "Adeline Software International",
    "esrb": "E",
    "rating": 3,
    "genres": ["Action", "Adventure"],
    "player_perspective": "Third-person",
    "has_multiplayer": false,
    "platforms": ["PlayStation"],
    "available_on_steam": false,
    "has_linux_release": false,
    "has_mac_release": false,
    "specifier": "average"
  }
}
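
For reference, the eval cell that produced this looks roughly like this (paraphrased, so names and arguments may differ slightly from the notebook):

import torch

# Tokenize the eval_prompt shown above and generate with the
# (not yet fine-tuned) base model.
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    output = model.generate(**model_input, max_new_tokens=256)
    print(tokenizer.decode(output[0], skip_special_tokens=True))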

Is there something wrong, or is this a normal prediction before fine-tuning?

Thanks.

felri commented 10 months ago

I'm working on something very similar to your use case; mine takes a job description as input and outputs a JSON with keywords, location, etc.

The base model was useless for me too, and I also got the Russian nonsense in the output, but after fine-tuning llama2 13b on 2k examples generated by gpt-3.5-turbo I'm getting OK-to-good results. So don't treat the first eval as a hard blocker; just move on to training with a good dataset of at least 1k examples.
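
For reference, each record in my dataset looks roughly like this (a made-up illustration of the prompt/completion shape, not my exact data or schema):

# Hypothetical training record; field names and wording are illustrative.
example = {
    "prompt": (
        "Extract the keywords and location from the job description below "
        "and answer with a single JSON object.\n\n"
        "### Job description:\nSenior Python developer for a remote-first team in Berlin...\n\n"
        "### JSON:\n"
    ),
    "completion": '{"keywords": ["python", "senior", "remote"], "location": "Berlin"}',
}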

The LoRA config was tricky to get right for this kind of structured JSON output, but after 6 training sessions I got to the following:

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=64,            # LoRA rank
    lora_alpha=128,  # scaling factor, set to 2 * r (see note below)
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

lora_alpha = 2 * r seems to be a good ratio for my case.
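
With that config you just wrap the base model (a minimal sketch; model is assumed to be the already-loaded 4-bit base model):

model = get_peft_model(model, config)
model.print_trainable_parameters()  # sanity check: how many parameters will actually train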

The training part:

import transformers
from transformers import IntervalStrategy

trainer = transformers.Trainer(
    model=model,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_val_dataset,
    args=transformers.TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=4,  # 4 per batch x 500 steps = my 2k examples, one full pass
        gradient_accumulation_steps=1,
        max_steps=500,
        learning_rate=3e-5,
        bf16=True,
        optim="paged_adamw_8bit",
        logging_steps=50,
        logging_dir="./logs",
        save_strategy="steps",
        save_steps=100,
        evaluation_strategy=IntervalStrategy.STEPS,  # evaluate every eval_steps
        eval_steps=50,
        do_eval=True,
        lr_scheduler_type="linear",  # decays the learning rate automatically
        warmup_steps=20,
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
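
From there it's just (assuming the setup above; output_dir is wherever you want the checkpoints):

model.config.use_cache = False  # silences a warning during training; re-enable for inference
trainer.train()
trainer.save_model(output_dir)  # with a PEFT-wrapped model this saves only the adapter weights
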
DietmarGrabowski commented 10 months ago

> I tried to run the Mistral fine-tuning with QLoRA notebook, and the first inference on the eval_prompt produced Russian characters [...]

Hi, same here. I will try the original notebook with llama2, but with llama-7b.

harper-carroll commented 9 months ago

How's it going? Were you able to figure it out?