AetherPrior opened this issue 4 days ago
I figured this out by changing the data collation pipeline so that the inputs and the labels are built from the same tokenized sequence:
```python
def chameleon_collate_fn(batch):
    # Extract the images and build the full text: prompt (question) + answer
    images = [ex['image'] for ex in batch]
    prompts = ["<image>" + ex['question'] + " " for ex in batch]
    texts = [prompt + ex['answer'] for prompt, ex in zip(prompts, batch)]

    # Tokenize the full sequences once; the labels are simply a copy of the
    # input_ids (prompt + answer), with the non-answer positions masked below
    batch_inputs = processor(images=images, text=texts, return_tensors="pt", padding=True)
    labels = batch_inputs["input_ids"].clone()

    # Mask out pad tokens so they don't contribute to the loss
    labels = labels.masked_fill(labels == processor.tokenizer.pad_token_id, -100)

    # Mask the prompt tokens so the loss is only computed on the answer.
    # Note: len(batch_inputs["input_ids"]) would give the batch size, not the
    # prompt length, so the prompt length is computed per example instead
    # (this assumes the tokenizer pads on the right).
    prompt_ids = processor(images=images, text=prompts, return_tensors="pt", padding=True)["input_ids"]
    prompt_lens = (prompt_ids != processor.tokenizer.pad_token_id).sum(dim=1)
    for i, prompt_len in enumerate(prompt_lens.tolist()):
        labels[i, :prompt_len] = -100

    batch_inputs["labels"] = labels
    # Move inputs and labels to the appropriate device
    batch_inputs = {key: val.to('cuda') for key, val in batch_inputs.items()}
    return batch_inputs
```
I'm still not entirely sure whether this is correct, so any comments on it would be highly appreciated!
Thank you so much.
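For completeness, here is a rough sketch of how I plug this collator into the training loop. The checkpoint name, train_dataset, optimizer, and hyperparameters below are placeholders rather than my exact setup; it just assumes the standard ChameleonProcessor / ChameleonForConditionalGeneration classes from transformers.

```python
import torch
from torch.utils.data import DataLoader
from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

# Placeholder setup: the checkpoint, dataset, and hyperparameters are assumptions.
processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
model = ChameleonForConditionalGeneration.from_pretrained(
    "facebook/chameleon-7b", torch_dtype=torch.bfloat16
).to("cuda")

loader = DataLoader(train_dataset, batch_size=2, shuffle=True,
                    collate_fn=chameleon_collate_fn)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for batch in loader:
    # The collator already moved the tensors to CUDA and attached "labels",
    # so the forward pass returns the causal LM loss directly.
    batch["pixel_values"] = batch["pixel_values"].to(model.dtype)
    outputs = model(**batch)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Since the labels ride along inside the batch, there is no separate loss computation in the loop; everything hinges on the -100 masking done in the collator.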
Hi, I have a question about data collation for fine-tuning. I have some input questions and target answers, and I'd like to know whether I need to include the inputs as part of my labels during causal fine-tuning. Specifically, my collation function is the chameleon_collate_fn shown above.
However, when I pass its outputs to the model call in the training loop, I get a ValueError.
My batch size is 2, and my input shape is [2, 1035] while my targets are [2, 1036] (one extra generation token for a numerical answer), so I'm not sure what the issue is here. Could someone help? Thanks!
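For anyone who hits the same error: as far as I can tell, Hugging Face causal LM heads expect the labels to have exactly the same sequence length as the input_ids, because the one-position shift between tokens and targets happens inside the model's loss computation, roughly like the toy sketch below (the tensors are made-up values, only there to illustrate the shift). That seems to be why targets of shape [2, 1036] against inputs of [2, 1035] blow up.

```python
import torch
import torch.nn.functional as F

# Toy shapes: batch of 2, sequence length 5, vocab size 10 (made-up numbers).
logits = torch.randn(2, 5, 10)
labels = torch.tensor([[   7,    3, -100, -100,    1],
                       [-100,    4,    9,    2, -100]])

# This mirrors the shift that the model applies internally: position t is
# trained to predict token t + 1, and -100 positions are ignored by the loss.
shift_logits = logits[:, :-1, :].contiguous()  # drop the last position
shift_labels = labels[:, 1:].contiguous()      # drop the first label
loss = F.cross_entropy(
    shift_logits.view(-1, shift_logits.size(-1)),
    shift_labels.view(-1),
    ignore_index=-100,
)
print(loss)
```

Because the shift is done for you, the labels should start out as a copy of the input_ids (prompt + answer together), with the prompt and padding positions set to -100, which is exactly what the collator at the top of this thread does; appending an extra answer token to the targets makes them one position too long and breaks the indexing.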