google-research / albert

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

ValueError: Expected input batch_size (864) to match target batch_size (32). #232

Open · abhibha1807 opened this issue 3 years ago

abhibha1807 commented 3 years ago

Hello, I am a newbie to TensorFlow and Hugging Face. I had some BERT text-classification code with me, and I am trying to figure out whether the same code can be used for ALBERT. I am facing this error while training: ValueError: Expected input batch_size (864) to match target batch_size (32). My data is in the form of a CSV file with two columns, 'text' and 'label'. The label column has two labels, [0.0, 1.0]. The dataset contains 13752 instances; approx. 13569 are in training and 1883 are in validation.
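For reference, the CSV looks roughly like this (illustrative rows, not the actual data):

text,label
"great acting and a moving story",1.0
"two hours I will never get back",0.0

This is the code: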

import torch
from torch.utils.data import TensorDataset, DataLoader, RandomSampler
from transformers import AutoTokenizer, AutoModelWithLMHead

# tokenizer matching the ALBERT checkpoint used below
tokenizer = AutoTokenizer.from_pretrained("albert-base-v1")

input_ids_train = []
attention_masks_train = []
for sent in sentences_train:
    encoded_dict_train = tokenizer.encode_plus(
        sent,
        add_special_tokens=True,
        max_length=27,
        padding='max_length',  # pad every sentence to max_length so the tensors can be concatenated
        return_attention_mask=True,
        return_tensors='pt',
        truncation=True,
    )
    input_ids_train.append(encoded_dict_train['input_ids'])
    attention_masks_train.append(encoded_dict_train['attention_mask'])

input_ids_train = torch.cat(input_ids_train, dim=0)
attention_masks_train = torch.cat(attention_masks_train, dim=0)
labels_train = torch.tensor(labels_train)
print(input_ids_train.shape)  # torch.Size([13569, 27])
print(labels_train.shape)     # torch.Size([13569])
train_dataset = TensorDataset(input_ids_train, attention_masks_train, labels_train)
train_dataloader = DataLoader(train_dataset, sampler=RandomSampler(train_dataset), batch_size=32)
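Each batch from this loader is a tuple of three tensors. A quick sanity check on one batch (using the objects defined above) confirms the shapes:

batch = next(iter(train_dataloader))
print(batch[0].shape)  # input_ids:      torch.Size([32, 27])
print(batch[1].shape)  # attention_mask: torch.Size([32, 27])
print(batch[2].shape)  # labels:         torch.Size([32])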

# defining the model
model = AutoModelWithLMHead.from_pretrained("albert-base-v1", num_labels=2, output_attentions=False, output_hidden_states=False)
model.cuda()
device = torch.device('cuda')  # matches the model.cuda() call above
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # optimizer choice assumed (AdamW is standard for BERT-style fine-tuning)
epochs = 2
for epoch_i in range(epochs):
    total_train_loss = 0
    model.train()
    for step, batch in enumerate(train_dataloader):
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)
        loss, logits = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=b_labels.long())  # error raised from this line
        total_train_loss += loss.item()
        loss.backward()
        optimizer.step()       # update weights
        optimizer.zero_grad()  # reset gradients for the next batch

    # Calculate the average loss over all of the batches.
    avg_train_loss = total_train_loss / len(train_dataloader)

The code worked well when I was using BERT, but I got this error when I switched to ALBERT, hence I have posted the question here. Any links to answers on other forums would also be highly appreciated. Please let me know if I am missing anything. Thank you.
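One thing I noticed while debugging: 864 = 32 × 27, i.e. batch_size × max_length, so the loss appears to be computed over every token position, which is what a language-modeling head does; a classification head would produce one prediction per example. In case it is relevant, here is a minimal sketch of the alternative I am considering, assuming AutoModelForSequenceClassification is the appropriate class for binary classification:

from transformers import AutoModelForSequenceClassification

# A sequence-classification head outputs logits of shape
# [batch_size, num_labels], so the [batch_size] label tensor
# matches the loss target.
model = AutoModelForSequenceClassification.from_pretrained(
    "albert-base-v1",
    num_labels=2,
    output_attentions=False,
    output_hidden_states=False,
)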