Maybe you need to pad ids, mask, and token_type_ids as below?
padding_length = self.max_len - len(ids)
ids = ids + ([0]*padding_length)
mask = mask + ([0]*padding_length)
token_type_ids = token_type_ids + ([0]*padding_length)
I noticed that this small portion is not in your code, but it is in Abhishek's.
import torch


class BERTDataset:
    def __init__(self, review, target):
        self.review = review
        self.target = target
        # tokenizer and max_len are assumed to be defined at module level
        # (e.g. loaded from a config), since they are not passed in here
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.review)

    def __getitem__(self, item):
        review = str(self.review[item])
        review = " ".join(review.split())

        tokenized_inputs = self.tokenizer.encode_plus(
            review,
            None,
            add_special_tokens=True,
            max_length=self.max_len,
            padding=True,
            truncation=True,
        )

        ids = tokenized_inputs["input_ids"]
        mask = tokenized_inputs["attention_mask"]
        token_type_ids = tokenized_inputs["token_type_ids"]

        # pad every field up to max_len so all items can be stacked into a batch
        padding_length = self.max_len - len(ids)
        ids = ids + ([0] * padding_length)
        mask = mask + ([0] * padding_length)
        token_type_ids = token_type_ids + ([0] * padding_length)

        return {
            "ids": torch.tensor(ids, dtype=torch.long),
            "mask": torch.tensor(mask, dtype=torch.long),
            "token_type_ids": torch.tensor(token_type_ids, dtype=torch.long),
            "targets": torch.tensor(self.target[item], dtype=torch.float),
        }
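For what it's worth, here is a minimal sketch of how this dataset could be instantiated and a single item checked. The bert-base-uncased tokenizer, the max_len of 64, and the toy review/target lists are illustrative assumptions, not taken from the original code:

from transformers import BertTokenizer

# assumed module-level globals that BERTDataset refers to (illustrative values)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
max_len = 64

reviews = ["the movie was great", "terrible plot and worse acting"]
targets = [1.0, 0.0]

dataset = BERTDataset(review=reviews, target=targets)
sample = dataset[0]
# each of these should be torch.Size([64]) if the manual padding is applied
print(sample["ids"].shape, sample["mask"].shape, sample["token_type_ids"].shape)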
For some reason I am unable to iterate through the PyTorch DataLoader. It could be something I am missing, or the DataLoader has a bug. When iterating through the dataloader, the following error comes up. Appreciate your inputs.
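If it helps with debugging, here is a minimal sketch of the batching step, assuming the toy setup above and an illustrative batch_size of 2. If any item is shorter than max_len, the default collate function raises a size-mismatch RuntimeError while stacking the tensors, which is the usual symptom of the missing padding block:

from torch.utils.data import DataLoader

data_loader = DataLoader(dataset, batch_size=2, shuffle=False)

for batch in data_loader:
    print(batch["ids"].shape)      # expected: torch.Size([2, 64])
    print(batch["targets"].shape)  # expected: torch.Size([2])
    break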