PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Question]: ernie-1.0 runs fine, but ernie-3.0 raises "index out of range in self" #8571

Open mnbvcxzz1375 opened 3 weeks ago

mnbvcxzz1375 commented 3 weeks ago

Please describe your question

```python
import torch
from sklearn.model_selection import train_test_split
# The traceback below is from torch, so these appear to be the
# Hugging Face `transformers` Auto classes rather than PaddleNLP's.
from transformers import AutoTokenizer, AutoModel

# `data` is a pandas DataFrame loaded earlier (not shown in the report)

# Shuffle the data
data = data.sample(frac=1, random_state=42).reset_index(drop=True)

# Split the dataset into training and test sets
train_data, test_data = train_test_split(data, test_size=0.3, random_state=42)

# Load the pretrained ERNIE model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("nghuyong/ernie-3.0-mini-zh")
ernie_model = AutoModel.from_pretrained("nghuyong/ernie-3.0-mini-zh")

# train_data and test_data are already-loaded DataFrames
# with columns '0' (text) and 'label'
train_texts = train_data['0'].tolist()
train_labels = train_data['label'].tolist()

test_texts = test_data['0'].tolist()
test_labels = test_data['label'].tolist()

# Tokenize with fixed-length padding and truncation
def preprocess_data(texts, tokenizer, max_length=128):
    encodings = tokenizer(texts, max_length=max_length,
                          padding='max_length', truncation=True)
    return torch.tensor(encodings['input_ids'])

train_inputs = preprocess_data(train_texts, tokenizer)
test_inputs = preprocess_data(test_texts, tokenizer)

train_labels = torch.tensor(train_labels)
test_labels = torch.tensor(test_labels)
```
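These inputs feed a custom `TextClassifier` (its definition is not shown above) whose forward pass starts with `self.embedding(x)`. A quick sanity check, sketched below under the assumption that the classifier owns its own `nn.Embedding` table, is to compare the largest token id the tokenizer emits against the size of that table; ernie-3.0-mini-zh has a larger vocabulary than ernie-1.0, so ids that fit the old table can overflow the new one:

```python
# Hypothetical diagnostic (variable names assumed from the snippet above):
# find the largest id the ernie-3.0 tokenizer actually produced.
max_id = int(train_inputs.max())
print("largest token id produced:", max_id)
print("tokenizer vocab size:", tokenizer.vocab_size)

# If the classifier was built as nn.Embedding(old_vocab_size, dim) with
# old_vocab_size <= max_id, F.embedding raises
# "IndexError: index out of range in self", as in the traceback below.
```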


```
IndexError                                Traceback (most recent call last)
Cell In[33], line 97
     95 for inputs, labels in train_loader:
     96     optimizer.zero_grad()
---> 97     outputs = model(inputs)
     98     loss = criterion(outputs, labels.unsqueeze(1))
     99     loss.backward()

File ~\.conda\envs\deeplearning\lib\site-packages\torch\nn\modules\module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
   1530     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1531 else:
-> 1532     return self._call_impl(*args, **kwargs)

File ~\.conda\envs\deeplearning\lib\site-packages\torch\nn\modules\module.py:1541, in Module._call_impl(self, *args, **kwargs)
   1536 # If we don't have any hooks, we want to skip the rest of the logic in
   1537 # this function, and just call forward.
   1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1539         or _global_backward_pre_hooks or _global_backward_hooks
   1540         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541     return forward_call(*args, **kwargs)
   1543 try:
   1544     result = None

Cell In[33], line 54, in TextClassifier.forward(self, x)
     53 def forward(self, x):
---> 54     x = self.embedding(x)
     55     x = x.permute(0, 2, 1)
     56     x = self.conv1d_1(x)

File ~\.conda\envs\deeplearning\lib\site-packages\torch\nn\modules\module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
   1530     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1531 else:
-> 1532     return self._call_impl(*args, **kwargs)

File ~\.conda\envs\deeplearning\lib\site-packages\torch\nn\modules\module.py:1541, in Module._call_impl(self, *args, **kwargs)
   1536 # If we don't have any hooks, we want to skip the rest of the logic in
   1537 # this function, and just call forward.
   1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1539         or _global_backward_pre_hooks or _global_backward_hooks
   1540         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541     return forward_call(*args, **kwargs)
   1543 try:
   1544     result = None

File ~\.conda\envs\deeplearning\lib\site-packages\torch\nn\modules\sparse.py:163, in Embedding.forward(self, input)
    162 def forward(self, input: Tensor) -> Tensor:
--> 163     return F.embedding(
    164         input, self.weight, self.padding_idx, self.max_norm,
    165         self.norm_type, self.scale_grad_by_freq, self.sparse)

File ~\.conda\envs\deeplearning\lib\site-packages\torch\nn\functional.py:2264, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2258     # Note [embedding_renorm_ set_grad_enabled]
   2259     # XXX: equivalent to
   2260     # with torch.no_grad():
   2261     #   torch.embedding_renorm_
   2262     # remove once script supports set_grad_enabled
   2263     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2264 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

IndexError: index out of range in self
```
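The traceback ends in the classifier's own embedding lookup (`TextClassifier.forward` -> `self.embedding(x)`), not inside ERNIE itself, which is consistent with the embedding table having been sized for the smaller ernie-1.0 vocabulary. A minimal, hypothetical fix is to size the table from whichever tokenizer is actually in use (`embedding_dim` here is an assumed value):

```python
import torch.nn as nn

# Hypothetical sketch: derive num_embeddings from the active tokenizer so
# that every id it produces has a row in the embedding table.
embedding = nn.Embedding(
    num_embeddings=tokenizer.vocab_size,
    embedding_dim=128,                      # assumed dimension
    padding_idx=tokenizer.pad_token_id,
)
```

Alternatively, reusing `ernie_model`'s own pretrained embeddings instead of a freshly initialized `nn.Embedding` would sidestep the vocabulary-size mismatch entirely.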