ThilinaRajapakse / pytorch-transformers-classification

Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classification tasks. Contains code to easily train BERT, XLNet, RoBERTa, and XLM models for text classification.
Apache License 2.0

AttributeError: module 'torch.nn.functional' has no attribute 'one_hot' #8

Open wongdarrell opened 5 years ago

wongdarrell commented 5 years ago

Hi, I downloaded and ran your program and got the training error above. I have no GPU, so I changed the setup to fp16 = 'false' (XLNet left as your demo choice).

What's the problem?

DarrellWong

Code:

```python
if args['do_train']:
    train_dataset = load_and_cache_examples(task, tokenizer)
    global_step, tr_loss = train(train_dataset, model, tokenizer)
    logger.info(" global_step = %s, average loss = %s", global_step, tr_loss)
```

Output window:

```
INFO:main:Creating features from dataset file at data/
100%|████████████████████████████████| 560000/560000 [05:07<00:00, 1823.33it/s]
INFO:main:Saving features into cached file data/cached_train_xlnet-base-cased_128_binary
INFO:main: Running training
INFO:main:  Num examples = 560000
INFO:main:  Num Epochs = 1
INFO:main:  Total train batch size = 8
INFO:main:  Gradient Accumulation steps = 1
INFO:main:  Total optimization steps = 70000
Epoch:   0%|          | 0/1 [00:00<?, ?it/s]
HBox(children=(IntProgress(value=0, description='Iteration', max=70000, style=ProgressStyle(description_width=…
```

...and then the error messages:

```
AttributeError                            Traceback (most recent call last)
in <module>
      1 if args['do_train']:
      2     train_dataset = load_and_cache_examples(task, tokenizer)
----> 3     global_step, tr_loss = train(train_dataset, model, tokenizer)
      4     logger.info(" global_step = %s, average loss = %s", global_step, tr_loss)

in train(train_dataset, model, tokenizer)
     43               'token_type_ids': batch[2] if args['model_type'] in ['bert', 'xlnet'] else None,  # XLM don't use segment_ids
     44               'labels': batch[3]}
---> 45     outputs = model(**inputs)
     46     loss = outputs[0]  # model outputs are always tuple in pytorch-transformers (see doc)
     47     print("\r%f" % loss, end='')

~\AppData\Local\Continuum\anaconda3\envs\transformers\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~\AppData\Local\Continuum\anaconda3\envs\transformers\lib\site-packages\pytorch_transformers\modeling_xlnet.py in forward(self, input_ids, token_type_ids, input_mask, attention_mask, mems, perm_mask, target_mapping, labels, head_mask)
   1120         input_mask=input_mask, attention_mask=attention_mask,
   1121         mems=mems, perm_mask=perm_mask, target_mapping=target_mapping,
-> 1122         head_mask=head_mask)
   1123         output = transformer_outputs[0]
   1124

~\AppData\Local\Continuum\anaconda3\envs\transformers\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~\AppData\Local\Continuum\anaconda3\envs\transformers\lib\site-packages\pytorch_transformers\modeling_xlnet.py in forward(self, input_ids, token_type_ids, input_mask, attention_mask, mems, perm_mask, target_mapping, head_mask)
    920             # `1` indicates not in the same segment [qlen x klen x bsz]
    921             seg_mat = (token_type_ids[:, None] != cat_ids[None, :]).long()
--> 922             seg_mat = F.one_hot(seg_mat, num_classes=2).to(dtype_float)
    923         else:
    924             seg_mat = None

AttributeError: module 'torch.nn.functional' has no attribute 'one_hot'
```
ThilinaRajapakse commented 5 years ago

It looks like your Pytorch is out of date. Can you update it and try again?
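
For reference, `F.one_hot` only exists in more recent PyTorch releases (it was added around PyTorch 1.1), so an older install raises exactly this `AttributeError`. A quick, minimal check you can run before training (nothing repo-specific assumed):

```python
import torch
import torch.nn.functional as F

# torch.nn.functional.one_hot is only available in newer PyTorch releases
# (roughly 1.1 and later); on older installs the attribute simply isn't there.
print(torch.__version__)
print(hasattr(F, 'one_hot'))  # False means the installed torch is too old -> upgrade it
```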

wongdarrell commented 5 years ago

Hi Thilina, your suggestion worked. However, it's now been about 32 hours of processing, and it is up to `INFO:main:Saving model checkpoint to outputs/checkpoint-8000` with no sign of stopping. What is the last checkpoint count in your default run (1 epoch)? Also, what's the setting if I want to freeze all weights except the last layer? Thanks

ThilinaRajapakse commented 5 years ago

Unfortunately, with no GPU your training speed will be slow. I can't remember the total number of steps, but it should be there in the output right before training starts. Checkpoint-8000 means that 8000 steps have been completed. There should also be a tqdm progress bar with approximate time remaining to completion.
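
For this run the total is in the log above: 70,000 optimization steps. As a rough worked check (assuming the batch size, gradient accumulation, and epoch count shown in that log), the number comes out as:

```python
# Rough worked check of the step count reported in the training log above,
# assuming train_batch_size=8, gradient_accumulation_steps=1 and 1 epoch.
num_examples = 560_000
train_batch_size = 8
gradient_accumulation_steps = 1
num_train_epochs = 1

t_total = num_examples // (train_batch_size * gradient_accumulation_steps) * num_train_epochs
print(t_total)  # 70000, so checkpoint-8000 is a bit over 11% of the way through
```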

I can't remember the exact approach off the top of my head, but I can get back to you on freezing layers in a few hours. Usually, though, fine-tuning transformer models is done without freezing any of the layers.

I think it would be best if you used Google Colab with GPU rather than running it locally if a GPU is not available.

wongdarrell commented 5 years ago

If that's correct, it has only completed 8000 of the 70,000 steps you had embedded as t_total = 70000. Regarding fine-tuning, I thought that several transfer-learning examples freeze all weight layers except the last layer, which is usually specific to the new problem. It seems that I will need to switch to Colab then.

ThilinaRajapakse commented 5 years ago

For most transfer learning tasks, you would usually freeze the earlier layers. But in the case of BERT and other derivatives, the approach is to fine-tune all parameters, albeit for only a few epochs. This was the same approach used in the BERT paper.

> For each task, we simply plug in the task specific inputs and outputs into BERT and finetune all the parameters end-to-end.
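
If you still want to experiment with freezing, here is a minimal sketch (not a setting exposed in this repository's args) of turning off gradients on the pretrained body and training only the classification head. It assumes the XLNet classification model from the notebook, whose pretrained body is exposed as `model.transformer` in pytorch-transformers (for BERT it would be `model.bert`):

```python
from pytorch_transformers import XLNetForSequenceClassification, AdamW

# Minimal sketch, not a built-in option of this repository.
model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased')

# Freeze the pretrained XLNet body; for BERT the attribute would be `model.bert`.
for param in model.transformer.parameters():
    param.requires_grad = False

# Optimize only the parameters that still require gradients
# (the sequence summary and logits projection layers, i.e. the classification head).
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = AdamW(trainable_params, lr=4e-5)
```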