wongdarrell opened this issue 5 years ago
It looks like your PyTorch is out of date. Can you update it and try again?
Hi Thilina, your suggestion worked. However, it's now been about 32 hours of processing, and it is up to `INFO:__main__:Saving model checkpoint to outputs/checkpoint-8000` with no sign of stopping. What is the last checkpoint count in your default run (1 epoch)? Also, what's the setting if I want to freeze all weights except the last layer? Thanks
Unfortunately, with no GPU your training speed will be slow. I can't remember the total number of steps, but it should be there in the output right before training starts. Checkpoint-8000 means that 8000 steps have been completed. There should also be a tqdm progress bar with approximate time remaining to completion.
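As a rough sanity check, the total step count follows directly from the numbers printed in the training log (posted further down in this thread):

```python
# Back-of-the-envelope check using the values from the training log below.
num_examples = 560_000
train_batch_size = 8
gradient_accumulation_steps = 1
num_epochs = 1

steps_per_epoch = num_examples // (train_batch_size * gradient_accumulation_steps)
t_total = steps_per_epoch * num_epochs
print(t_total)  # 70000
```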
I can't remember that off the top of my head, but I can get back to you in a few hours on freezing layers. That said, fine-tuning transformer models is usually done without freezing any of the layers.
I think it would be best to use Google Colab with a GPU rather than running this locally if a GPU is not available.
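For completeness, a quick way to check whether the runtime actually sees a GPU (this is plain PyTorch, nothing specific to this repo):

```python
import torch

# True on a Colab GPU runtime (Runtime -> Change runtime type -> GPU);
# False on a CPU-only machine, in which case training runs on the CPU.
print(torch.cuda.is_available())
```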
If that's right, it has only completed 8000 of the 70000 steps you had embedded as `t_total = 70000`. Regarding fine-tuning, I thought that several example models using transfer learning freeze all weight layers except the last one, which is usually specific to the new problem. It seems that I will need to switch to Colab then.
For most transfer learning tasks, you would usually freeze the earlier layers. But in the case of BERT and its derivatives, the approach is to fine-tune all parameters, albeit for only a few epochs. This was the same approach used in the BERT paper:
> For each task, we simply plug in the task specific inputs and outputs into BERT and finetune all the parameters end-to-end.
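If you do want to experiment with freezing anyway, here is a minimal sketch (not the repo's code, and it assumes `model` is already loaded). The head parameter names are assumptions and vary by architecture — e.g. `logits_proj`/`sequence_summary` for XLNet or `classifier` for BERT — so inspect `model.named_parameters()` for your model first:

```python
import torch

# Sketch: freeze every parameter except the task-specific head.
# HEAD_PREFIXES is an assumption -- check model.named_parameters() for the
# actual names in your architecture.
HEAD_PREFIXES = ("classifier", "logits_proj", "sequence_summary")

for name, param in model.named_parameters():
    param.requires_grad = name.startswith(HEAD_PREFIXES)

# Rebuild the optimizer from the trainable parameters only, so the frozen
# weights are never updated.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=2e-5
)
```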
Hi, I downloaded and ran your program and got the training error shown below. I have no GPU, so I changed the setup to `fp16 = 'false'` (XLNet left as your demo choice).
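Concretely, the change was along these lines in the notebook's args dictionary (one caveat worth noting: it has to be the boolean `False`, not the string `'false'`):

```python
# fp16 must be the boolean False -- the string 'false' is truthy in Python,
# so `if args['fp16']:` would still take the mixed-precision (Apex) path.
args['fp16'] = False
```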
What's the problem?
DarrellWong

Code:

```python
if args['do_train']:
    train_dataset = load_and_cache_examples(task, tokenizer)
    global_step, tr_loss = train(train_dataset, model, tokenizer)
    logger.info(" global_step = %s, average loss = %s", global_step, tr_loss)
```

Output window:

```
INFO:__main__:Creating features from dataset file at data/
100%|████████████████████████████████| 560000/560000 [05:07<00:00, 1823.33it/s]
INFO:__main__:Saving features into cached file data/cached_train_xlnet-base-cased_128_binary
INFO:__main__: Running training
INFO:__main__:  Num examples = 560000
INFO:__main__:  Num Epochs = 1
INFO:__main__:  Total train batch size = 8
INFO:__main__:  Gradient Accumulation steps = 1
INFO:__main__:  Total optimization steps = 70000
Epoch:   0%|          | 0/1 [00:00<?, ?it/s]
HBox(children=(IntProgress(value=0, description='Iteration', max=70000, style=ProgressStyle(description_width=…
```

...and then the error messages:

```
AttributeError                            Traceback (most recent call last)
```