It seems the training is stuck

Closed · Aston-zeal closed 5 months ago
Could you post the command you executed that caused the issue?
@Aston-zeal I thought the same at first, but after looking at the code and putting a counter on the minibatches in the training loop, I saw that it is actually progressing. Since the tqdm bar is on the epochs, it takes a long time to see the progress bar move. You are probably having the same issue; by putting a counter in the training loop you can observe the progress.
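The counter idea above can be sketched as follows. This is a hypothetical toy loop, not the repository's actual training code: `train`, `step_fn`, and `log_every` are illustrative names, and the real script's epoch/minibatch structure is assumed.

```python
# Hypothetical sketch: moving progress reporting from the epoch level
# down to the minibatch level makes slow-but-real progress visible.

def train(num_epochs, minibatches, step_fn, log_every=100):
    """Run a toy training loop, reporting minibatch-level progress."""
    seen = 0
    for epoch in range(num_epochs):
        for batch in minibatches:
            step_fn(batch)  # stand-in for the real forward/backward step
            seen += 1
            if seen % log_every == 0:
                print(f"epoch {epoch}: {seen} minibatches processed")
    return seen

# Toy usage: 2 epochs over 250 dummy minibatches.
total = train(2, list(range(250)), step_fn=lambda b: None)
print(total)  # 500
```

With a per-epoch tqdm bar, none of these intermediate prints would be visible until a full epoch completed, which is why the run looks stuck.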
Yes, it's just that the training is too slow; I ran it on the CPU.
Yeah, CPU would be too slow. You can think of it as a fine-tuning process for a BERT model.
You can also try limiting the data size by adding the `--data_size` flag. For example:
```shell
# data generation -> 1K data
python preprocess_dataset.py --task_type 0 --data_size 1

# predictor training (regression with MSE loss)
python latency_prediction.py --task_type 0 --data_size 1

# predictor training (regression with L1 loss)
python latency_prediction.py --task_type 0 --l1_loss --data_size 1
```
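For reference, flags like the ones in these commands are usually wired up with `argparse`. The sketch below is an assumption about how `--task_type`, `--data_size`, and `--l1_loss` might be parsed; the actual scripts' defaults and help text may differ.

```python
# Hedged sketch: a minimal argparse setup for the flags used above.
# The defaults and help strings here are assumptions, not the repo's code.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="latency predictor training")
    parser.add_argument("--task_type", type=int, default=0,
                        help="which prediction task to run")
    parser.add_argument("--data_size", type=int, default=None,
                        help="cap on the dataset size (e.g. 1 -> 1K samples)")
    parser.add_argument("--l1_loss", action="store_true",
                        help="use L1 loss instead of the default MSE loss")
    return parser

# Parse the same flags as the last command above.
args = build_parser().parse_args(["--task_type", "0", "--data_size", "1", "--l1_loss"])
print(args.task_type, args.data_size, args.l1_loss)  # 0 1 True
```

Because `--l1_loss` is a `store_true` flag, simply omitting it falls back to the MSE-loss configuration, matching the two training commands shown.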