I think the full dataset of 1.6 million tweets is stretching the available RAM on Colab to the limit. However, it looks like the preprocessing completed before it crashed and it then crashed when trying to cache the features. If that's the case, you might be able to use the full dataset by setting "no_cache": True in model_args.
The time taken for training depends a lot on the hardware and 4 hours per epoch seems reasonable for such a large dataset on a Colab GPU. Transformer models are generally much larger and more resource-intensive compared to the typical LSTM or CNN model, so it makes sense for the training to take longer. However, you will rarely need to do more than 1 or 2 epochs with a Transformer model. You'll probably be fine with just 1 epoch considering the dataset is quite large.
If you want to speed things up, consider using distilroberta-base instead of roberta-base. It's a smaller model with close to the same performance. Also, you should make your train_batch_size as large as your GPU can handle, i.e. the largest value which doesn't throw a CUDA memory error. Generally, this will be around 4-16.
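For concreteness, here is a hedged sketch of what those model_args could look like; the batch size shown is just a placeholder within the suggested range, not a value from this thread:

```python
from simpletransformers.classification import ClassificationModel

# Sketch only: pick the largest train_batch_size that does not trigger a CUDA out-of-memory error.
model_args = {
    "no_cache": True,        # don't cache the preprocessed features
    "num_train_epochs": 1,   # 1-2 epochs is usually enough when fine-tuning
    "train_batch_size": 16,  # placeholder value; tune to your GPU
}

model = ClassificationModel("roberta", "roberta-base", args=model_args)
# model.train_model(train_df)  # train_df: a pandas DataFrame with a text column and a label column
```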
0) I still got an out of memory error while using "no_cache": True and the full dataset. With this setting & train_batch_size = 40, one epoch on 800k training samples takes 2 hours now - which is a great improvement :) I'll just run 2 epochs and should be fine, maybe?
1) train_batch_size: I wanted to start low and tried 'train_batch_size': 125, and I already ran out of memory :D
RuntimeError: CUDA out of memory. Tried to allocate 106.00 MiB (GPU 0; 14.73 GiB total capacity; 13.45 GiB already allocated; 69.88 MiB free; 13.73 GiB reserved in total by PyTorch)
I tried it with a "clean" runtime. 'train_batch_size': 50 gave me the same error; 'train_batch_size': 40 worked.
2) As I said before, I am new to this deep learning area. Is one epoch enough in this case because this is a type of transfer learning?
3) model_type, model_name = "distilbert", "distilroberta-base"
gave me:
Can't set hidden_size with value 768 for DistilBertConfig {
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "model_type": "distilbert",
  "pad_token_id": 1
}
In that case, you can also try using lazy loading. This will preprocess the data on the fly so it doesn't need to keep the full dataset in memory.
A batch size of about 40 seems reasonable.
Yes, it's because fine-tuning a model is transfer learning.
distilroberta-base is a roberta model, so you need to do:
model_type, model_name = "roberta", "distilroberta-base"
How could I use lazy loading with a dataframe? Should I save my cleaned dataframe as a csv (without a header, with the text first and then the label) and just change the separator/delimiter to ";", so that I have lines like "text a text a text a; 1" and "blalsdlasdlasld asdlasldalsd asldasld; 2"?
btw ty for your great work!
Yes, you need to save your dataframe as a csv (technically a tsv) file. The delimiter should be a tab character.
E.g:
df.to_csv("my_data_file.tsv", sep="\t")
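Putting the pieces together, here is a hedged sketch of the lazy-loading route, assuming df is the cleaned dataframe from above with the text column first and the label column second. The lazy_* option names come from the simpletransformers classification args and may differ between versions; index=False / header=False are my additions so the text lands in column 0 and the label in column 1:

```python
from simpletransformers.classification import ClassificationModel

# Write the cleaned dataframe as <text>\t<label> lines, no index and no header,
# so the columns line up with lazy_text_column / lazy_labels_column below.
df.to_csv("my_data_file.tsv", sep="\t", index=False, header=False)

model_args = {
    "no_cache": True,
    "lazy_loading": True,          # read and preprocess the file on the fly
    "lazy_delimiter": "\t",
    "lazy_text_column": 0,
    "lazy_labels_column": 1,
    "lazy_loading_start_line": 0,  # 0 because the file has no header row
    "train_batch_size": 40,
    "num_train_epochs": 1,
}

model = ClassificationModel("roberta", "distilroberta-base", args=model_args)

# With lazy loading, pass the path to the file instead of a DataFrame.
model.train_model("my_data_file.tsv")
```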
Thanks, I'll try that :)
model_type, model_name = "roberta", "distilroberta-base"
didnt work for me, but
model_type, model_name = "distilbert", "distilbert-base-uncased"
did :)
It's training; I hope it will finish before Google Colab kicks me off :D
"bertweet", "vinai/bertweet-covid19-base-uncased" -> takes approx. 8 1/2 hours "bert", "bert-base-uncased" -> approx 4 1/2 hrs "distilbert", "distilbert-base-uncased" -> approx 4 hours
the second and last should work, but the first might be problematic
ty for all your work :) and great help :)
Disclaimer: I don't know where to post this question. If this is not the right place for SimpleTransformers beginner questions, I would appreciate a pointer to the right place.
Hi guys, I am new to deep learning and wanted to train a binary (sentiment) classification model using SimpleTransformers. As a dataset I took Sentiment140 (1.6 million tweets: 800k positive, 800k negative). The training itself works, but depending on the size of the dataset Google Colab crashes. If I divide the 1.6 million tweets into 1.28 million training and 0.32 million test samples, the model crashes after ->
(1) Is this normal? If I reduce the numbers to 800k training and 160k test samples, Google Colab does not crash, but one epoch takes 4 hours. (This often works, though sometimes 800k training samples also crashes as described above. Even when it gets to training, I don't know if it runs through - since an epoch lasts 4 hours, I've never run it to completion.) I don't know how comparable the two are, but in TensorFlow I have trained a CNN/BiLSTM network on the entire dataset and there an epoch took only 5 minutes. (2) Does 4 hours make sense, or have I made a gross error?
I also tried to add 'eval_accumulation_steps': 20 to my model_args, but it still crashed before training started.
Ty in advance!