Hello, the IMDB text classifier example in Chapter 1 takes over 12 minutes to run with only 1 epoch and a batch size of 16 (reduced from 32 due to memory issues). Is this expected?
Here is what I'm running:
from fastai.text.all import *
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test', bs=16) # switched from 32 -> 16
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(1, 1e-2) # switched from 4 to 1 to complete faster.
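For reference, here is a quick sanity check that the model and data really are on the GPU (a minimal sketch, assuming the dls and learn from the snippet above; default_device and one_batch are standard fastai calls):

import torch
print(torch.cuda.is_available())        # True if PyTorch can see the CUDA device
print(torch.cuda.get_device_name(0))    # should report the 2080 Ti
print(default_device())                 # fastai's default device, e.g. cuda:0
xb, yb = dls.one_batch()                # grab one batch from the training DataLoader
print(xb.device, next(learn.model.parameters()).device)  # both should be cuda:0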
Here is info about my setup.
Package info:
fastai 2.7.11
cuda 11.6.1
pytorch 11.14.1
pytorch-cuda 11.6
GPU: RTX 2080 Ti
GPU memory usage while running: 8,055 MiB (which suggests it is using the GPU and not the CPU).
CPU: AMD Ryzen 9 3950X
Time it took to run 1 epoch: ~760 seconds.
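For a rough sanity check on that number (a sketch, assuming the dls above): IMDB has 25,000 training reviews, so at bs=16 an epoch is roughly 1,563 batches, which puts ~760 s at about 0.5 s per batch.

n_batches = len(dls.train)   # training batches per epoch: ~25,000 reviews / 16 ≈ 1,563
print(n_batches)
print(760 / n_batches)       # ≈ 0.5 s per batch for the ~760 s epoch reported above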
Is this expected, or should I be able to run this faster? I ran it both in and out of Jupyter to see whether that was causing issues.
The earlier examples in Chapter 1 ran much faster. Any debugging help is much appreciated.
It's the slowest by far on my M1 Mac's GPU as well; I believe that's because it's training the network from scratch rather than fine-tuning an existing pretrained model via transfer learning.
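If it helps to check that assumption, text_classifier_learner has a pretrained argument (it defaults to True and loads AWD_LSTM weights pretrained on Wikitext-103), so the two paths can be compared directly. A minimal sketch, not a confirmed explanation of the slowdown:

from fastai.text.all import *
path = untar_data(URLs.IMDB)
dls = TextDataLoaders.from_folder(path, valid='test', bs=16)
# Transfer learning: start from the pretrained AWD_LSTM encoder (the default).
learn_pretrained = text_classifier_learner(dls, AWD_LSTM, pretrained=True, drop_mult=0.5, metrics=accuracy)
# From scratch: randomly initialised encoder, no pretrained weights loaded.
learn_scratch = text_classifier_learner(dls, AWD_LSTM, pretrained=False, drop_mult=0.5, metrics=accuracy)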