Closed danyaljj closed 4 years ago
I had a similar issue to this. If I run the code as instructed on a machine with GPUs, the code compiles with XLA but then fails with CUDA out-of-memory errors (even on large GPUs with 48 GB of memory).
I also tried using `--gin_param="serialize_num_microbatches.tokens_per_microbatch_per_replica = 512"`, but still without luck.
Would it be possible to get the full environment (i.e., `pip freeze` output, possibly OS and CUDA versions) where the GPU code was made to work?
Have you tried with model:1 batch:2?
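For anyone trying this suggestion: if "model:1 batch:2" refers to the Mesh TensorFlow mesh shape, it would be passed as gin params to the `t5_mesh_transformer` entry point. A sketch of an invocation combining it with the microbatch setting above (`${MODEL_DIR}` and the device list are placeholders; adjust to your setup):

```shell
t5_mesh_transformer \
  --model_dir="${MODEL_DIR}" \
  --gin_param="utils.run.mesh_shape = 'model:1,batch:2'" \
  --gin_param="utils.run.mesh_devices = ['gpu:0', 'gpu:1']" \
  --gin_param="serialize_num_microbatches.tokens_per_microbatch_per_replica = 512"
```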
I also have memory issues when trying to fine-tune the small model on GPUs. No matter how I configure data/model parallelism and batch size, the memory allocated on the GPU looks the same and I always run into out-of-memory errors.
Do you have any information about what kind of setup T5 was tested on for GPU support?
This should be fixed in #148. Please reopen if not.
I used the following code to solve the issue:
```python
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 4 GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
```
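As an alternative to a hard memory cap, TensorFlow can also be told to allocate GPU memory on demand rather than grabbing nearly all of it up front. A sketch using the same `tf.config.experimental` API (this must run before any TensorFlow op initializes the GPUs):

```python
import tensorflow as tf

# Enable on-demand memory growth on every visible GPU instead of
# pre-allocating the whole device.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    try:
        tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
```

This avoids picking a fixed limit like 4096 MB, at the cost of less predictable peak usage.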
Here is the full log:
Here is some additional information about the environment:
FYI @nalourie-ai2