bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Fix/dataloader error #384

Closed. EastInsure closed this pull request 1 year ago.

EastInsure commented 1 year ago

This PR fixes a dataloader error that surfaces as a CUDA out-of-memory crash while building the attention mask:

File "/mnt/Megatron-DeepSpeed/pretrain_gpt.py", line 130, in get_batch_pipe
attention_mask, loss_mask, position_ids = get_ltor_masks_and_position_ids(
File "/mnt/Megatron-DeepSpeed/megatron/utils.py", line 180, in get_ltor_masks_and_position_ids
attention_mask = torch.tril(torch.ones(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 37631.79 GiB (GPU 3; 79.21 GiB total capacity; 50.52 GiB already allocated; 24.67 GiB free; 52.32 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
pt-0de9jsbu-master-0:39:39 [5] NCCL INFO comm 0x559e7ce3f680 rank 0 nranks 2 cudaDev 5 busId ad000 - Abort COMPLETE
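For context, the failing call materializes a dense `(batch, 1, seq, seq)` float32 causal mask, so its footprint grows quadratically with the sequence dimension it is handed. The sketch below is a simplified stand-in for `get_ltor_masks_and_position_ids` (the function name and shape layout follow the traceback; `build_ltor_attention_mask` itself is a hypothetical helper), showing the construction and why the 37631.79 GiB figure points at a corrupted batch shape rather than a genuinely too-small 80 GiB GPU:

```python
import torch

def build_ltor_attention_mask(att_mask_batch: int, seq_length: int,
                              device: str = "cuda") -> torch.Tensor:
    # Dense lower-triangular (left-to-right) causal mask, shaped
    # (batch, 1, seq, seq) as in get_ltor_masks_and_position_ids.
    # float32 footprint: att_mask_batch * seq_length**2 * 4 bytes,
    # so a corrupted sequence dimension blows up quadratically.
    mask = torch.tril(torch.ones(
        (att_mask_batch, seq_length, seq_length), device=device))
    mask = mask.view(att_mask_batch, 1, seq_length, seq_length)
    return mask < 0.5  # True marks positions that must not attend

# Rough failure arithmetic (assuming a batch dimension of 1 and a
# float32 mask): 37631.79 GiB / 4 bytes is about 1.0e13 elements,
# i.e. a sequence dimension of roughly sqrt(1.0e13) ~ 3.2 million,
# far beyond any configured --seq-length. That is consistent with the
# dataloader handing get_batch_pipe a tensor of the wrong shape.
```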