HKUDS / GraphGPT

[SIGIR'2024] "GraphGPT: Graph Instruction Tuning for Large Language Models"
https://arxiv.org/abs/2310.13023
Apache License 2.0

CUDA out of memory #29

Closed bamboo-boy closed 6 months ago

bamboo-boy commented 7 months ago

While running graphgpt_stage1.sh, I got the error below. In your latest update you mention that the results can be reproduced on two 3090 GPUs. My setup is 4× RTX 4090, yet I still run out of CUDA memory. I hope you can help me figure this out, thank you.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacty of 23.65 GiB of which 54.06 MiB is free. Process 835399 has 23.59 GiB memory in use. Of the allocated memory 23.21 GiB is allocated by PyTorch, and 4.64 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
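As the error message itself suggests, allocator fragmentation can sometimes be mitigated through `PYTORCH_CUDA_ALLOC_CONF`. A minimal sketch (the 128 MiB cap is an illustrative value, not a project recommendation):

```shell
# Cap the allocator's maximum split size to reduce fragmentation
# (128 is an illustrative value; tune for your workload)
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

With the variable exported, launch `graphgpt_stage1.sh` from the same shell so the training process inherits it.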

tjb-tech commented 7 months ago

Hi, if you have 4 cards, you can add the argument --gpus 0,1,2,3. Are you actually using all four cards? Also, the model's context_length affects GPU memory usage; you can try reducing it and see.

kunupup commented 6 months ago

> Hi, if you have 4 cards, you can add the argument --gpus 0,1,2,3. Are you actually using all four cards? Also, the model's context_length affects GPU memory usage; you can try reducing it and see.

How do I specify the GPUs? The distributed arguments don't seem to include such an option.

LouHerGetUp commented 6 months ago

> Hi, if you have 4 cards, you can add the argument --gpus 0,1,2,3. Are you actually using all four cards? Also, the model's context_length affects GPU memory usage; you can try reducing it and see.

Hello, where should --gpus 0,1,2,3 be added, i.e. in which file? Looking forward to your reply, thank you!!!
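One framework-level alternative that works regardless of where the script parses its flags is `CUDA_VISIBLE_DEVICES`, a standard CUDA environment variable that PyTorch honors. A sketch, assuming `graphgpt_stage1.sh` launches the training process directly:

```shell
# Expose all four GPUs to the training process; PyTorch will only see
# (and number) the devices listed here, independent of any --gpus flag
export CUDA_VISIBLE_DEVICES=0,1,2,3
echo "$CUDA_VISIBLE_DEVICES"
```

Then run `bash graphgpt_stage1.sh` in the same shell. To restrict training to a subset of cards, list only those indices (e.g. `CUDA_VISIBLE_DEVICES=0,1`).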