MASILab / 3DUX-Net

GPU out-of-memory error when training reaches the validation stage #50

Closed guli-7721 closed 2 months ago

guli-7721 commented 1 year ago

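I tweaked the code slightly so that the model runs on two GPUs, but it still runs out of GPU memory. With max_iter set to 100,000, eval_step set to 1000, batch_size set to 1, and num_workers set to 0, training on two GPUs still hits the out-of-memory error, even though each GPU shows only about 7 GB in use, so there should be plenty of memory left. I can't figure out the cause. Could anyone suggest a fix? Thanks!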

image

10dutel commented 6 months ago

How do I set up multi-GPU training? What needs to be changed?
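
For readers with the same question, one common way to spread a PyTorch model across several GPUs is `torch.nn.DataParallel`. The snippet below is only a minimal sketch under stated assumptions: the `Conv3d` layer is a placeholder standing in for the real network, the input shape is arbitrary, and nothing here is taken from the repository's training script.

```python
import torch
import torch.nn as nn

# Placeholder network (assumption); in practice this would be the model
# created by the training script.
model = nn.Conv3d(in_channels=1, out_channels=2, kernel_size=3, padding=1)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:
    # Replicates the model on every visible GPU and splits each input
    # batch along dimension 0 across the replicas.
    model = nn.DataParallel(model)
model = model.to(device)

# Dummy batch of two single-channel 8x8x8 volumes (arbitrary sizes).
x = torch.randn(2, 1, 8, 8, 8, device=device)
model.eval()
with torch.no_grad():  # no autograd graph is kept during evaluation
    out = model(x)
print(out.shape)
```

Because `DataParallel` splits the batch along its first dimension, a batch size of 1 leaves only one replica with work to do, so adding a second GPU this way may not lower the memory used per GPU.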

dream-mjq commented 5 months ago

I have the same problem. Have you solved it?

guli-7721 commented 5 months ago

I solved it. If you use your own dataset and need to change the dataset name, remember that the name is set in load_dataset_transforms.py: change flare there to your own dataset name. I hope this helps!
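
The kind of name-based lookup described above can be sketched as follows. This is a hypothetical illustration, not the repository's code: `KNOWN_DATASETS`, `get_dataset_settings`, the settings values, and the name `my_dataset` are all made up for the example. It only shows why a dataset name that the loader does not recognize needs to be registered or renamed first.

```python
# Hypothetical sketch: names and values below are placeholders, not taken
# from load_dataset_transforms.py.
KNOWN_DATASETS = {
    "flare": {"description": "placeholder settings for the flare branch"},
}


def get_dataset_settings(name, known=KNOWN_DATASETS):
    """Return the settings registered for a dataset name, or fail loudly."""
    if name not in known:
        raise ValueError(
            f"Unknown dataset '{name}'; expected one of {sorted(known)}"
        )
    return known[name]


# A custom dataset name is rejected until it is registered.
try:
    get_dataset_settings("my_dataset")
    rejected = False
except ValueError:
    rejected = True

# Registering the custom name (reusing the flare settings here) fixes it.
KNOWN_DATASETS["my_dataset"] = dict(KNOWN_DATASETS["flare"])
settings = get_dataset_settings("my_dataset")
```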

BennettLandman commented 2 months ago

I am closing the older bug reports as these were missed. We are now better tracking reports across the organization. Please re-open if this continues to be a blocker.

RY-97 commented 3 weeks ago

Do you have the complete modified code? The original code has too many errors.

RY-97 commented 3 weeks ago

@guli-7721