fzh0917 / STMTrack

STMTrack: Template-free Visual Tracking with Space-time Memory Networks
BSD 3-Clause "New" or "Revised" License

If I train on the full dataset but batch size <= 32, how should I set the hyperparameters? #18

Closed hekaijie123 closed 2 years ago

hekaijie123 commented 2 years ago

Hello, author. I want to train this model on the full dataset, but I only have one RTX 3090, which has just 24 GB of memory. How should I set the hyperparameters to get the best result? Thank you.

fzh0917 commented 2 years ago

I suggest you set the batch size to 26 if you just have one RTX 3090 GPU with 24 GB of memory.
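More generally, a common heuristic when shrinking the batch size is to scale the learning rate linearly with it (the "linear scaling rule" of Goyal et al., 2017). A minimal sketch, where the reference batch size and learning rate are placeholder assumptions rather than this repo's actual defaults:

```python
# Hypothetical helper illustrating the linear LR scaling rule;
# the reference values are assumptions, not STMTrack's actual config.
def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate linearly with the batch size."""
    return base_lr * new_batch / base_batch

# e.g. a schedule tuned for batch size 32, shrunk to 26 for one 24 GB GPU
print(scale_lr(base_lr=0.08, base_batch=32, new_batch=26))  # ~0.065
```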

hekaijie123 commented 2 years ago

Thank you. When I set "amp" to True, one RTX 3090 can use a batch size of at most 44, so I have to compromise like this. How much will "amp" affect the outcome?

fzh0917 commented 2 years ago

"amp" means "Automatic Mixed Precision", in which fp32 and fp16 are used at the same time, so the performance of your model will most likely be affected. See https://pytorch.org/docs/stable/amp.html for more details.
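For illustration, a minimal PyTorch AMP training step; this is a generic sketch, not this repo's actual training loop, and the toy model, data, and hyperparameters below are stand-ins:

```python
import torch
import torch.nn as nn
from torch.cuda.amp import autocast, GradScaler

# Toy stand-ins so the sketch runs end to end; STMTrack's real model,
# data, and loss are different.
model = nn.Linear(16, 4).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = GradScaler()

for _ in range(10):
    inputs = torch.randn(8, 16, device="cuda")
    targets = torch.randint(0, 4, (8,), device="cuda")
    optimizer.zero_grad()
    with autocast():                   # run eligible ops in fp16, rest in fp32
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()      # scale loss to avoid fp16 grad underflow
    scaler.step(optimizer)             # unscales grads; skips step on inf/NaN
    scaler.update()                    # adapts the scale factor over time
```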

hekaijie123 commented 2 years ago

Thanks. It's hard being poor. I'll try to find another way.

INTOUCHABLE-VS commented 2 years ago

May I ask whether you used amp during training? When I try to reproduce with full_data and amp, the loss becomes NaN. Is this normal?

fzh0917 commented 2 years ago

I did not use amp during training. An accuracy drop with amp is expected, but NaN is not normal. Generally speaking, NaN tends to arise in the softmax operation (the exponential function overflows) and in the loss computation (the loss from a single element is so large that the total loss overflows), so you should check those places first.
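A sketch of the kind of check this suggests, in generic PyTorch rather than code from this repo: replace a hand-rolled softmax with the numerically stable log_softmax, and fail fast if the loss has overflowed.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10) * 50   # large logits: exp() overflows in fp16

# Unstable: torch.exp(logits) can overflow to inf, producing NaN downstream.
# probs = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)

# Stable: log_softmax subtracts the per-row max before exponentiating.
log_probs = F.log_softmax(logits, dim=1)
loss = F.nll_loss(log_probs, torch.randint(0, 10, (4,)))

# Fail fast if any loss element has already become inf/NaN.
assert torch.isfinite(loss).all(), "loss contains inf/NaN"
```

torch.autograd.set_detect_anomaly(True) can also help locate the first operation that produces a NaN, at the cost of slower training.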

INTOUCHABLE-VS commented 2 years ago

OK, got it. Thank you very much for your reply!