deepglint / unicom

Unicom & MLCD: Large-Scale Visual Representation Model

ERROR torchrun retrieval.py --batch_size 1 --dataset cub --epochs 1 --gradient_acc 1 --model_name ViT-B/32 #9

Closed: fengminxuan closed this issue 1 year ago

fengminxuan commented 1 year ago

..

anxiangsir commented 1 year ago

Hello, could you post the error log?

My guess is that the main cause is the BN (batch normalization) layer at the end of this ViT. BN cannot be trained with a batch size of 1, so the batch size must be at least 2. When the model contains BN, a batch size above 16 is recommended.
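To illustrate why a batch size of 1 breaks BN during training, here is a minimal pure-Python sketch of training-mode batch normalization over a (batch, features) input, without the learnable scale and shift. It is an illustration under stated assumptions, not unicom's code: `batch_norm_train` and its `check` flag are hypothetical names, and the batch-size guard only imitates the check that PyTorch's `nn.BatchNorm1d` performs in training mode (it raises a `ValueError` when a channel has only one value).

```python
import math
from typing import List

Batch = List[List[float]]  # shape (N, D): N samples, D feature channels


def batch_norm_train(batch: Batch, eps: float = 1e-5, check: bool = True) -> Batch:
    """Hypothetical training-mode batch norm over the batch dimension (no affine weights)."""
    n = len(batch)
    if check and n < 2:
        # Imitates PyTorch's training-mode guard: one value per channel
        # gives no usable batch statistics.
        raise ValueError(f"expected more than 1 value per channel, got batch size {n}")
    d = len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        # Biased variance (divide by N), as used to normalize in training mode.
        var = sum((v - mean) ** 2 for v in column) / n
        for i in range(n):
            out[i][j] = (column[i] - mean) / math.sqrt(var + eps)
    return out


if __name__ == "__main__":
    # One sample: every feature equals its own batch mean, so the output
    # is all zeros no matter what the input was.
    print(batch_norm_train([[3.0, -7.0, 42.0]], check=False))  # [[0.0, 0.0, 0.0]]

    # Two samples give usable statistics: each channel becomes roughly -1 / +1.
    print(batch_norm_train([[1.0, 2.0], [3.0, 6.0]]))

    try:
        batch_norm_train([[1.0, 2.0]])
    except ValueError as err:
        print("ValueError:", err)
```

With a single sample the normalized features carry no information at all, which is why PyTorch rejects the batch outright instead of silently returning a constant. Two samples is the bare minimum; with so few samples the batch statistics are very noisy, which is presumably the reason a batch size above 16 is recommended.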

fengminxuan commented 1 year ago

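I placed CUB_200_2011 under data as shown. Running with --eval gave a result of 83.x, so the dataset should be fine.

Then I mounted the unicom folder in an Ubuntu environment and ran:

torchrun retrieval.py --batch_size 2 --dataset cub --epochs 1 --model_name ViT-B/32

This time I set batch_size to 2, and got the following error: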

------------------ Original message ------------------ From: "deepglint/unicom"; Sent: Thursday, June 8, 2023, 8:59 PM; Subject: Re: [deepglint/unicom] ERROR torchrun retrieval.py --batch_size 1 --dataset cub --epochs 1 --gradient_acc 1 --model_name ViT-B/32 (Issue #9)


fengminxuan commented 1 year ago

Replying by email dropped the images.

(Images: 0a904a39da59446a15a24f7deabd5614, dc5c5551992dcc97faf8a5f6eb273939)

anxiangsir commented 1 year ago

I recommend training on a Linux server.