How to reduce the memory footprint? (Issue #2)

Lookey-Luo opened this issue 1 year ago

The memory footprint when running the code is too high. Can I reduce the batch size to lower the memory usage? If the GPU on the server has 16 GB of memory, what batch size is appropriate? Which part of the code should I adjust?
The vision transformer in the SAM originally released by Meta comes in three modes: vit_b, vit_l, and vit_h. I am using vit_h, and batch_size is already set to 1. If you run out of memory, you can modify line 70 of train.py to try loading one of the two smaller ViT variants, but in that case you will need to find and download the corresponding SAM pre-trained weights, and the input image preprocessing size may also need to be adjusted accordingly.
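For reference, here is a minimal sketch of what loading a smaller SAM backbone usually looks like with Meta's segment-anything package; the checkpoint filenames are the official SAM releases, and the exact wrapper in this repo's train.py may look different:

```python
# Minimal sketch (not this repo's exact code): choosing a smaller SAM image
# encoder to reduce GPU memory. Assumes Meta's "segment-anything" package.
from segment_anything import sam_model_registry

# Official SAM checkpoints, downloaded separately:
#   vit_h -> sam_vit_h_4b8939.pth
#   vit_l -> sam_vit_l_0b3195.pth
#   vit_b -> sam_vit_b_01ec64.pth
model_type = "vit_b"                     # smallest of the three encoders
checkpoint = "sam_vit_b_01ec64.pth"      # must match model_type
sam = sam_model_registry[model_type](checkpoint=checkpoint).cuda()

# Batch size is the other main memory knob; it is typically set in the
# DataLoader, e.g. DataLoader(dataset, batch_size=1).
```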
Understood, thank you for your guidance!
I tried using vit_b instead of vit_h, and I downloaded the pre-trained model weights, imagenet21k+imagenet2012_ViT-B_16-224.pth, from https://drive.google.com/drive/folders/1azgrD1P413pXLJME0PjRRU-Ez-4GWN-S. Now I need to modify the image preprocessing size. Which parts of the code should I adjust to change the preprocessing size? Is the original vit_h preprocessing size 1024 × 1024? Should I change it to 224 × 224? Thank you!
Yes, modifying the input size should be necessary. For vit_b, the input should be converted to 224 × 224. This part is implemented in the code provided by SAM, although I believe it can be changed; I haven't tried it myself yet. On line 11 of dataset.py, a ResizeLongestSide object named 'transform' is defined to handle this size-conversion logic. I think you can get the desired effect by changing the value passed to it, for example from 1024 to 224. If that still doesn't work, I will take a closer look and provide a solution when I have time to organize the code.
File dataset.py:

```python
...
def get_sam_item(image, label_path, num_of_prompt_pos, num_of_prompt_total, local_mode, random_seed):
    transform = ResizeLongestSide(1024)  # modify to 224
    original_size = tuple(image.shape[-2:])
    ...
```
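For context, a small sketch of how ResizeLongestSide from the official SAM code behaves; the 304 × 304 input below is only an illustrative OCTA-like size, and the repo's dataset.py may call it slightly differently:

```python
# Sketch of the ResizeLongestSide transform shipped with segment-anything.
# It rescales so that the LONGEST side equals the target length, preserving
# the aspect ratio; it does not change what size the model itself expects.
import numpy as np
from segment_anything.utils.transforms import ResizeLongestSide

transform = ResizeLongestSide(1024)               # or 224 for a smaller target
image = np.zeros((304, 304, 3), dtype=np.uint8)   # illustrative H x W x 3 input

print(transform.get_preprocess_shape(304, 304, 1024))  # -> (1024, 1024)
resized = transform.apply_image(image)                  # longest side becomes 1024
```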
As you suggested, I changed line 11 of dataset.py from transform = ResizeLongestSide(1024) to transform = ResizeLongestSide(224), but the image size still does not meet the requirements. Perhaps other changes are needed as well. Part of the error output is as follows:

size mismatch for image_encoder.pos_embed: copying a param with shape torch.Size([1, 64, 64, 1280]) from checkpoint, the shape in current model is torch.Size([1, 64, 64, 768]).
size mismatch for image_encoder.patch_embed.proj.weight: copying a param with shape torch.Size([1280, 3, 16, 16]) from checkpoint, the shape in current model is torch.Size([768, 3, 16, 16]).

Thank you for your suggestions. I hope you can provide a further solution when you have some spare time. Thank you!
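For what it's worth, a hedged reading of those error messages: 1280 is the ViT-H embedding width and 768 is ViT-B's, so the checkpoint being loaded still appears to be a vit_h-style one while the model was built as vit_b. The sketch below only illustrates the kind of model/checkpoint pairing that is usually required; it is not a statement about this repo's actual loading code:

```python
# Sketch: the model type built in code and the checkpoint file generally have
# to match, otherwise load_state_dict reports size mismatches like the ones
# above (1280-wide tensors belong to vit_h, 768-wide tensors to vit_b).
from segment_anything import sam_model_registry

# Mismatched pair -> "size mismatch for image_encoder.pos_embed ..." errors:
# sam = sam_model_registry["vit_b"](checkpoint="sam_vit_h_4b8939.pth")

# Matched pair (the official SAM vit_b weights, not an ImageNet ViT-B_16 file):
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
```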
Understood. I will need some time before I can look into how to solve this.
Hello, and Happy New Year! I have updated the code, and now the vit_b model is also available for use. The Chinese README file has been updated with detailed explanations. I hope this is helpful for you.