Thinklab-SJTU / pygmtools

A Python Graph Matching Toolkit.
https://pygmtools.readthedocs.io/

Error when running GAMF: DefaultCPUAllocator: can't allocate memory #67

Closed. YCaigogogo closed this issue 10 months ago.

YCaigogogo commented 11 months ago

Hello developers, and thank you for your excellent work. I have recently been running your GAMF code to perform model fusion on two VGG11 models, but the process appears to run out of CPU memory. The error output is as follows:

    Relu Inplace is False
    Loaded parameters (file 0): [features.8.weight, features.16.weight, features.3.weight, features.0.weight, features.6.weight, features.18.weight, classifier.weight, features.11.weight, features.13.weight]
    Traceback (most recent call last):
      File "/data/yic/LAMDA-ZhiJian/main.py", line 15, in <module>
        trainer = prepare_trainer(args)
      File "/data/yic/LAMDA-ZhiJian/zhijian/trainers/base.py", line 36, in prepare_trainer
        return get_class_from_module(f'zhijian.trainers.{args.training_mode}', 'Trainer')(args, **kwargs)
      File "/data/yic/LAMDA-ZhiJian/zhijian/trainers/model_merging.py", line 134, in __init__
        self.model = core_fn(self.model, merging_models_list)
      File "/data/yic/LAMDA-ZhiJian/zhijian/models/model_merging/method/gamf.py", line 27, in core
        K, params = self.graph_matching_fusion(merge_models_list)
      File "/data/yic/LAMDA-ZhiJian/zhijian/models/model_merging/method/gamf.py", line 53, in graph_matching_fusion
        affinity = torch.zeros([n1 * n2, n1 * n2]).cuda()
    RuntimeError: [enforce fail at alloc_cpu.cpp:75] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 233797861202500 bytes. Error code 12 (Cannot allocate memory)

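I printed n1 and n2, and both are 2765, so the affinity matrix appears to be too large (roughly 7.6e6 × 7.6e6, since n1 * n2 = 7,645,225). Is this expected, and how can I resolve it?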
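The byte count in the error message can be reproduced from n1 and n2 alone. The following is a minimal sketch, assuming a dense float32 matrix (the default dtype of `torch.zeros`); `dense_affinity_bytes` is a hypothetical helper written for illustration, not part of pygmtools or PyTorch.

```python
def dense_affinity_bytes(n1: int, n2: int, bytes_per_element: int = 4) -> int:
    """Bytes needed for a dense (n1*n2) x (n1*n2) matrix (float32 assumed)."""
    side = n1 * n2
    return side * side * bytes_per_element


n1 = n2 = 2765                      # values reported in this issue
side = n1 * n2                      # rows (and columns) of the affinity matrix
total = dense_affinity_bytes(n1, n2)

print(side)                         # 7645225
print(total)                        # 233797861202500, the figure in the error
print(round(total / 1024**4, 1))    # size in TiB
```

The result is on the order of 200 TiB. Because `torch.zeros([...])` without a `device` argument creates the tensor in host memory before `.cuda()` copies it, the failure is reported by the CPU allocator rather than by CUDA.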

only-changer commented 11 months ago

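Hello, I could not find the statement that raises your error in my code, so it looks like you have made some modifications to our code. In our design, n1 and n2 are the numbers of channels in a single layer of the neural network, typically 512 or 1024, which can indeed be somewhat large. Your value of 2765 is odd, though. Perhaps you used our global matching method when modifying the code? We have one method that models all channels in the network (rather than the channels of each layer) as a single large graph and matches it. Because that graph can become too large, we later developed the layer-wise matching version. Please check your modified code again.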
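The difference between the two modes can be illustrated with a rough size comparison. This sketch assumes the standard VGG11 convolution widths, a 3-channel input, and a 10-class classifier (the issue does not state the dataset, so these are assumptions), and it assumes the affinity matrix is stored densely in float32 as in the traceback line. `dense_affinity_gib` is a hypothetical helper, and the GAMF repository may organize this differently.

```python
# Assumed channel counts: RGB input, the eight VGG11 conv layers, 10 output classes.
vgg11_channels = [3, 64, 128, 256, 256, 512, 512, 512, 512, 10]


def dense_affinity_gib(n: int, bytes_per_element: int = 4) -> float:
    """GiB needed for a dense (n*n) x (n*n) float32 affinity matrix."""
    side = n * n
    return side * side * bytes_per_element / 1024**3


# Global matching: every channel in the network is a node of one graph.
global_nodes = sum(vgg11_channels)
print(global_nodes)                      # 2765, the value reported in the issue

# Layer-wise matching: one graph per layer, sized by that layer's channels.
for n in sorted(set(vgg11_channels)):
    print(n, dense_affinity_gib(n))

print(dense_affinity_gib(global_nodes))  # global graph, for comparison
```

Under these assumptions the channel total matches the reported 2765, which is consistent with the global matching mode being used. The per-layer graphs are far smaller than the global one, although a 512-channel layer stored densely in this way would itself need 256 GiB.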

YCaigogogo commented 11 months ago

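Hello, I was trying to run this example from your tutorial: https://pygmtools.readthedocs.io/en/latest/auto_examples/pytorch/plot_model_fusion_pytorch.html . It seems this example uses the global matching method. If I want to use layer-wise matching, how should I modify it?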

rogerwwww commented 11 months ago

"If I want to use layer-wise matching, how should I modify it?"

You can refer to the implementation in this repository: https://github.com/Thinklab-SJTU/GAMF