RanchiZhao / bmtrain_qlora


Could you provide a complete, working project? #1

Open · jinmin527 opened 1 year ago

jinmin527 commented 1 year ago

I cloned BMTrain v0.2.2, replaced the bmtrain folder inside BMTrain with the bmtrain_qlora directory, and then ran python setup.py develop. When I tested it, I got the following error:

    >>> import bmtrain
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/work/Qlora/BMTrain/bmtrain/__init__.py", line 2, in <module>
        from .init import init_distributed
      File "/work/Qlora/BMTrain/bmtrain/init.py", line 8, in <module>
        from . import nccl
      File "/work/Qlora/BMTrain/bmtrain/nccl/__init__.py", line 4, in <module>
        from .. import C
    ImportError: cannot import name 'C' from partially initialized module 'bmtrain' (most likely due to a circular import) (/work/Qlora/BMTrain/bmtrain/__init__.py)

I'm not sure whether the problem was caused by overwriting the bmtrain directory. Could you provide a complete project, similar to BMTrain itself?
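For context, the error message points at a classic circular-import pattern: the package's `__init__.py` imports a submodule before the name `C` (here, BMTrain's compiled C extension) has been bound, so the submodule's `from .. import C` fails against a partially initialized package. The sketch below reproduces the same error shape with a throwaway package (names like `pkg` are illustrative, not BMTrain's; in the real case `C` is the compiled extension, which suggests the build step itself may have failed after the directory was overwritten):

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Build a tiny throwaway package that mirrors the failing import chain:
# pkg/__init__.py imports pkg.nccl before pkg.C exists, and pkg.nccl
# immediately tries "from .. import C" against the half-initialized pkg.
with tempfile.TemporaryDirectory() as root:
    pkg = os.path.join(root, "pkg")
    os.makedirs(os.path.join(pkg, "nccl"))
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(textwrap.dedent("""
            from . import nccl   # runs before C is bound below
            C = object()         # too late for nccl's import
        """))
    with open(os.path.join(pkg, "nccl", "__init__.py"), "w") as f:
        f.write("from .. import C\n")

    # Import the package in a subprocess so the failure is isolated.
    proc = subprocess.run(
        [sys.executable, "-c", "import pkg"],
        cwd=root, capture_output=True, text=True,
    )
    print(proc.stderr.strip().splitlines()[-1])
```

The last stderr line is the same "cannot import name 'C' from partially initialized module ... (most likely due to a circular import)" shape as in the traceback above.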

RanchiZhao commented 1 year ago

https://github.com/OpenBMB/CPM-Bee/pull/100 is the QLoRA PR; it hasn't been merged yet. I'm also not sure whether BMTrain has fixed the int8-related issue. If it hasn't, you need to patch BMTrain's block_layer locally at this spot (the block layer doesn't propagate requires_grad correctly, so you have to check the dtype yourself and set requires_grad = False):

if dtype == torch.uint8:
    # uint8 (quantized) storage cannot carry gradients, so opt out explicitly
    storage_param = torch.nn.Parameter(
        torch.tensor([], dtype=dtype, device=device).set_(storage_param_buffer),
        requires_grad=False,
    )
else:
    storage_param = torch.nn.Parameter(
        torch.tensor([], dtype=dtype, device=device).set_(storage_param_buffer),
    )
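A minimal demonstration (independent of BMTrain) of why that dtype check is needed: PyTorch refuses to build a Parameter with the default requires_grad=True from an integer tensor, so a uint8 quantized storage buffer has to opt out of gradients explicitly:

```python
import torch

# Default construction fails: integer dtypes cannot require gradients.
buf = torch.zeros(4, dtype=torch.uint8)

default_failed = False
try:
    torch.nn.Parameter(buf)  # default requires_grad=True -> RuntimeError
except RuntimeError as e:
    default_failed = True
    print("default construction failed:", e)

# Explicitly disabling gradients works, mirroring the patch above.
param = torch.nn.Parameter(buf, requires_grad=False)
print("uint8 parameter created, requires_grad =", param.requires_grad)
```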