bostrower3 / LoSparse

Implementation of the LoSparse Paper

Some questions about the replaced weight matrix #1

Open zuininanHHH9527 opened 5 days ago

zuininanHHH9527 commented 5 days ago

I used LoSparse's original code to prune my own model, but it only replaces `Linear` modules and does nothing for `qkv` and the other modules I defined in the `allow_name` list, so I tried your code instead. The replacement succeeded, but training then fails with `TypeError: unsupported operand type(s) for *: 'Parameter' and 'NoneType'`. I debugged it and found that after `Replaced layer:output_dense wav2vec.wav2vec2.encoder.layers.11.feed_forward.output_dense replaced`, the line `self.ipt[n] = (p * p.grad).abs().detach()` hits a bug: my `k_proj`'s `p.grad` is `None`. I don't know how to solve it, can you give me some help? Thanks!

bostrower3 commented 4 days ago

I would love to help! Send me a link to your code and I’ll take a look.
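
In the meantime: that error means `p.grad` is `None` for some parameter when the importance score is computed. A minimal guard (a sketch; `iterative_prune` is a placeholder name, `self.ipt` follows the line you quoted) is to skip parameters that have no gradient yet:

```python
# Sketch of a defensive importance update -- adjust names to your pruner class.
def iterative_prune(self, model):
    for n, p in model.named_parameters():
        if p.grad is None:
            # No gradient: the parameter is frozen, unused in the forward
            # pass, or backward() has not run yet at this point.
            continue
        self.ipt[n] = (p * p.grad).abs().detach()
```

Skipping is only a stopgap, though: if a parameter you want to prune never receives a gradient, its importance score will never update, so it's worth finding out why the gradient is missing.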


zuininanHHH9527 commented 2 days ago

```python
class mywav2vec2(nn.Module):
    def __init__(self):
        super().__init__()
        self.wav2vec = Wav2Vec2ForXVector.from_pretrained("wav2vec2-base-superb-sv", local_files_only=True)
        self.classifier = nn.Linear(512, 7)
        self.out = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.wav2vec(x)['logits']
        x = self.classifier(x)
        out = self.out(x)
        return out, x
```

This is my model; the `wav2vec2` inside it is a pretrained model from Hugging Face. I am using it to test LoSparse on audio classification.
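
One thing worth checking (a sketch, reusing the names from my training loop: `net`, `data`, `label`, `loss_function_cls`): run a single forward/backward pass and list every parameter that still has no gradient. Anything frozen in the pretrained model, or unused in the forward pass, will show up here:

```python
# One-off check: which parameters never receive a gradient?
outputs, _ = net(data)
loss = loss_function_cls(outputs, label)
loss.backward()
for n, p in net.named_parameters():
    if p.grad is None:
        print(n, "requires_grad =", p.requires_grad)
optimizer.zero_grad()  # discard these test gradients afterwards
```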

zuininanHHH9527 commented 2 days ago

I used your code to implement LoSparse. After the 12th transformer layer (`layers.11`) is replaced, I get this error:

```
Replaced layer:k_proj wav2vec.wav2vec2.encoder.layers.11.attention.k_proj replaced
Replaced layer:v_proj wav2vec.wav2vec2.encoder.layers.11.attention.v_proj replaced
Replaced layer:q_proj wav2vec.wav2vec2.encoder.layers.11.attention.q_proj replaced
Replaced layer:out_proj wav2vec.wav2vec2.encoder.layers.11.attention.out_proj replaced
Replaced layer:intermediate_dense wav2vec.wav2vec2.encoder.layers.11.feed_forward.intermediate_dense replaced
Replaced layer:output_dense wav2vec.wav2vec2.encoder.layers.11.feed_forward.output_dense replaced
  0%|          | 0/338 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/xyt/HYH/past/Multi_feature_fusion/trainwavlosparse copy.py", line 543, in <module>
    train(epoch)
  File "/home/xyt/HYH/past/Multi_feature_fusion/trainwavlosparse copy.py", line 295, in train
    pruner.doitall(net,step)
  File "/home/xyt/HYH/past/Multi_feature_fusion/trainwavlosparse copy.py", line 242, in doitall
    self.iterative_prune(model)
  File "/home/xyt/HYH/past/Multi_feature_fusion/trainwavlosparse copy.py", line 224, in iterative_prune
    self.ipt[n] = (p * p.grad).abs().detach()  # p.grad = None
TypeError: unsupported operand type(s) for *: 'Parameter' and 'NoneType'
```

I debugged it and found `p.grad` is `None`, but I don't know why. Can you give me some advice? Thanks!

zuininanHHH9527 commented 2 days ago

I checked it once more and found that the replacement phase completes successfully; the problem may be in the training phase. Here is my code:

```python
def train(epoch):
    start = time.time()
    net.train()
    global_count = 0
    train_loss_writer = open(loss_path.format("train_loss"), 'a')
    train_acc_writer = open(loss_path.format("train_acc"), 'a')
    pred = []
    label_all = []
    train_loss = 0.0
    # train_loss_kld_T = 0.0
    train_loss_kld_F = 0.0
    tqdm_gen = tqdm.tqdm(training_loader)

    for batch_index, batch in enumerate(tqdm_gen, 1):
        global_count = global_count + 1
        ##########
        optimizer.zero_grad()

        data, _ = [_.cuda() for _ in batch]
        label = batch[1]
        label = label.type(torch.cuda.LongTensor)
        data = data.squeeze()
        data = torch.layer_norm(data, [data.size()[0], 44100])
        for w in net.parameters():
            L2 = torch.norm(w, p=2) * 1e2

        if step >= pruner.warm_up_steps:
            pruner.doitall(net, step)
        outputs, _ = net(data)
        loss = loss_function_cls(outputs, label) + L2
        train_loss += loss.item()

        loss.backward()
        optimizer.step()

        print(step)
        tqdm_gen.set_description('Training Epoch: {epoch} Loss_cls: {:0.4f}'.format(
            loss.item(),
            # loss_align.item(),
            epoch=epoch))

        if epoch <= args['WARMUP']:
            warmup_scheduler.step()
        outputs = outputs.detach().cpu().numpy()
        label = label.detach().cpu().numpy()
        logits = outputs.argmax(1)
        pred.extend(logits)
        label_all.extend(label)

    pred = np.array(pred, dtype=int)
    label_all = np.array(label_all, dtype=int)
    acc = Accuracy(pred, label_all)
    batch_index += 1
    train_loss_writer.writelines(str(train_loss / batch_index) + '\n')
    train_loss_writer.close()
    train_acc_writer.writelines(str(acc) + '\n')
    train_acc_writer.close()

    for name, param in net.named_parameters():
        layer, attr = os.path.splitext(name)
        attr = attr[1:]

    finish = time.time()

    print('epoch {} training time consumed: {:.2f}s'.format(epoch, finish - start))
```
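
One thing I notice in this loop: `pruner.doitall(net, step)` runs right after `optimizer.zero_grad()` and before `loss.backward()`, so no parameter has a gradient at that point (and since PyTorch 2.0, `zero_grad()` defaults to `set_to_none=True`, which deletes `.grad` entirely rather than zeroing it). A sketch of the loop body with the pruning step moved after the backward pass (same names as above):

```python
optimizer.zero_grad()             # by default this sets every .grad to None
outputs, _ = net(data)
loss = loss_function_cls(outputs, label) + L2
loss.backward()                   # gradients exist from here on

if step >= pruner.warm_up_steps:
    pruner.doitall(net, step)     # the importance update now sees real gradients

optimizer.step()
```
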
zuininanHHH9527 commented 1 day ago

I tried changing the order: I moved

```python
if step >= pruner.warm_up_steps:
    pruner.doitall(net, step)
```

so that it runs after `loss.backward()` but before `optimizer.step()`. The 3rd attention layer errors as before, and when I delete the `.detach()`, the gradients all display normally, but then the model is too big to run.