chenlong-clock / CFED-HANet

Code and dataset for LREC-COLING 2024 paper: "Continual Few-shot Event Detection via Hierarchical Augmentation Networks"

UnboundLocalError: local variable 'token' referenced before assignment #5

Open nguyenhoanganh2002 opened 2 weeks ago

nguyenhoanganh2002 commented 2 weeks ago

Traceback (most recent call last):
  File "/mnt/anhnh/FSED/train.py", line 470, in
    train(0, args)
  File "/mnt/anhnh/FSED/train.py", line 153, in train
    exemplar_dataset = collect_exemplar_dataset(args.dataset, args.data_root, 'train', label2idx, stage-1, streams[stage-1])
  File "/mnt/anhnh/FSED/utils/dataloader.py", line 131, in collect_exemplar_dataset
    if len(token) >= max_seqlen + 2:

def collect_exemplar_dataset(dataset, root, split, label2idx, stage_id, labels):
    data = [[instance for instance in t] for t in collect_from_json(dataset, root, split)[stage_id]]
    data_tokens, data_labels, data_masks, data_spans = [], [], [], []
    for idx, task_data in enumerate(tqdm(data)):
        for dt in task_data:
            # pop useless properties
            if 'mention_id' in dt.keys():
                dt.pop('mention_id')
            if 'sentence_id' in dt.keys():    
                dt.pop('sentence_id')
            # if split == 'train':
            add_label = []
            add_span = []
            new_t = {}
            for i in range(len(dt['label'])):
                if dt['label'][i] == labels[idx]: 
                    add_label.append(dt['label'][i]) 
                    add_span.append(dt['span'][i])
            if len(add_label) != 0:
                token = dt['piece_ids']
                new_t['label'] = add_label
                valid_span = add_span
                valid_label = [label2idx[item] if item in label2idx else 0 for item in add_label]
            # else:
            #     token = dt['piece_ids']
            #     valid_span = dt['span'].copy()
            #     valid_label = [label2idx[item] if item in label2idx else 0 for item in dt['label']]
                # max_seqlen = 90
            max_seqlen = args.max_seqlen # 344, 249, 230, 186, 167
            if len(token) >= max_seqlen + 2:
                token_sep = token[-1]
                token = token[:max_seqlen + 1] + [token_sep]
                invalid_span = np.unique(np.nonzero(np.asarray(valid_span) > max_seqlen)[0])
                invalid_span = invalid_span[::-1]
                for invalid_idx in invalid_span:
                    valid_span.pop(invalid_idx)
                    valid_label.pop(invalid_idx)

            pad_length = (max_seqlen + 2 - len(token))
            if len(token) < max_seqlen + 2:
                token = [pad_id] * pad_length + token
            token_mask = [1 if tkn != pad_id else 0 for tkn in token]
            token_mask[-1] = 1
                # span_mask = []
                # for i in range(len(token)):
                #     span_mask.append([0, 0])
                # for item in valid_span:
                #     for i in range(len(item)):
                #         span_mask[item[i]][i] = 1
            data_tokens.append(token)
            data_labels.append(valid_label)
            data_masks.append(token_mask)
            valid_span = np.array(valid_span) + pad_length
            valid_span = valid_span.tolist()
            data_spans.append(valid_span)
    return MAVEN_Dataset(data_tokens, data_labels, data_masks, data_spans)
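
The crash follows from the control flow above: token, valid_span and valid_label are assigned only inside if len(add_label) != 0:, but they are read unconditionally afterwards. The toy function below reproduces that pattern in isolation; token_lengths is a hypothetical name, not repo code, and the instance dicts are made-up stand-ins for the real records. It also shows a quieter consequence: when an instance other than the first has no matching label, nothing is raised and the previous instance's values are reused instead.

```python
def token_lengths(instances, target_label):
    """Hypothetical toy reproduction of the loop in collect_exemplar_dataset.

    `token` is bound only when the instance contains `target_label`,
    yet it is read on every iteration.
    """
    lengths = []
    for dt in instances:
        add_label = [lbl for lbl in dt["label"] if lbl == target_label]
        if len(add_label) != 0:
            token = dt["piece_ids"]
        lengths.append(len(token))  # runs even if the branch above was skipped
    return lengths


# Every instance matches: works as expected.
print(token_lengths([{"piece_ids": [1, 2, 3], "label": ["A"]}], "A"))  # [3]

# First instance has no match: `token` was never bound.
try:
    token_lengths([{"piece_ids": [1, 2], "label": ["B"]}], "A")
except UnboundLocalError as err:
    print(type(err).__name__)  # UnboundLocalError

# A later instance has no match: the previous `token` is silently reused.
later = [{"piece_ids": [1, 2, 3], "label": ["A"]},
         {"piece_ids": [4], "label": ["B"]}]
print(token_lengths(later, "A"))  # [3, 3]
```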

chenlong-clock commented 2 weeks ago

I'll check this and get back to you as soon as possible.

nguyenhoanganh2002 commented 2 weeks ago

It's worth noting that I hit this problem after switching the backbone to Llama 3 8B. The 1-shot and 2-shot settings ran fine, but the 5-shot and 10-shot settings raised the error above at stage 3.

chenlong-clock commented 2 weeks ago

You may need to check whether the label of each class is placed correctly. If it is not, the condition if len(add_label) != 0: evaluates to False, so token is never assigned, which causes the error.
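
Building on that explanation, one way to make the failure explicit is to handle the empty add_label case before any further processing. The sketch below is a hypothetical helper (select_target_mentions is not part of the repo) that keeps only the mentions of the target label and skips, while counting, instances that have none. It assumes each instance is a dict with 'piece_ids', 'label' and 'span' lists, as in the quoted function. Skipping only avoids the crash; a non-zero skip count still suggests that the exemplars or label streams may have been built incorrectly.

```python
from typing import Dict, List, Tuple


def select_target_mentions(
    task_data: List[dict], target_label, label2idx: Dict
) -> Tuple[List[dict], int]:
    """Hypothetical helper: keep only mentions of `target_label`.

    Returns (kept_instances, skipped_count). Instances with no mention of
    `target_label` are skipped instead of falling through to code that
    reads variables which were never assigned for them.
    """
    kept, skipped = [], 0
    for dt in task_data:
        add_label, add_span = [], []
        for lbl, span in zip(dt["label"], dt["span"]):
            if lbl == target_label:
                add_label.append(lbl)
                add_span.append(span)
        if not add_label:
            skipped += 1  # the case that leaves `token` unbound or stale
            continue
        kept.append({
            "token": list(dt["piece_ids"]),
            "valid_span": add_span,
            # Same fallback as the quoted code: unknown labels map to 0.
            "valid_label": [label2idx.get(lbl, 0) for lbl in add_label],
        })
    return kept, skipped


# Hypothetical instances; label names and ids are made up.
task = [
    {"piece_ids": [101, 7, 8, 102], "label": ["Attack"], "span": [[1, 2]]},
    {"piece_ids": [101, 9, 102], "label": ["Motion"], "span": [[1, 1]]},
]
kept, skipped = select_target_mentions(task, "Attack", {"Attack": 3})
print(len(kept), skipped)  # 1 1
if skipped:
    print(f"warning: {skipped} instance(s) lack the target label")
```

Counting rather than silently dropping keeps the symptom visible, so it can still be traced back to its source.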

nguyenhoanganh2002 commented 2 weeks ago

Sorry for asking so many questions. Could you provide the mapping between label_id and label_name?

chenlong-clock commented 2 weeks ago

> could you provide the mapping between label_id and label_name?

The label_id to label_name mapping is the same as the one defined in this repo. You may have misunderstood me.

> the label of each class is placed correctly.

I mean that the variable add_label may be empty because of a bug, e.g., the exemplars were not built correctly. You may need to inspect this variable in the third stage and compare it with the previous stage to see whether some labels are missing.
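
A minimal sketch of that debugging step, assuming the same layout collect_exemplar_dataset consumes: data[idx] is the list of instance dicts for task idx, and labels[idx] is that task's target label (labels corresponds to streams[stage-1] in the call shown in the traceback). audit_stage is a hypothetical helper; for each task it reports how many instances there are and how many of them lack the target label. Any non-zero second number marks an instance for which add_label would be empty. It also flags a length mismatch, since labels[idx] pairs tasks and target labels purely by position.

```python
from typing import Dict, Sequence, Tuple


def audit_stage(data: Sequence[Sequence[dict]], labels: Sequence) -> Dict[int, Tuple[int, int]]:
    """Hypothetical helper: per task idx, return (num_instances, num_missing).

    num_missing counts instances whose 'label' list does not contain
    labels[idx], i.e. instances for which add_label would be empty.
    """
    if len(data) != len(labels):
        # labels[idx] pairs tasks and target labels purely by position.
        raise ValueError(f"{len(data)} tasks but {len(labels)} target labels")
    report = {}
    for idx, task in enumerate(data):
        missing = sum(1 for dt in task if labels[idx] not in dt["label"])
        report[idx] = (len(task), missing)
    return report


# Made-up example: task 1's only instance does not contain its target label.
stage_data = [
    [{"label": ["Attack"]}, {"label": ["Attack", "Motion"]}],
    [{"label": ["Attack"]}],
]
report = audit_stage(stage_data, ["Attack", "Motion"])
print(report)  # {0: (2, 0), 1: (1, 1)}
```

Running it on the stage-2 and stage-3 inputs and comparing the two reports would show which tasks first start missing their target label.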

nguyenhoanganh2002 commented 1 week ago

I ran your code to reproduce the results from the paper, but I encountered the same error in permutations 1, 2, 3, and 4.