DFKI-NLP / InterroLang

InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations [EMNLP 2023 Findings]
https://arxiv.org/abs/2310.05592

[Operation] nlpattribute throws an error for OLID #114

Closed · tanikina closed this issue 1 year ago

tanikina commented 1 year ago

Config: olid.gin
Input: sentence level feature importance for id 1766
Parsed: filter id 1766 and nlpattribute sentence [e]
Traceback:

    [2023-06-18 10:43:13,641] INFO in flask_app: Traceback getting bot response: Traceback (most recent call last):
      File "/home/ubuntu/projects/InterroLang/flask_app.py", line 192, in get_bot_response
        response = BOT.update_state(user_text, conversation)
      File "/home/ubuntu/projects/InterroLang/logic/core.py", line 898, in update_state
        returned_item = run_action(
      File "/home/ubuntu/projects/InterroLang/logic/action.py", line 51, in run_action
        action_return, action_status = actions[p_text](
      File "/home/ubuntu/projects/InterroLang/actions/explanation/feature_importance.py", line 374, in feature_importance_operation
        return_s += get_sentence_level_feature_importance(conversation, filtered_text, simulation)
      File "/home/ubuntu/projects/InterroLang/actions/explanation/feature_importance.py", line 195, in get_sentence_level_feature_importance
        res_list = get_explanation(dataset_name, inputs, conversation, file_name="sentence_level")
      File "/home/ubuntu/projects/InterroLang/actions/explanation/feature_importance.py", line 85, in get_explanation
        res_list = generate_explanation(model, dataset_name, inputs, conversation, file_name=file_name)
      File "/home/ubuntu/projects/InterroLang/actions/custom_input.py", line 249, in generate_explanation
        attribution, predictions = compute_feature_attribution_scores(b, model, device)
      File "/home/ubuntu/projects/InterroLang/actions/custom_input.py", line 165, in compute_feature_attribution_scores
        attributions = explainer.attribute(
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/captum/log/__init__.py", line 35, in wrapper
        return func(*args, **kwargs)
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/captum/attr/_core/layer/layer_integrated_gradients.py", line 365, in attribute
        inputs_layer = _forward_layer_eval(
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/captum/_utils/gradient.py", line 182, in _forward_layer_eval
        return _forward_layer_eval_with_neuron_grads(
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/captum/_utils/gradient.py", line 445, in _forward_layer_eval_with_neuron_grads
        saved_layer = _forward_layer_distributed_eval(
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/captum/_utils/gradient.py", line 294, in _forward_layer_distributed_eval
        output = _run_forward(
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/captum/_utils/common.py", line 456, in _run_forward
        output = forward_func(
      File "/home/ubuntu/projects/InterroLang/actions/custom_input.py", line 117, in bert_forward
        output_model = model(input_model)
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 1599, in forward
        outputs = self.bert(
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/transformers/adapters/context.py", line 108, in wrapper_func
        results = f(self, *args, **kwargs)
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 1042, in forward
        embedding_output = self.embeddings(
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1120, in _call_impl
        result = forward_call(*input, **kwargs)
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 245, in forward
        inputs_embeds = self.word_embeddings(input_ids)
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1123, in _call_impl
        hook_result = hook(self, input, result)
      File "/home/ubuntu/miniconda3/envs/nlg/lib/python3.9/site-packages/OpenAttack/utils/transformers_hook.py", line 7, in __call__
        output.retain_grad()
    RuntimeError: can't retain_grad on Tensor that has requires_grad=False

This fails for all IDs, not only 1766. It seems that only the operations parsed as nlpattribute or nlpattribute sentence throw this error for OLID.

nfelnlp commented 1 year ago

The embedding layer apparently keeps the hook that was added when "adversarial" (OpenAttack) is used. Inspecting self._forward_hooks shows:

    OrderedDict([(468,
                  <OpenAttack.utils.transformers_hook.HookCloser at 0x7f0d41696bf0>)])

This is added in OpenAttack here: https://github.com/thunlp/OpenAttack/blob/4df712e0a5aebc03daa9b1ef353da4b7ea0a1b23/OpenAttack/victim/classifiers/transformers.py#L56

We might need to remove that hook after "adversarial" has been used, because it apparently conflicts with Captum (feature attribution). So far, I've found no option to do that through OpenAttack.
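To illustrate the failure mode: per the traceback, the HookCloser hook calls output.retain_grad() on the embedding output, but Captum evaluates the layer with gradients disabled, so that tensor has requires_grad=False. Here is a minimal sketch of the clash with a generic nn.Embedding (illustrative only, not InterroLang code):

    import torch
    import torch.nn as nn

    emb = nn.Embedding(10, 4)

    # A hook in the spirit of OpenAttack's HookCloser: it calls
    # retain_grad() on the layer output so gradients can be read later.
    def retain_grad_hook(module, inputs, output):
        output.retain_grad()

    emb.register_forward_hook(retain_grad_hook)

    # Captum evaluates the layer with gradients disabled, so the output
    # has requires_grad=False and retain_grad() raises:
    # RuntimeError: can't retain_grad on Tensor that has requires_grad=False
    with torch.no_grad():
        emb(torch.tensor([1, 2, 3]))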

Thanks for reporting this!

tanikina commented 1 year ago

Hi! Thanks for the quick reply. I think you are right: OpenAttack's hook is causing this error. It should be possible to remove hooks in PyTorch (see https://github.com/pytorch/pytorch/issues/5037). Adding the line victim.hook.remove() in actions/perturbations/adversarial.py seems to do the trick for me:


    # launch attacks and print attack results
    d = attack_eval.eval(dataset, visualize=False)

    # remove OpenAttack's forward hook so that Captum's
    # feature attribution works again afterwards
    victim.hook.remove()

    return_s = ""
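For reference, register_forward_hook returns a torch.utils.hooks.RemovableHandle, and OpenAttack appears to store that handle as victim.hook, which is why .remove() deregisters exactly this one hook. A minimal sketch with a hypothetical layer (not InterroLang code):

    import torch.nn as nn

    layer = nn.Linear(2, 2)
    # register_forward_hook returns a RemovableHandle
    handle = layer.register_forward_hook(lambda module, inputs, output: None)
    print(len(layer._forward_hooks))  # 1
    handle.remove()                   # removes only this hook
    print(len(layer._forward_hooks))  # 0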
nfelnlp commented 1 year ago

Very nice! I actually also just came up with a solution, but yours is probably better.

I simply re-instantiated the forward hooks of the embeddings_layer as an empty OrderedDict:

    if dataset_name == 'boolq':
        model = conversation.get_var("model").contents.model
        tokenizer = conversation.get_var("model").contents.tokenizer
        embeddings_layer = model.base_model.embeddings.word_embeddings
        victim = classifiers.TransformersClassifier(
            model, tokenizer, embeddings_layer
        )
    elif dataset_name == 'daily_dialog':
        model = conversation.get_var("model").contents
        tokenizer = HFTokenizer('bert-base-uncased', mode='bert').tokenizer
        embeddings_layer = model.bert.base_model.embeddings.word_embeddings
        victim = classifiers.TransformersClassifier(
            model.bert, tokenizer, embeddings_layer
        )
    elif dataset_name == 'olid':
        model = conversation.get_var("model").contents.model
        tokenizer = conversation.get_var("model").contents.tokenizer
        embeddings_layer = model.bert.embeddings.word_embeddings
        victim = classifiers.TransformersClassifier(
            model, tokenizer, embeddings_layer
        )
    else:
        raise NotImplementedError(f"{dataset_name} is not supported!")

    attacker = attackers.PWWSAttacker()

    # prepare for attacking
    attack_eval = AttackEval(attacker, victim)

    # launch attacks and print attack results
    d = attack_eval.eval(dataset, visualize=False)

    # Remove forward hooks after the attack
    # (requires `from collections import OrderedDict` at the top of the file)
    embeddings_layer._forward_hooks = OrderedDict()
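One caveat, as far as I can tell: resetting _forward_hooks wipes every hook registered on the embedding layer, not only OpenAttack's, and it reaches into a private attribute. victim.hook.remove() deregisters exactly the one handle, which is probably why your version is cleaner.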

Would you commit your solution and push it to main, please? :)

tanikina commented 1 year ago

Done: https://github.com/nfelnlp/InterroLang/commit/e1bdd0f4503c5ad603e76565c003cd0e5251086d. Your solution also looks interesting; I did not know that one could use an OrderedDict for this case :) I think we can close this issue then.

nfelnlp commented 1 year ago

Great, thanks a lot! :smile: Me neither; my PyCharm debugger showed me that the hooks are always stored as OrderedDicts. Emptying them like this seems rather dirty to me, though.