DFKI-NLP / InterroLang

InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations [EMNLP 2023 Findings]
https://arxiv.org/abs/2310.05592

build DA prototype #34

Closed qiaw99 closed 1 year ago

qiaw99 commented 1 year ago

TODOs:

Note: Download the model from here: https://cloud.dfki.de/owncloud/index.php/s/m72HGNLW2TyCABr and put it in the right place according to the gin config file.

qiaw99 commented 1 year ago

The score operation for macro, micro, and weighted averaging has been adapted according to #39; please add corresponding prompts for them. :)

nfelnlp commented 1 year ago

Testing the sample prompts, there seem to be quite a lot of errors. I think we should fix these before we can call this the first DA prototype:

⚡ important features

input: what are the reasoning strategies you leverage? parsed: important all [e]

Actual parse:

input: what are the reasoning strategies you leverage? parsed: previousfilter and explain features [e]

explain features should not be parsed here. Is this a problem with the GPT-Neo parser, or are the grammar and prompt files for this branch incorrect?

Results in an empty response.

Answer: It's a parsing problem. (screenshots)

⚡ explanations

  File "/home/nfel/PycharmProjects/InterroLang/flask_app.py", line 113, in sample_prompt
    prompt = sample_prompt_for_action(action,
  File "/home/nfel/PycharmProjects/InterroLang/logic/sample_prompts_by_action.py", line 95, in sample_prompt_for_action
    raise NameError(message)
NameError: Unable to filename ending in explain.txt!

Apparently, there is also a mismatch with prompt files here.

Answer: fixed in the main branch. (screenshot)
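The NameError suggests that sample_prompt_for_action searches the prompt directory for a file whose name ends in "<action>.txt" and raises when none is found. A minimal sketch of that lookup, assuming this directory layout (the function name find_prompt_file and the layout itself are assumptions, not the actual implementation):

```python
import os

def find_prompt_file(prompt_dir, action):
    """Return the first prompt file whose name ends in '<action>.txt'.

    Raises NameError when no such file exists, mirroring the error
    raised in sample_prompts_by_action.py.
    """
    suffix = f"{action}.txt"
    for filename in sorted(os.listdir(prompt_dir)):
        if filename.endswith(suffix):
            return os.path.join(prompt_dir, filename)
    raise NameError(f"Unable to find filename ending in {suffix}!")
```

Under this reading, the "explain" action failed simply because no prompt file on the branch ended in explain.txt.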

⚡ predictions

input: predict 22 parsed: filter id 22 and predict [e]

  File "/home/nfel/PycharmProjects/InterroLang/logic/core.py", line 416, in update_state
    returned_item = run_action(
  File "/home/nfel/PycharmProjects/InterroLang/logic/action.py", line 46, in run_action
    action_return, action_status = actions[p_text](
  File "/home/nfel/PycharmProjects/InterroLang/actions/prediction/predict.py", line 291, in predict_operation
    return_s = prediction_with_id(model, data, conversation, text)
  File "/home/nfel/PycharmProjects/InterroLang/actions/prediction/predict.py", line 238, in prediction_with_id
    model_predictions = model.predict(data, text)
  File "/home/nfel/PycharmProjects/InterroLang/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1207, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DANetwork' object has no attribute 'predict'

The DA classifier does not support the predict operation yet.

⚡ random predictions

Same as above (missing predict attribute in DANetwork)

Answer to prediction and random prediction:

(screenshot)
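The AttributeError shows that DANetwork, a torch nn.Module, exposes no predict method, so predict_operation falls through to torch's __getattr__. One way to fix this is to add a predict method that argmaxes the forward logits. A minimal sketch with a toy stand-in module (the layer sizes and forward signature are assumptions; the real DANetwork differs):

```python
import torch
import torch.nn as nn

class DANetwork(nn.Module):
    """Toy stand-in for the dialogue-act classifier."""

    def __init__(self, n_features=8, n_classes=5):
        super().__init__()
        self.linear = nn.Linear(n_features, n_classes)

    def forward(self, x):
        return self.linear(x)

    @torch.no_grad()
    def predict(self, x):
        # Argmax over class logits, as predict_operation expects.
        self.eval()
        return self.forward(x).argmax(dim=-1)
```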

✅ prediction likelihood

Passed!

For instance with id 433:

- The likelihood of class dummy is 0.759%
- The likelihood of class inform is 3.779%
- The likelihood of class question is 2.489%
- The likelihood of class directive is 88.679%
- The likelihood of class commissive is 4.293%
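These per-class likelihoods look like softmax outputs over the classifier logits. A minimal sketch of how such a report can be rendered (the logits below are made up; only the label set comes from the output above):

```python
import numpy as np

def likelihood_text(logits, labels):
    """Format softmax probabilities as a per-class likelihood report."""
    logits = np.asarray(logits, dtype=float)
    probs = np.exp(logits - logits.max())  # shift for numerical stability
    probs = probs / probs.sum()
    return " ".join(
        f"The likelihood of class {label} is {100 * p:.3f}%"
        for label, p in zip(labels, probs)
    )

labels = ["dummy", "inform", "question", "directive", "commissive"]
```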

⚡ showing data

For the data with id equal to 2185, the features are

dialog: oh, yes. that would be lovely.

This is not the entire dialogue, but one single turn. I think the same error is happening in the dataset viewer, where each ID is just one turn.
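If each dataset ID denotes a single turn, showing the full dialogue would require grouping turns by a dialogue-level key before display. A hedged sketch (the field names dialog_id and text are assumptions about the DailyDialog-style schema, not the actual column names):

```python
from collections import defaultdict

def group_turns(rows):
    """Group per-turn rows into full dialogues keyed by dialog_id."""
    dialogues = defaultdict(list)
    for row in rows:
        dialogues[row["dialog_id"]].append(row["text"])
    return dict(dialogues)
```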

⚡ performance

input: can you show me the precision on the data? parsed: score precision [e]

  File "/home/nfel/PycharmProjects/InterroLang/flask_app.py", line 139, in get_bot_response
    response = BOT.update_state(user_text, conversation)
  File "/home/nfel/PycharmProjects/InterroLang/logic/core.py", line 416, in update_state
    returned_item = run_action(
  File "/home/nfel/PycharmProjects/InterroLang/logic/action.py", line 46, in run_action
    action_return, action_status = actions[p_text](
  File "/home/nfel/PycharmProjects/InterroLang/actions/prediction/score.py", line 69, in score_operation
    raise NotImplementedError(f"Flag {average} is not supported!")
NotImplementedError: Flag [e] is not supported!

Answer: (screenshot)
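The NotImplementedError shows the [e] end-of-parse token reaching score_operation as the average flag. One fix is to strip the terminator and fall back to a default when no flag is given. A sketch, assuming the supported flags from the #39 comment above and a hypothetical "micro" default:

```python
SUPPORTED_AVERAGES = {"macro", "micro", "weighted"}

def parse_average_flag(parse_text, default="micro"):
    """Extract the averaging flag from a parsed score command,
    ignoring the [e] terminator and falling back to a default."""
    tokens = [t for t in parse_text.split() if t != "[e]"]
    # e.g. "score precision micro [e]" -> ["score", "precision", "micro"]
    if tokens and tokens[-1] in SUPPORTED_AVERAGES:
        return tokens[-1]
    return default
```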

✅ how to change a prediction

Passed.

input: what is the way to change the prediction for the data point with the id number 1177 parsed: filter id 1177 and cfe [e]

This sentence is always classified as question!

⚡ dataset description

input: could you describe the data and model? parsed: data and model [e]

  File "/home/nfel/PycharmProjects/InterroLang/logic/action.py", line 46, in run_action
    action_return, action_status = actions[p_text](
  File "/home/nfel/PycharmProjects/InterroLang/actions/metadata/data_summary.py", line 66, in data_operation
    score = conversation.describe.get_eval_performance_for_hf_model(dataset_name, conversation.default_metric)
  File "/home/nfel/PycharmProjects/InterroLang/logic/dataset_description.py", line 220, in get_eval_performance_for_hf_model
    performance_summary = self.get_score_text(y_values,
TypeError: DatasetDescription.get_score_text() missing 2 required positional arguments: 'multi_class' and 'average'
[2023-05-15 13:26:31,850] INFO in flask_app: Exception getting bot response: DatasetDescription.get_score_text() missing 2 required positional arguments: 'multi_class' and 'average'

Answer: (screenshot)
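The TypeError indicates get_score_text gained two required positional parameters (multi_class and average) without the caller in dataset_description.py being updated. Giving them defaults would keep older call sites working. A minimal sketch (the signature and the accuracy-only body are assumptions; the real method computes more metrics):

```python
def get_score_text(y_true, y_pred, metric="accuracy",
                   multi_class=True, average="micro"):
    """Sketch: defaulting the newly added parameters keeps the
    older call sites working. Only computes accuracy here."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    score = correct / len(y_true)
    scope = average if multi_class else "binary"
    return f"The model scores {100 * score:.2f}% {metric} ({scope})."
```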

⚡ mistakes

input: show me some data you predict incorrectly parsed: mistake sample [e]

For all the instances in the data, the model is incorrect 1 out of 2387 times (error rate 0.0). Here are the ids of instances the model predicts incorrectly:

[[ 0 1 2 ... 2384 2385 2386]]

This does not seem right to me: the count (1 out of 2387, error rate 0.0) contradicts the array, which lists every ID. The way the array is formatted should also be changed. I don't know whether it's even helpful to print all the incorrectly predicted IDs; maybe just a few of them? What would be most useful for the user?
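Both problems could be addressed by computing the count and rate from the mismatch mask and truncating the ID list. A hedged sketch (the function name and truncation limit are my own, not the action's current code):

```python
import numpy as np

def mistakes_summary(y_true, y_pred, max_ids=10):
    """Report error count/rate and a truncated list of wrong IDs."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    wrong = np.flatnonzero(y_true != y_pred)  # indices of mistakes
    rate = len(wrong) / len(y_true)
    shown = ", ".join(map(str, wrong[:max_ids]))
    suffix = ", ..." if len(wrong) > max_ids else ""
    return (f"The model is incorrect {len(wrong)} out of {len(y_true)} "
            f"times (error rate {rate:.3f}). Example ids: {shown}{suffix}")
```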

✅ data labels

Passed.

input: what are the labels for all the data parsed: label [e]

For all the instances in the data:

- 0.0% of instances have label dummy
- 38.542% of instances have label inform
- 28.907% of instances have label question
- 20.989% of instances have label directive
- 11.563% of instances have label commissive
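For reference, this kind of label distribution can be reproduced with a simple counter over the gold labels. A minimal sketch (function name is hypothetical):

```python
from collections import Counter

def label_distribution(labels, label_names):
    """Percentage of instances per label, including zero-count labels."""
    counts = Counter(labels)
    total = len(labels)
    return {name: round(100 * counts.get(name, 0) / total, 3)
            for name in label_names}
```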

qiaw99 commented 1 year ago

Regarding explanations and important features: https://github.com/nfelnlp/InterroLang/blob/c1044d33c19d67cfde62b7aabb93170faa0d5327/logic/sample_prompts_by_action.py#L21 (fixed in the main branch)