caoyu-noob / TASA

The implementation for EMNLP 2022 paper TASA

About Attack Dataset #2

Open Tanusree-Das-Tithy opened 2 months ago

Tanusree-Das-Tithy commented 2 months ago

Hello, I recently tried to run TASA but ran into some issues. After converting the dataset to MRQA format I got a single JSON file, which I split into dev.json (10% of the data) and train.json (90% of the data). Both files are in exactly the same format as TASA/data/squad/dev-v1.1.json. Training with the BERT and SpanBERT models then ran through all epochs successfully, and I also generated the coreference file as described. However, when I launch the attack by running TASA.py, some steps (add overlap token, get edit importance, and get edit synonyms) reach 100% progress, but no adversarial samples are generated. Instead, I get this error when I run the attack on TASA/data/HotpotQA/dev.json:

File "TASA.py", line 998, in main
    "coreferences": coreferences[d_i][p_i]})
IndexError: list index out of range

I also tried with your example dataset (TASA/data/squad/dev-v1.1.json) to get the adversarial sample from the attack but I got this error:

 raise TypeError(_SLICE_TYPE_ERROR + ", got {!r}".format(idx))
TypeError: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices, got 'outputs'

I'm also attaching the log file here. output.log

Please let me know how to solve this issue. It would also be great if you could provide the correct dataset or data format needed to run the attack successfully and get adversarial samples.
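For the IndexError at `coreferences[d_i][p_i]`, one thing that may be worth ruling out is a coreference file that was built from a different split than the one being attacked. The sketch below is only an illustration: it assumes the coreference data is a nested list indexed by article and then by paragraph (as the failing expression suggests) and that the dataset is the list stored under "data" in a SQuAD-style file. The function name `find_coref_mismatches` is hypothetical and not part of TASA, and loading the two files is left out because the coreference file format is not shown in this thread.

```python
def find_coref_mismatches(articles, coreferences):
    """Return descriptions of shape mismatches between data and coreferences.

    articles: the list stored under "data" in a SQuAD-style JSON file.
    coreferences: ASSUMED to be a nested list indexed as
        coreferences[article_index][paragraph_index].
    """
    problems = []
    if len(coreferences) != len(articles):
        problems.append(
            f"article count differs: data={len(articles)}, coref={len(coreferences)}"
        )
    for d_i, article in enumerate(articles):
        if d_i >= len(coreferences):
            break  # the remaining articles have no coreference entry at all
        n_paragraphs = len(article["paragraphs"])
        n_coref = len(coreferences[d_i])
        if n_paragraphs != n_coref:
            problems.append(
                f"article {d_i}: paragraphs={n_paragraphs}, coref={n_coref}"
            )
    return problems


# Made-up example: the second article is missing one coreference entry.
articles = [
    {"paragraphs": [{"context": "a"}, {"context": "b"}]},
    {"paragraphs": [{"context": "c"}, {"context": "d"}]},
]
coreferences = [[[], []], [[]]]
print(find_coref_mismatches(articles, coreferences))
```

An empty list would mean the two structures line up, in which case the cause of the IndexError lies elsewhere.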

caesar-jojo commented 2 months ago

Hello, I am not the author of the code, but I share the same confusion about it as you do. However, I have already trained the adversarial samples for SQuAD, and I can share them with you. Please leave your email address so we can communicate more.

Tanusree-Das-Tithy commented 2 months ago

Hello @caesar-jojo, thanks for your response. My email is tanusreedastithy@gmail.com. It would be great if you could share any suggestions for generating adversarial samples.

caoyu-noob commented 2 months ago

I could see that some processes (add overlap token, get edit importance and get edit synonyms) are having 100% progress but adversarial samples are not generating. Rather it's showing the error when I tried the attack on TASA/data/HotpotQA/dev.json:

According to your log, the attack process has not finished, since there is no output indicating that the adversarial samples were saved. I suggest setting breakpoints (or adding print statements) between the major function calls at L900-918 of TASA.py to see where the program gets stuck.
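As a rough sketch of the print-based approach, a small wrapper can announce each stage before and after it runs, so the last "[start]" line without a matching "[done]" shows where things stall. The stage functions here are stand-ins invented for illustration, not the actual methods in TASA.py, and `run_stages` is a hypothetical helper. `flush=True` is used so the messages appear immediately even when output is redirected to a log file.

```python
import time


def run_stages(samples, stages):
    """Run each (name, function) stage in order, printing progress."""
    for name, stage in stages:
        print(f"[start] {name}", flush=True)
        started = time.perf_counter()
        samples = stage(samples)
        elapsed = time.perf_counter() - started
        print(f"[done]  {name} ({elapsed:.1f}s, {len(samples)} samples)", flush=True)
    return samples


# Stand-in stages for illustration only; in practice these would be the
# attacker's own methods, which are not reproduced here.
def drop_empty(samples):
    return [s for s in samples if s]


def uppercase(samples):
    return [s.upper() for s in samples]


result = run_stages(
    ["a", "", "b"],
    [("drop_empty", drop_empty), ("uppercase", uppercase)],
)
```

Printing the sample count after each stage also reveals whether a filtering step has silently discarded every sample.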

raise TypeError(_SLICE_TYPE_ERROR + ", got {!r}".format(idx))
TypeError: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices, got 'outputs'

Could you provide more information, such as the full traceback and the script that throws this exception? It looks like a TensorFlow issue, but in this repo I only use TensorFlow to compute USE similarity, in ./model/use_evaluator.

Tanusree-Das-Tithy commented 2 months ago

Thanks @caoyu-noob for your response, I am giving the whole error message here:

Traceback (most recent call last):
  File "TASA.py", line 1038, in <module>
    main()
  File "TASA.py", line 1008, in main
    adversarial_samples, new_data, sample_cnt, miss_cnt = qa_attacker.get_adversarial_samples(samples)
  File "TASA.py", line 936, in get_adversarial_samples
    samples = self.get_edited_sentences(samples)
  File "TASA.py", line 559, in get_edited_sentences
    self.filter_edited_sents(sample, edited_gold_sents, answer_starts)
  File "TASA.py", line 581, in filter_edited_sents
    cosine_sim = self.use_evaluator.get_semantic_sim_by_ref(sample["gold_sent"], filtered_gold_sents)
  File "/home/ahsan/TASA/model/use_evaluator.py", line 27, in get_semantic_sim_by_ref
    embed_ref = self.cal_use_feature([ref])
  File "/home/ahsan/TASA/model/use_evaluator.py", line 15, in cal_use_feature
    embed = self.embed(sents)["outputs"]
  File "/home/ahsan/anaconda3/envs/myenv/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
    return target(*args, **kwargs)
  File "/home/ahsan/anaconda3/envs/myenv/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 1014, in _slice_helper
    _check_index(s)
  File "/home/ahsan/anaconda3/envs/myenv/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 888, in _check_index
    raise TypeError(_SLICE_TYPE_ERROR + ", got {!r}".format(idx))
TypeError: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices, got 'outputs'

I tried adding print statements inside the functions in the USE evaluator file, but nothing is printed to the console. Also, regarding the USE model link you provided: I downloaded the multilingual-qa USE model (TensorFlow 2, version 2) from that link. Is that the correct USE model for this project? Please let me know.

caoyu-noob commented 2 months ago

It seems you used a different model from the one I used in my experiments.

As far as I remember, I used universal-sentence-encoder, version 2, for TensorFlow 2. You can try this model to see whether it resolves the problem.

Tanusree-Das-Tithy commented 2 months ago

Hello @caoyu-noob, I tried the model you mentioned, but it still shows the same error as before. Could something else be causing this error?

caoyu-noob commented 2 months ago

It seems that the output format has changed in the latest model.

You can remove ["outputs"] at L14 of ./model/use_evaluator.py (line 15 in your traceback), so that the line becomes embed = self.embed(sents).

It works for me.
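If the same code needs to work with both kinds of model, a small helper can accept either output style. This is a minimal sketch, assuming (based on the traceback and the fix above) that the older model returns a dictionary keyed by "outputs" while the newer one returns the embedding tensor directly. `extract_embeddings` is a hypothetical name and is not part of the repository.

```python
from collections.abc import Mapping


def extract_embeddings(result, key="outputs"):
    """Return the embeddings from either output style.

    If the model returned a dict-like object, pick the entry under `key`;
    otherwise assume `result` already is the embedding tensor.
    """
    if isinstance(result, Mapping):
        return result[key]
    return result


# Inside cal_use_feature the call would then read (assumption):
#     embed = extract_embeddings(self.embed(sents))

# Plain Python values stand in for model outputs in this illustration.
print(extract_embeddings({"outputs": [[0.1, 0.2]]}))
print(extract_embeddings([[0.1, 0.2]]))
```

Both calls return the same nested list, so downstream similarity code does not need to know which model version was loaded.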