gabrielStanovsky / unified-factuality

Code, data and models for the paper "Integrating Deep Linguistic Features in Factuality Prediction over Unified Datasets" (Stanovsky, Eckle-Kohler, Puzikov, Dagan and Gurevych ACL 2017)
MIT License

Converting "NA" label from FactBank #20

rudinger closed 7 years ago

rudinger commented 7 years ago

The FactBank conversion has no mapping for the "NA" label.

Traceback (most recent call last):
  File "./readers.py", line 660, in <module>
    os.path.join(inp_, "tokens_tml.txt"))
  File "./readers.py", line 136, in __init__
    self.conll_txt = self.convert(tokens_tml)
  File "./readers.py", line 209, in convert
    if (tmlTag == 'EVENT')\
  File "./readers.py", line 189, in consolidate_fact_value
    else self.to_float(list(opts.values()[0])[0])
  File "./readers.py", line 165, in to_float
    return self.conversion_dic[fact_val]
KeyError: 'NA'

NA could either be mapped to a value, by modifying self.conversion_dic in Factbank's __init__ method, or perhaps it should receive no annotation at all? According to the FactBank annotation guide (p. 23):

NA: Select NA if it seems that the event cannot be assessed in terms of factuality.
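
For illustration, the two options could be sketched roughly as below. This is a minimal sketch, not the code in readers.py: the CONVERSION_DIC values, the standalone to_float function and the na_policy parameter are all assumptions made up for the example.

```python
# Hypothetical stand-in for self.conversion_dic; the real mapping in
# readers.py may use different labels and values.
CONVERSION_DIC = {"CT+": 3.0, "CT-": -3.0, "Uu": 0.0}


def to_float(fact_val, na_policy="skip"):
    """Convert a FactBank label to a float.

    na_policy="skip": NA gets no annotation (returns None).
    na_policy="map":  NA is treated like Uu (assumed equivalent here).
    """
    if fact_val == "NA":
        if na_policy == "skip":
            return None
        if na_policy == "map":
            return CONVERSION_DIC["Uu"]
        raise ValueError("unknown na_policy: %r" % (na_policy,))
    return CONVERSION_DIC[fact_val]
```

With the "skip" policy, callers would need to handle a None return value, for example by leaving the event unannotated in the output.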

gabrielStanovsky commented 7 years ago

@rudinger, do you get this error when you run the convert_factbank.sh script?

This runs without errors on my machine, and I suspect your error may happen if we have different FactBank files. The script reads an annotation file called fb_factValue.txt, which looks like:

'wsj_0811.tml'|||4|||'f7'|||'e11'|||'ei156'|||'s0'|||'reduce'|||'AUTHOR'|||'Uu'
'wsj_0811.tml'|||4|||'f8'|||'e13'|||'ei157'|||'s0'|||'said'|||'AUTHOR'|||'CT+'
'wsj_0811.tml'|||4|||'f9'|||'e38'|||'ei154'|||'s2_s0'|||'due'|||'company_AUTHOR'|||'CT+'
'wsj_0811.tml'|||4|||'f10'|||'e38'|||'ei154'|||'s0'|||'due'|||'AUTHOR'|||'Uu'

My version of this file doesn't have any NA labels. Does your file have lines annotated with NA? If so, can you post some examples?
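
One quick way to check is sketched below. It assumes every record is '|||'-delimited with the factuality value as the last field, as in the lines above; the function names are hypothetical and not part of the repository.

```python
def parse_fact_value_line(line):
    """Split one '|||'-delimited record and strip the quotes around each field."""
    return [field.strip().strip("'") for field in line.strip().split("|||")]


def lines_with_label(lines, label="NA"):
    """Return the parsed records whose last field (the factuality value) equals `label`."""
    records = (parse_fact_value_line(line) for line in lines)
    return [rec for rec in records if rec[-1] == label]
```

Passing an open file handle for fb_factValue.txt as `lines` would list every record annotated with NA.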

If that's indeed the case, I think I'll update the conversion script to ignore these labels.

Thanks!

rudinger commented 7 years ago

Yes, it looks like we have different fb_factValue.txt files:

'CNN19980126.1600.1104.tml'|||3|||'f6'|||'e40'|||'ei260'|||'s0'|||'initiative'|||'AUTHOR'|||'CT+'
'CNN19980126.1600.1104.tml'|||3|||'f7'|||'e41'|||'ei259'|||'s0'|||'cutting'|||'AUTHOR'|||'NA'
'CNN19980126.1600.1104.tml'|||3|||'f8'|||'e4'|||'ei261'|||'s0'|||'revitalize'|||'AUTHOR'|||'Uu'

gabrielStanovsky commented 7 years ago

This error seems to occur with a slightly different version of FactBank that replaces some Uu labels with NA; the two labels appear to be semantically identical. Reverting to Uu labels solves the problem.
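
If restoring the original file is not an option, reverting the labels could be scripted roughly as follows. This is only a sketch: the function name revert_na_to_uu is hypothetical, and it assumes the factuality value is the last '|||'-separated field, as in the fb_factValue.txt lines quoted earlier in this thread.

```python
def revert_na_to_uu(line):
    """Rewrite a trailing 'NA' factuality value to 'Uu'; leave other lines untouched."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # preserve the original line ending, if any
    suffix = "|||'NA'"
    if body.endswith(suffix):
        body = body[: -len(suffix)] + "|||'Uu'"
    return body + ending
```

Applying this to each line of a copy of fb_factValue.txt, while keeping the original as a backup, would only touch the final field, so a token that happens to be spelled NA is left alone.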