adityak6798 / Transformers-For-Negation-and-Speculation

This is the code for Negation and Speculation Cue Detection and Scope Resolution using BERT, XLNet and RoBERTa
MIT License

Error in data preprocessing: separated multiword cues (e.g. neither...nor...) in XML input #3

Closed. LuciusLan closed this issue 3 years ago.

LuciusLan commented 3 years ago

Hi, I'm working on a school project on negation detection that builds on top of your code. It has been really helpful!

However, I noticed that the preprocessing step handles separated multiword cues incorrectly. For example, from the BioScope abstracts:

<sentence id="S180.13">In contrast, sodium salicylate (1 mM) inhibited <xcope id="X180.13.1"><cue type="negation" ref="X180.13.1">neither</cue> adhesion <cue type="negation" ref="X180.13.1">nor</cue> expression of these adhesion molecules</xcope>.</sentence>

Note the two cues within one scope, both with the same ref="X180.13.1".
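To make the shared ref concrete, here is a minimal sketch (not the repository's parser) that reads the example sentence with Python's standard xml.etree.ElementTree, splits text on whitespace, and groups the token positions of every <cue> element by its ref attribute. The function name cue_positions and the whitespace tokenization are assumptions for illustration; note also that ElementTree returns attribute values without the surrounding quotes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Example sentence S180.13 from the BioScope abstracts, as quoted in this issue.
SENTENCE = (
    '<sentence id="S180.13">In contrast, sodium salicylate (1 mM) inhibited '
    '<xcope id="X180.13.1"><cue type="negation" ref="X180.13.1">neither</cue> '
    'adhesion <cue type="negation" ref="X180.13.1">nor</cue> expression of '
    'these adhesion molecules</xcope>.</sentence>'
)


def cue_positions(sentence_xml):
    """Return (tokens, {ref: [token positions]}) using whitespace tokenization."""
    root = ET.fromstring(sentence_xml)
    tokens = []
    cues = defaultdict(list)

    def add_text(text, ref):
        # Append each whitespace-separated word; record its index if inside a cue.
        for word in (text or "").split():
            if ref is not None:
                cues[ref].append(len(tokens))
            tokens.append(word)

    def walk(elem, ref):
        # A <cue> element sets the ref for its own text; other elements inherit it.
        if elem.tag == "cue":
            ref = elem.get("ref")
        add_text(elem.text, ref)
        for child in elem:
            walk(child, ref)
            # A child's tail text lies inside this element, so it uses this ref.
            add_text(child.tail, ref)

    walk(root, None)
    return tokens, dict(cues)


tokens, cues = cue_positions(SENTENCE)
print(cues)  # {'X180.13.1': [7, 9]}
```

With this tokenization, 'neither' and 'nor' land at positions 7 and 9 under the single key X180.13.1, matching the positions quoted in this issue.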

In your code for XML input, there is this line: cue[c_idx[-1]] = []

As I understand it, this initializes a list for a cue. In this case, however, c_idx[-1] is X180.13.1 for both cues. When the loop reaches 'neither', cue['"X180.13.1"'] = [7] (7 being the position of 'neither'). When it reaches 'nor', the same line resets the list, so the result is only [9] instead of [7, 9].

You should probably add a check such as if c_idx[-1] not in cue.keys() to test whether the key already exists.
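The difference between the unconditional initialization and the suggested guard can be sketched in a few lines of Python. This is a simplified model, not the repository's code: the hypothetical helpers collect_cues_buggy and collect_cues_fixed receive (ref, positions) pairs, one per <cue> element, in document order.

```python
def collect_cues_buggy(cue_elements):
    cue = {}
    for ref, positions in cue_elements:
        cue[ref] = []  # runs for every <cue>, wiping positions from earlier cues with the same ref
        cue[ref].extend(positions)
    return cue


def collect_cues_fixed(cue_elements):
    cue = {}
    for ref, positions in cue_elements:
        if ref not in cue:  # initialize only the first time this ref is seen
            cue[ref] = []
        cue[ref].extend(positions)
    return cue


# The two separated cues of "neither ... nor" in S180.13 share one ref.
elements = [("X180.13.1", [7]), ("X180.13.1", [9])]
print(collect_cues_buggy(elements))  # {'X180.13.1': [9]}
print(collect_cues_fixed(elements))  # {'X180.13.1': [7, 9]}
```

ref not in cue is equivalent to ref not in cue.keys(); cue.setdefault(ref, []).extend(positions) is another common way to write the same guard.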

LuciusLan commented 3 years ago

Created a PR for this: https://github.com/adityak6798/Transformers-For-Negation-and-Speculation/pull/4

I didn't touch the multitask notebook, as I'm not sure whether the same issue applies to speculation cues.

adityak6798 commented 3 years ago

Merged the PR