getalp / UFSAC

UFSAC is a resource containing all WordNet Sense Annotated Corpora, and a Java library for manipulating them
MIT License

Adverbs with sensekeys in MASC #8

Closed: danlou closed this issue 4 years ago

danlou commented 4 years ago

Hi,

I've noticed that the masc.xml file in UFSAC 2.1 contains adverbs annotated with WN sense keys. I wasn't expecting this, as both the UFSAC paper (Tab. 1) and MASC's documentation report that no adverbs are annotated in that corpus.

For example, line 316 of masc.xml: <word surface_form="here" lemma="here" pos="RB" wn30_key="here%4:02:00::;here%4:02:01::;here%4:02:02::" />

I find a total of 11,675 RBs with sense annotations. Can you tell us where they come from? Are these automatically assigned?
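A count like this can be reproduced with a short script. The following is only a minimal sketch: it assumes that each token is a `<word>` element carrying `pos` and `wn30_key` attributes, as in the line quoted above. The function name `count_sense_annotated` and the enclosing `corpus`/`sentence` elements in the sample are illustrative assumptions, and the two extra sample tokens are made up.

```python
import io
import xml.etree.ElementTree as ET


def count_sense_annotated(source, pos="RB", key_attr="wn30_key"):
    """Count <word> elements that have the given POS tag and a non-empty sense key.

    `source` is a file path or a binary file-like object. The file is parsed
    incrementally so that a large corpus does not have to fit in memory.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "word" and elem.get("pos") == pos and elem.get(key_attr):
            count += 1
        # Release the element's contents once it has been inspected.
        elem.clear()
    return count


# Tiny in-memory sample: the first token is the one quoted in this issue,
# the other two are invented for illustration.
sample = io.BytesIO(
    b"<corpus><sentence>"
    b'<word surface_form="here" lemma="here" pos="RB" '
    b'wn30_key="here%4:02:00::;here%4:02:01::;here%4:02:02::" />'
    b'<word surface_form="is" lemma="be" pos="VBZ" />'
    b'<word surface_form="quickly" lemma="quickly" pos="RB" />'
    b"</sentence></corpus>"
)
print(count_sense_annotated(sample))  # 1
```

To run it on the real corpus, pass the path to masc.xml instead of the in-memory sample, e.g. `count_sense_annotated("masc.xml")`.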

Thanks, Daniel

loic-vial commented 4 years ago

Hi @danlou, I'm really sorry for replying so late (almost 4 months later ^^'). I've been finishing my PhD thesis and moving :)

I checked the original data from Google's MASC (the corpus we converted), which is available here: https://github.com/google-research-datasets/word_sense_disambigation_corpora. There are indeed some annotated adverbs. See, for instance, the file /masc/written/blog/Acephalous-Cant-believe.xml and search for "ADV".

Then I checked the "masc.xml" corpus in UFSAC 1.0.0 and UFSAC 2.1, and I see that in the first version we did not convert the original corpus's "ADV" tag to the correct "RB" tag. I think we missed all the adverbs of this corpus in the paper's statistics because we searched for "RB" tags. This was corrected in a later version.

So the adverbs are correct and not automatically assigned! It's just a mistake in the first version of UFSAC.
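To illustrate how such a tag mismatch can hide annotations from a count, here is a small hedged sketch. It is not the code used for the paper's statistics. The `COARSE_TO_PTB` mapping and the function `annotated_pos_counts` are hypothetical names, and the sample data is invented, reusing only the "here" sense keys quoted earlier in this thread.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical mapping from the original corpus's tag to the Penn Treebank
# tag. Only the ADV -> RB pair is mentioned in this thread.
COARSE_TO_PTB = {"ADV": "RB"}


def annotated_pos_counts(xml_text, normalize=False):
    """Count sense-annotated <word> elements per POS tag.

    With normalize=True, tags found in COARSE_TO_PTB are mapped first.
    """
    counts = Counter()
    for word in ET.fromstring(xml_text).iter("word"):
        if not word.get("wn30_key"):
            continue  # skip tokens without a sense annotation
        pos = word.get("pos")
        if normalize:
            pos = COARSE_TO_PTB.get(pos, pos)
        counts[pos] += 1
    return counts


# Invented sample where adverbs still carry the unconverted "ADV" tag.
unconverted = """<corpus>
<word surface_form="here" pos="ADV" wn30_key="here%4:02:00::" />
<word surface_form="here" pos="ADV" wn30_key="here%4:02:01::" />
</corpus>"""

print(annotated_pos_counts(unconverted)["RB"])                  # 0
print(annotated_pos_counts(unconverted, normalize=True)["RB"])  # 2
```

Searching only for "RB" reports zero adverbs on the unconverted sample, while normalizing the tag first recovers both.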

danlou commented 4 years ago

Hi @loic-vial,

No worries, thanks for taking the time now :)

I don't remember looking for ADV, so that makes perfect sense. Great to know that it's fine!

Best