This repository contains the dataset accompanying the article "UFSAC: Unification of Sense Annotated Corpora and Tools", by Loïc Vial, Benjamin Lecouteux and Didier Schwab, presented at the 11th edition of the Language Resources and Evaluation Conference (LREC), which took place in May 2018 in Miyazaki, Japan.
The full article is available at the following URL: http://www.lrec-conf.org/proceedings/lrec2018/summaries/250.html.
This repository contains:
The sense annotated corpora in UFSAC, the format described in the paper, available through direct links (see below). Note that the files have been compressed with the xz tool and therefore need to be decompressed with unxz or a similar tool.
The latest version (2.1) contains the following corpora annotated with WordNet 3.0 sense keys:
semcor.xml
wngt.xml
masc.xml
omsti.xml
trainomatic.xml
senseval2.xml and raganato_senseval2.xml
senseval2_lexical_sample_train.xml and senseval2_lexical_sample_test.xml
senseval3task1.xml and raganato_senseval3.xml
senseval3task6_train.xml and senseval3task6_test.xml
semeval2007task7.xml
semeval2007task17.xml and raganato_semeval2007.xml
semeval2013task12.xml and raganato_semeval2013.xml
semeval2015task13.xml and raganato_semeval2015.xml
raganato_ALL.xml
The source code of the Java API and the scripts described in the paper, in the folder java.
Scripts for converting corpora from various formats (Semcor, DSO, OMSTI...) into UFSAC, converting UFSAC corpora into Raganato et al.'s format, computing MFS, etc., in the folder scripts
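As a rough illustration of what reading one of the decompressed corpora listed above can look like, the following sketch counts the words of a UFSAC file and how many of them carry a sense annotation. It uses only the standard Java StAX parser and does not use the Java API shipped in this repository. The element name `word` and the attribute names `surface_form` and `wn30_key` are assumptions about the file layout, and the class `UfsacSketch` and its method `countWords` are hypothetical names introduced for this example. Check them against the actual files before relying on them.

```java
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the UFSAC Java API.
public class UfsacSketch {

    /**
     * Streams through an XML document and returns a two-element array:
     * [0] = number of <word> elements, [1] = number of those that have
     * the given attribute (assumed here to hold the sense annotation).
     */
    public static int[] countWords(Reader xml, String senseAttribute)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The corpora are plain data files, so DTD support is not needed.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        int total = 0;
        int annotated = 0;
        try {
            while (reader.hasNext()) {
                // Assumption: every token is stored as a <word> element.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "word".equals(reader.getLocalName())) {
                    total++;
                    if (reader.getAttributeValue(null, senseAttribute) != null) {
                        annotated++;
                    }
                }
            }
        } finally {
            reader.close();
        }
        return new int[] {total, annotated};
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory sample using the assumed layout; pass a path to a
        // decompressed corpus (for example semcor.xml) to read a real file.
        String sample =
                "<corpus><document><paragraph><sentence>"
                + "<word surface_form=\"The\"/>"
                + "<word surface_form=\"cat\" lemma=\"cat\" pos=\"NN\""
                + " wn30_key=\"cat%1:05:00::\"/>"
                + "</sentence></paragraph></document></corpus>";
        try (Reader in = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]))
                : new StringReader(sample)) {
            int[] counts = countWords(in, "wn30_key");
            System.out.println(counts[0] + " words, " + counts[1] + " annotated");
        }
    }
}
```

Run without arguments, the program reads the in-memory sample and prints `2 words, 1 annotated`. A streaming parser is used here because some of the corpora are large, so loading a whole file into a DOM tree may not be practical.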
If you want to use the Java API or the scripts, the prerequisites are:
Once they are installed, you must compile the code:
Go to the java folder and run mvn compile or ./compile.sh
And if you want to use the library as a dependency in another Maven project:
Go to the java folder and run mvn install or ./install.sh
Direct link to the data: https://drive.google.com/file/d/1kwBMIDBTf6heRno9bdLvF-DahSLHIZyV
<major version>.<minor version>
Direct link to the data: https://drive.google.com/file/d/1XKOnRPnm0TSia1PKwe2xsGE4IDqvAAbb
Direct link to the data: https://drive.google.com/file/d/1-II0demgruLdSdI8SC6dmnIqDNrZvdpW
Original version which contains the following corpora:
Plus the code to produce the UFSAC version from the original version of the following corpora: