fmarten / JoSimText

A system for word sense induction and disambiguation based on the JoBimText approach

Support for not only enhanced dependencies #3

Closed alexanderpanchenko closed 7 years ago

alexanderpanchenko commented 7 years ago

Problem

The CoNLL 2 TextContext extractor currently takes dependency values from the column that holds the so-called enhanced dependencies. However, this column is very often not filled, as in this file: http://panchenko.me/data/joint/corpora/cc16-conll-copp-sample-newlines-no-enhanced.csv.gz

In fact, in the majority of the CoNLL corpora available online this column is not filled. An example of such a file is shown below:

image

As a result, many existing CoNLL files are currently not usable with our tool.

Solution

If no dependencies are found in the respective column (e.g. the value is "" or "_"), use other columns to generate the features instead: the two columns that directly precede the column currently used.
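A minimal sketch of this fallback for a single token row, written in Python for illustration only. It is not the JoSimText implementation. The helper name `token_dependencies` is hypothetical, and the column positions assume the standard 10-column CoNLL-U layout (HEAD, DEPREL, DEPS as fields 7, 8, 9); the corpus linked above may use a different layout.

```python
# Hypothetical helper, not taken from JoSimText.
# 0-based column indices, assuming the 10-column CoNLL-U layout.
HEAD, DEPREL, DEPS = 6, 7, 8
EMPTY = {"", "_"}  # values treated as "no dependencies"


def token_dependencies(fields):
    """Return a list of (head, deprel) pairs for one token row.

    Uses the enhanced dependencies column when it is filled and
    falls back to the two preceding columns (HEAD, DEPREL) otherwise.
    """
    deps = fields[DEPS].strip() if len(fields) > DEPS else ""
    if deps not in EMPTY:
        pairs = []
        # Enhanced dependencies look like "2:nsubj|4:nsubj".
        for item in deps.split("|"):
            # Split on the first colon only: relations such as
            # "nmod:poss" contain a colon themselves.
            head, _, rel = item.partition(":")
            pairs.append((head, rel))
        return pairs

    # Fallback: basic dependency from HEAD and DEPREL.
    head, rel = fields[HEAD].strip(), fields[DEPREL].strip()
    if head in EMPTY or rel in EMPTY:
        return []
    return [(head, rel)]
```

Heads are kept as strings so that the caller can decide how to resolve them to tokens.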

fmarten commented 7 years ago

  7. HEAD: Head of the current word, which is either a value of ID or zero (0).
  8. DEPREL: Universal dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.

fmarten commented 7 years ago

  9. DEPS: Enhanced dependency graph in the form of a list of head-deprel pairs.

If this column contains the null token (_), the parser did not support the enhanced dependency graph, so use the normal dependency graph (HEAD and DEPREL) instead.
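To check whether a given corpus carries enhanced dependencies at all, one could count how many token rows have a filled DEPS column. The sketch below is illustrative Python, not part of JoSimText. The function name `enhanced_deps_coverage` is hypothetical, and it assumes tab-separated rows with DEPS as the ninth field, blank lines between sentences, and `#` comment lines, as in CoNLL-U.

```python
def enhanced_deps_coverage(lines):
    """Return (rows_with_enhanced_deps, token_rows) for an iterable of lines.

    Hypothetical diagnostic: assumes tab-separated CoNLL-U style rows
    where the ninth field (index 8) is DEPS.
    """
    with_deps = 0
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        # Skip sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip rows that are too short to have a DEPS field.
        if len(fields) < 9:
            continue
        total += 1
        # "" and "_" both mean that no enhanced dependencies are given.
        if fields[8].strip() not in ("", "_"):
            with_deps += 1
    return with_deps, total
```

A result such as `(0, n)` with `n > 0` would indicate that the normal dependency graph has to be used for every token in the input.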

alexanderpanchenko commented 7 years ago

Can you please add the commit numbers for reference?

fmarten commented 7 years ago

https://github.com/fmarten/JoSimText/commit/48faa88cee796bef8b2055b0b2a7e0c504cab62b