We want to train the model on a larger corpus, so SemCor should be used.
Potential issues:
The OntoNotes dataset uses onto_sense, whose sense definitions are more coarse-grained than WordNet's.
Because of that, extending the sense2id and sense --> related (sense2related) mappings needs to be done carefully.
The same WordNet sense may be linked to two different OntoNotes senses and vice versa. In short, there is a many-to-many relationship.
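To make the many-to-many issue concrete, here is a minimal sketch that finds the senses linked to more than one sense on the other side. The pair list, the sense labels, and the helper names (build_links, ambiguous) are made up for illustration and are not part of the existing code.

```python
from collections import defaultdict


def build_links(pairs):
    """Build both directions of the link table from (wordnet_sense, onto_sense) pairs."""
    wn_to_on = defaultdict(set)
    on_to_wn = defaultdict(set)
    for wn_sense, on_sense in pairs:
        wn_to_on[wn_sense].add(on_sense)
        on_to_wn[on_sense].add(wn_sense)
    return dict(wn_to_on), dict(on_to_wn)


def ambiguous(mapping):
    """Return the keys that are linked to more than one sense on the other side."""
    return sorted(key for key, linked in mapping.items() if len(linked) > 1)


# Hypothetical sense labels, for illustration only.
pairs = [
    ("wn_a", "on_1"),
    ("wn_a", "on_2"),  # one WordNet sense linked to two OntoNotes senses
    ("wn_b", "on_2"),  # one OntoNotes sense linked to two WordNet senses
]
wn_to_on, on_to_wn = build_links(pairs)
print(ambiguous(wn_to_on))  # ['wn_a']
print(ambiguous(on_to_wn))  # ['on_2']
```

Any sense reported by ambiguous() is one where a simple one-to-one copy of ids would be unsafe.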
Solutions:
Extend the sense2id and sense2related mappings by appending the new wordnet_sense keys to the appropriate ids and related words. Note that WordNet sense tags may be the same as our converted on_sense tags.
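The extension step could look roughly like the sketch below. It is not the project's implementation: it assumes sense2id maps a sense tag to an integer id, sense2related maps a sense tag to a list of related words, and a hypothetical wn_to_on dict gives the OntoNotes senses linked to each WordNet sense. How to handle a WordNet sense linked to several known senses is not settled above, so this sketch only sets those aside for manual review.

```python
def extend_mappings(sense2id, sense2related, wn_to_on):
    """Return extended copies of both mappings plus the WordNet senses left unresolved.

    Assumptions (not confirmed by the notes): ids are reused from the linked
    OntoNotes sense, and a WordNet sense is only added automatically when it
    links to exactly one sense that already has an id.
    """
    new_ids = dict(sense2id)
    new_related = {sense: list(words) for sense, words in sense2related.items()}
    unresolved = []

    for wn_sense in sorted(wn_to_on):
        if wn_sense in new_ids:
            # The WordNet tag coincides with an existing converted tag: keep the existing entry.
            continue
        candidates = sorted(s for s in wn_to_on[wn_sense] if s in new_ids)
        if len(candidates) != 1:
            # Zero or several known senses: many-to-many case, leave for manual review.
            unresolved.append(wn_sense)
            continue
        target = candidates[0]
        new_ids[wn_sense] = new_ids[target]
        new_related[wn_sense] = list(new_related.get(target, []))

    return new_ids, new_related, unresolved


# Hypothetical data, for illustration only.
sense2id = {"on_1": 0, "on_2": 1}
sense2related = {"on_1": ["w1"], "on_2": ["w2", "w3"]}
wn_to_on = {
    "wn_a": {"on_1"},          # unambiguous: reuses the id of on_1
    "wn_b": {"on_1", "on_2"},  # ambiguous: reported as unresolved
    "on_2": {"on_2"},          # tag already present: left untouched
}
new_ids, new_related, unresolved = extend_mappings(sense2id, sense2related, wn_to_on)
print(new_ids)     # {'on_1': 0, 'on_2': 1, 'wn_a': 0}
print(unresolved)  # ['wn_b']
```

The function works on copies, so the original mappings stay unchanged until the unresolved list has been checked.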