allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Knowledge graph embeddings working with AllenNLP #3366

Closed c0ntradicti0n closed 4 years ago

c0ntradicti0n commented 4 years ago

System (please complete the following information):

Question: How can I map knowledge graph embeddings into AllenNLP so that the Viterbi algorithm of the CRF does not get confused?

Setting: I am trying to use graph embeddings from AmpliGraph to map embeddings of a WordNet graph onto prose text. I would like to build something similar to this: Deep Semantic Match Model for Entity Linking Using Knowledge Graph and Text.

I have a setup where I use an LSTM and a CRF tagger, which works pretty well. I thought it would be great to enrich the probabilistic language model with more declarative knowledge from graphs such as WordNet. So I tried to implement this by doing word-sense disambiguation and lemmatization against the WordNet synsets and retrieving the embeddings of the synset knowledge graph through a token embedder.
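A minimal sketch of what I mean (simplified, not the code from my repository; `graph_embeddings` is assumed to be a dict from synset name to vector exported from AmpliGraph, and the `Embedding` argument names may differ between AllenNLP versions):

```python
import numpy as np
import torch
from nltk.corpus import wordnet as wn
from allennlp.modules.token_embedders import Embedding


def build_graph_embedding(vocab, graph_embeddings, embedding_dim, namespace="tokens"):
    # One row per vocabulary entry; tokens without a synset keep an all-zero row.
    weight = np.zeros((vocab.get_vocab_size(namespace), embedding_dim), dtype=np.float32)
    for index, token in vocab.get_index_to_token_vocabulary(namespace).items():
        # Crude "disambiguation": just take the first synset of the token.
        synsets = wn.synsets(token)
        if synsets and synsets[0].name() in graph_embeddings:
            weight[index] = graph_embeddings[synsets[0].name()]
    return Embedding(num_embeddings=weight.shape[0],
                     embedding_dim=embedding_dim,
                     weight=torch.from_numpy(weight),
                     trainable=False)
```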

Errors: But problems come up with the discontinuous values that, I think, arise from words that can't be mapped to any embedding in the graph, such as OOV words and stopword-like words that are left out of WordNet. Three different things happen, depending on which parameters I try with this setup:

The first two errors arise with computations within the crf-tagger.

Do you think that these graph embeddings are numerically incompatible with the other embeddings? If there is an embedding for (synset 1)-relation-(synset 2), then the comparison of the embeddings of synset 1 and synset 2 is, as my unmathematical mind imagines it, something like a vector in hyperspace pointing in some special direction for synonyms, antonyms, hypernyms and hyponyms. But maybe some other clustering underlies this prediction of the relation, so that these embeddings don't reveal the kind of relation in a way the LSTM can recognize?

And regarding those stopwords and OOV tokens that need some default: what would be a better default value for them than the min or max? Should one think more about normalization of the embeddings, so as not to confuse the Viterbi algorithm?
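For illustration, a small sketch of what I mean by a default and by normalization (my own rough idea, not an AllenNLP API; `has_embedding` is a boolean array marking rows that got a real graph embedding):

```python
import numpy as np


def fill_and_normalize(weight: np.ndarray, has_embedding: np.ndarray) -> np.ndarray:
    # Use the mean of the known rows (or simply zeros) instead of min/max,
    # so tokens without an embedding sit in the "middle" of the space.
    default = weight[has_embedding].mean(axis=0)
    weight = weight.copy()
    weight[~has_embedding] = default
    # L2-normalize every row so graph and word embeddings live on a comparable scale.
    norms = np.linalg.norm(weight, axis=1, keepdims=True) + 1e-12
    return weight / norms
```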

I made a repository, if you want to try something out. https://github.com/c0ntradicti0n/allennlp_vs_ampligraph

schmmd commented 4 years ago

If you have a specific concrete technical question we might be able to help, but we don't have a lot of context around this area--so we don't have time to dig deep.

c0ntradicti0n commented 4 years ago

Yeah, I understand. A more concrete question is: can one mask not only dropout, out-of-vocabulary tokens and padding, but also part of the embedding vector? What do you think, mathematically?

I will also try some other knowledge embedding framework, because I don't know whether my expectations are realistic compared to what the current one actually delivers.

mojesty commented 4 years ago

If you wish to implement custom masking over your word embeddings, you can create your own Seq2SeqEncoder that zeroes out some part of its input. It could be boolean masking, zeroing some dimensions etc.
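A minimal sketch of such an encoder (just an illustration, not tested against your setup; the name `dimension_masker` is made up, and the exact base-class API may differ between AllenNLP versions):

```python
import torch
from allennlp.modules.seq2seq_encoders.seq2seq_encoder import Seq2SeqEncoder


@Seq2SeqEncoder.register("dimension_masker")
class DimensionMasker(Seq2SeqEncoder):
    """Passes embeddings through unchanged except for zeroing chosen dimensions."""

    def __init__(self, input_dim: int, masked_dims: list) -> None:
        super().__init__()
        self._input_dim = input_dim
        # Multiplier of shape (1, 1, input_dim): 0 for masked dims, 1 elsewhere.
        multiplier = torch.ones(input_dim)
        multiplier[masked_dims] = 0.0
        self.register_buffer("_multiplier", multiplier.view(1, 1, -1))

    def get_input_dim(self) -> int:
        return self._input_dim

    def get_output_dim(self) -> int:
        return self._input_dim

    def is_bidirectional(self) -> bool:
        return False

    def forward(self, inputs: torch.Tensor, mask: torch.Tensor = None) -> torch.Tensor:
        # inputs: (batch, seq_len, input_dim); zero the selected dimensions everywhere.
        output = inputs * self._multiplier
        if mask is not None:
            # Also zero out padded positions, as usual for sequence encoders.
            output = output * mask.unsqueeze(-1).to(output.dtype)
        return output
```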

DeNeutoy commented 4 years ago

These questions are best asked on our forum, as they are not related to library development. https://discourse.allennlp.org/

c0ntradicti0n commented 3 years ago

Hello again,

now I come back to my question: I found a paper that examines "Enriching BERT with Knowledge Graph Embeddings for Document Classification" (https://arxiv.org/pdf/1909.08402.pdf), as well as the code for it (https://github.com/malteos/pytorch-bert-document-classification). Its downstream task, classifying text based on labels, is something else than the tagger my question here assumed, but it seems possible and helpful to enhance transformer performance with knowledge embeddings. Therefore, here is the question again: are there implementation steps towards this in the AllenNLP framework? And respectively, where has the forum gone, if I want to ask there how to work forward based on this example?
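For illustration, the general idea of that paper in a rough sketch (not the paper's actual code; `kg_dim` and the lookup of `kg_embedding` are placeholders, and the `transformers` output API may differ between versions): concatenate the pooled BERT representation with a pretrained knowledge graph embedding and classify on top.

```python
import torch
from torch import nn
from transformers import BertModel


class BertWithKgEmbeddings(nn.Module):
    def __init__(self, kg_dim: int, num_labels: int, bert_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.classifier = nn.Linear(hidden + kg_dim, num_labels)

    def forward(self, input_ids, attention_mask, kg_embedding):
        # kg_embedding: (batch, kg_dim), looked up from the pretrained graph model.
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = outputs.pooler_output  # (batch, hidden); older versions return a tuple instead
        features = torch.cat([pooled, kg_embedding], dim=-1)
        return self.classifier(features)
```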