atcbosselut / comet-commonsense

Code for ACL 2019 Paper: "COMET: Commonsense Transformers for Automatic Knowledge Graph Construction" https://arxiv.org/abs/1906.05317
Apache License 2.0

Can this model generate a knowledge graph from an input sentence? #9

Closed zysNLP closed 5 years ago

zysNLP commented 5 years ago

Can this model generate a knowledge graph from an input sentence? For example: "One absolutely cardinal reason is the fact that universities offer an opportunity to intensify the knowledge in a particular field of interest." Maybe this is a long sentence; I want to ask whether the model has such a function.

debjitpaul commented 5 years ago

Given a sentence, you can generate concepts (ConceptNet) or a person's intentions/motivations (ATOMIC), since COMET builds on pretrained GPT. But it was trained using a transformer, so the length of the sentence is important. You can check their pretrained models and also play a bit with the interactive mode. COMET was trained to generate a concept for ConceptNet (i.e., predict a target concept given a source concept and a relation) or reactions and intentions given an event (ATOMIC). The size of the event or input concept does matter.

atcbosselut commented 5 years ago

As @debjitpaul alluded to, COMET is a neural knowledge base. In theory, you can use the method to generate a temporary knowledge graph for any sequence; you're only limited by the model's maximum event length. For this particular codebase, I believe the maximum sentence length COMET takes in is 17 tokens, because that's the maximum length of an event in ATOMIC.
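
To make that concrete, here is a minimal sketch (my own, not code from this repo) of how generated tuples could be assembled into a per-sentence graph. `comet_generate` is a hypothetical placeholder for whatever generation call you wire up from this codebase, and `networkx` is just one convenient graph container:

```python
import networkx as nx

# A subset of ATOMIC's relations, for illustration only.
ATOMIC_RELATIONS = ["xIntent", "xNeed", "xReact", "oReact"]

def comet_generate(event, relation):
    """Hypothetical placeholder: return generated tail phrases for (event, relation)."""
    raise NotImplementedError("hook up this repo's generation code here")

def build_local_graph(event):
    # Each generated tuple (event, relation, tail) becomes one labeled edge:
    # event --relation--> tail.
    graph = nx.MultiDiGraph()
    for relation in ATOMIC_RELATIONS:
        for tail in comet_generate(event, relation):
            graph.add_edge(event, tail, relation=relation)
    return graph
```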

You can definitely play around with setting a different max size for the input here: https://github.com/atcbosselut/comet-commonsense/blob/0a8a94b2342536f940fb4b57c8bc0ee35b435c23/src/data/atomic.py#L118

which should allow you to train models with larger input contexts. Alternatively, you can split larger sentences into smaller phrases: even if the neural knowledge base can handle longer contexts, you'll get less accurate information if the input contains a lot of unnecessary detail.
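
As a rough sketch of the splitting idea (my own heuristic, not part of this repo), you could break a long sentence at clause boundaries and keep only the pieces short enough for the model:

```python
import re

MAX_TOKENS = 17  # the ATOMIC event length limit noted above

def split_into_phrases(sentence, max_tokens=MAX_TOKENS):
    # Split on commas, semicolons, and a few common clause connectives.
    parts = re.split(r",|;| that | because | which ", sentence)
    phrases = [p.strip() for p in parts if p.strip()]
    # Whitespace tokens are only a proxy; the real limit is in BPE tokens,
    # so leave yourself some slack.
    return [p for p in phrases if len(p.split()) <= max_tokens]

print(split_into_phrases(
    "One absolutely cardinal reason is the fact that universities offer "
    "an opportunity to intensify the knowledge in a particular field of interest."
))
```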

You can also use interactive mode to try out custom contexts of fewer than 17 tokens, or play with the demo here: https://mosaickg.apps.allenai.org/
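
If you want to sanity-check a candidate context against that limit before trying it interactively, one option is to count BPE tokens. This repo ships its own encoder; the snippet below approximates it with HuggingFace's OpenAI GPT tokenizer (an assumption on my part that the vocabularies line up closely, both being GPT BPE):

```python
from transformers import OpenAIGPTTokenizer

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")

def fits_comet(context, max_tokens=17):
    # True if the context stays within the assumed BPE-token budget.
    return len(tokenizer.tokenize(context)) <= max_tokens

print(fits_comet("PersonX applies to university to deepen their knowledge"))
```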

atcbosselut commented 5 years ago

I'd also mention that ConceptNet COMET may behave differently because it is trained on shorter, incomplete phrases.

guotong1988 commented 4 years ago

Same question. Thank you.