diffbot / knowledge-net

KnowledgeNet: A Benchmark Dataset for Knowledge Base Population
MIT License

dygie++ question #6

Closed wingz1 closed 4 years ago

wingz1 commented 4 years ago

How did you convert the knowledge-net dataset into the format for DYGIE++ to ingest for training and testing to get the scores you quote?

schmidek commented 4 years ago

I have a public branch, https://github.com/schmidek/dygiepp/tree/multitask, with the changes needed to train DYGIE++ on KnowledgeNet. The main issue is that KnowledgeNet is not exhaustively annotated for all predicates on all sentences. The data format is the same as https://github.com/dwadden/dygiepp/blob/master/doc/data.md#data-format, with one added field, annotatedPredicates, which is just a list of the predicates that were annotated for each sentence. Unfortunately, I can't easily share the code I used to convert the dataset at this time, as it has some internal dependencies.
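For readers trying to reproduce the format, here is a minimal sketch of what one record might look like. It follows the DyGIE++ data format linked above (one JSON document per line, with `doc_key`, `sentences`, `ner`, and `relations`) and adds `annotatedPredicates`. The exact shape of `annotatedPredicates` is an assumption (one list of predicate names per sentence), and `make_record`, the entity labels, and the example text are hypothetical; check the multitask branch for what it actually expects.

```python
import json


def make_record(doc_key, sentences, ner, relations, annotated_predicates):
    """Build one DyGIE++-style document dict (hypothetical helper).

    Every per-sentence field must have exactly one entry per sentence.
    """
    n = len(sentences)
    if not (len(ner) == len(relations) == len(annotated_predicates) == n):
        raise ValueError("every per-sentence field needs one entry per sentence")
    return {
        "doc_key": doc_key,
        "sentences": sentences,            # list of token lists
        "ner": ner,                        # per sentence: [start, end, label]
        "relations": relations,            # per sentence: [s1, e1, s2, e2, label]
        # Assumed shape: per sentence, the predicates that were annotated.
        "annotatedPredicates": annotated_predicates,
    }


record = make_record(
    doc_key="example-doc",
    sentences=[
        ["Ada", "Lovelace", "was", "born", "in", "London", "."],
        ["She", "wrote", "programs", "."],
    ],
    # Token indices are document-level with inclusive end offsets.
    ner=[[[0, 1, "PER"], [5, 5, "LOC"]], []],
    relations=[[[0, 1, 5, 5, "PLACE_OF_BIRTH"]], []],
    # The second sentence was not annotated for any predicate in this example.
    annotated_predicates=[["PLACE_OF_BIRTH"], []],
)

# One JSON document per line, as in train.json / dev.json / test.json.
line = json.dumps(record)
```

The point of the extra field is that a missing relation in a sentence only counts as a negative example for predicates listed there; how the branch uses it during training is not shown here.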

wingz1 commented 4 years ago

Thanks for the reply. Since you can't easily share the conversion code, could you share the reformatted KnowledgeNet dataset instead (i.e., the train.json file, and perhaps dev.json or test.json)? I'd like to try training DyGIE++ on KnowledgeNet.

wingz1 commented 4 years ago

Hi, is the reformatted KnowledgeNet dataset in DYGIE++ format available for sharing? If so, where? Thanks!

schmidek commented 4 years ago

Sure, here's the dataset

wingz1 commented 4 years ago

Thanks!

I see that only dev.json has anything in it; train.json is empty. (Maybe it's still uploading?)

schmidek commented 4 years ago

Fixed