allenai / kb

KnowBert -- Knowledge Enhanced Contextual Word Representations
Apache License 2.0

Loading the AIDA dataset #22

Closed. GaiYu0 closed this issue 4 years ago.

GaiYu0 commented 4 years ago

First, thank you for making your code available!

When running bin/evaluate_wiki_linking.py, I got the following error:

allennlp.common.checks.ConfigurationError: "aida_wiki_linking not in acceptable choices for type: ['ccgbank', 'conll2003', 'conll2000', 'ontonotes_ner', 'coref', 'winobias', 'event2mind', 'interleaving', 'language_modeling', 'multiprocess', 'ptb_trees', 'drop', 'squad', 'quac', 'triviaqa', 'qangaroo', 'srl', 'semantic_dependencies', 'seq2seq', 'sequence_tagging', 'snli', 'universal_dependencies', 'sst_tokens', 'quora_paraphrase', 'atis', 'nlvr', 'wikitables', 'template_text2sql', 'grammar_based_text2sql', 'quarel', 'simple_language_modeling', 'babi', 'copynet_seq2seq', 'text_classification_json']"

All tests pass when I run:

pytest -v tests

Did I miss anything? Thank you very much!

matt-peters commented 4 years ago

Can you post a full traceback?

GaiYu0 commented 4 years ago

Sure. Here is the traceback: [traceback screenshot]

A script that triggers the error:

```python
from allennlp.common import Params
from allennlp.data import DatasetReader

reader_params = Params({
    "type": "aida_wiki_linking",
    "entity_disambiguation_only": False,
    "entity_indexer": {
        "type": "characters_tokenizer",
        "namespace": "entity_wiki",
        "tokenizer": {
            "type": "word",
            "word_splitter": {
                "type": "just_spaces"
            }
        }
    },
    "extra_candidate_generators": {
        "wordnet": {
            "type": "wordnet_mention_generator",
            "entity_file": "s3://allennlp/knowbert/wordnet/entities.jsonl"
        }
    },
    "should_remap_span_indices": True,
    "token_indexers": {
        "tokens": {
            "type": "bert-pretrained",
            "do_lowercase": True,
            "max_pieces": 512,
            "pretrained_model": "bert-base-uncased",
            "use_starting_offsets": True,
        }
    }
})

# reader_params is already a Params object, so pass it directly instead of
# wrapping it in Params() a second time. This call raises the ConfigurationError above.
reader = DatasetReader.from_params(reader_params)
```

matt-peters commented 4 years ago

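You need to import the aida_wiki_linking reader so that AllenNLP can map that type name to the correct class. A reader type is registered only when the module that defines it is imported, which is why the error lists only AllenNLP's built-in readers. Add `from kb.include_all import LinkingReader` to the top of your script.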
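To illustrate why the import fixes the error: AllenNLP resolves the "type" string through a registry that is filled in by `@DatasetReader.register(...)` decorators, and a decorator only runs when the module containing the decorated class is executed. Below is a toy, pure-Python imitation of that mechanism, not AllenNLP's actual `Registrable` code; the names `ConllReader`, `LinkingReader` and `import_linking_module` are hypothetical stand-ins.

```python
from typing import Callable, Dict


class ConfigurationError(Exception):
    """Stand-in for allennlp.common.checks.ConfigurationError."""


class DatasetReader:
    """Toy base class whose registry maps "type" strings to subclasses."""

    _registry: Dict[str, type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        # Returns a class decorator; the name is recorded only when the
        # decorated class definition actually executes.
        def decorator(subclass: type) -> type:
            cls._registry[name] = subclass
            return subclass
        return decorator

    @classmethod
    def by_name(cls, name: str) -> type:
        # Mirrors the shape of the error in this issue: an unknown name is
        # reported together with the currently registered choices.
        if name not in cls._registry:
            raise ConfigurationError(
                f"{name} not in acceptable choices for type: {sorted(cls._registry)}"
            )
        return cls._registry[name]


@DatasetReader.register("conll2003")
class ConllReader(DatasetReader):
    """Hypothetical built-in reader, registered as soon as this file runs."""


def import_linking_module() -> None:
    """Hypothetical stand-in for `from kb.include_all import LinkingReader`:
    running the decorated class definition is what registers the name."""

    @DatasetReader.register("aida_wiki_linking")
    class LinkingReader(DatasetReader):
        pass


try:
    DatasetReader.by_name("aida_wiki_linking")
except ConfigurationError as err:
    print("before import:", err)
    # before import: aida_wiki_linking not in acceptable choices for type: ['conll2003']

import_linking_module()
print("after import:", DatasetReader.by_name("aida_wiki_linking").__name__)
# after import: LinkingReader
```

In a real program, Python executes a module only on its first import and caches it in `sys.modules`, so one import at the top of the script is enough for every later `from_params` call.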

GaiYu0 commented 4 years ago

Issue solved. Thank you very much!