allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0
11.75k stars 2.25k forks

get user warning when I try to run NER demo code #4932

Closed mchari closed 3 years ago

mchari commented 3 years ago

I am trying to run the NER demo as instructed:

    pip install allennlp==1.0.0 allennlp-models==1.0.0

    from allennlp.predictors.predictor import Predictor
    import allennlp_models.tagging

    predictor = Predictor.from_path(
        "https://storage.googleapis.com/allennlp-public-models/fine-grained-ner.2020-06-24.tar.gz"
    )
    predictor.predict(
        sentence="Did Uriah honestly think he could beat The Legend of Zelda in under three hours?."
    )

but I get this warning:

    UserWarning: You are using the default value (0) of min_padding_length, which can cause some subtle bugs (more info see https://github.com/allenai/allennlp/issues/1954). Strongly recommend to set a value, usually the maximum size of the convolutional layer size when using CnnEncoder.

How do I resolve this?

jbrry commented 3 years ago

This warning appears when no min_padding_length is specified for the TokenCharactersIndexer.

That is indeed the case in the config.json of the NER demo model you linked: the argument wasn't passed when the model was originally trained, so it isn't anything you are doing wrong per se:

    "dataset_reader": {
        "type": "ontonotes_ner",
        "coding_scheme": "BIOUL",
        "token_indexers": {
            "elmo": {
                "type": "elmo_characters"
            },
            "token_characters": {
                "type": "characters"
            },
            "tokens": {
                "type": "single_id",
                "lowercase_tokens": true
            }
        }
    },

I'm not sure it's a good idea to change the config file at evaluation time, but in future this warning can be avoided by setting min_padding_length in your config, like below:

        "token_indexers": {
            ...
            "token_characters": {
                "type": "characters",
                "min_padding_length": 3
            },
            ...
        }
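If you'd rather not touch the model archive at all, the warning can also be silenced at inference time with the standard warnings module. A minimal sketch (not part of the demo code; the message filter is an assumption matched against the warning text quoted above):

```python
import warnings

# Silence only this specific UserWarning at inference time.
# The filter matches on the start of the warning's message text,
# so the model itself is unchanged -- this just hides the message.
warnings.filterwarnings(
    "ignore",
    message=r"You are using the default value \(0\) of min_padding_length",
    category=UserWarning,
)
```

Note this only hides the message; the subtle padding issue described in allenai/allennlp#1954 is unaffected either way at prediction time with this model.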
mchari commented 3 years ago

I would hope that the config.json in the tar file could be edited and re-tarred, to save every user from having to do it themselves.
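For anyone who wants to patch a local copy of the archive in the meantime, here is a sketch using only the standard library (the function name is my own; the config layout follows the config.json snippet quoted above):

```python
import json
import tarfile
import tempfile
from pathlib import Path

def patch_min_padding_length(archive_path: str, output_path: str, value: int = 3) -> None:
    """Extract a model archive, set min_padding_length on the
    token_characters indexer in config.json, and re-tar it."""
    with tempfile.TemporaryDirectory() as tmp:
        # Unpack the original .tar.gz archive.
        with tarfile.open(archive_path, "r:gz") as tar:
            tar.extractall(tmp)
        # Patch the dataset reader's token_characters indexer.
        config_file = Path(tmp) / "config.json"
        config = json.loads(config_file.read_text())
        indexers = config["dataset_reader"]["token_indexers"]
        indexers["token_characters"]["min_padding_length"] = value
        config_file.write_text(json.dumps(config, indent=4))
        # Re-tar everything (config.json, weights, vocabulary, ...).
        with tarfile.open(output_path, "w:gz") as tar:
            for item in Path(tmp).iterdir():
                tar.add(item, arcname=item.name)
```

The patched archive can then be loaded with Predictor.from_path pointing at the local file instead of the URL.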

github-actions[bot] commented 3 years ago

@AkshitaB this is just a friendly ping to make sure you haven't forgotten about this issue 😜
