alexandrainst / danlp

DaNLP is a repository for Natural Language Processing resources for the Danish Language.
BSD 3-Clause "New" or "Revised" License

`flair.ner` missing from MODELS list in `download.py` #15

Closed. tobiasmorville closed this issue 5 years ago

tobiasmorville commented 5 years ago

When trying your example

from danlp.models.ner_taggers import load_ner_tagger_with_flair
from flair.data import Sentence

# Load the NER tagger using the DaNLP wrapper
flair_model = load_ner_tagger_with_flair()

# Using the flair NER tagger
sentence = Sentence('jeg hopper på en bil som er rød sammen med Jens-Peter E. Hansen') 
flair_model.predict(sentence) 
print(sentence.to_tagged_string())

several things fail. See #14 for the first problem.

The next problem is that the model name flair.ner, which download.py requests in this call

model_weight_path = download_model('flair.ner', cache_dir, process_func=_unzip_process_func, verbose=verbose)

is not in the MODELS list, which yields an error.

MODELS = {
    # WORD EMBEDDINGS
    'wiki.da.wv': {
        'url': 'https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.da.vec',
        'vocab_size': 312956,
        'dimensions': 300,
        'md5_checksum': '892ac16ff0c730d7230c82ad3d565984',
        'size': 822569731,
        'file_extension': '.bin'
    },
    'cc.da.wv': {
        'url': 'https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.da.300.vec.gz',
        'vocab_size': 2000000,
        'dimensions': 300,
        'md5_checksum': '68a766bf25409ae96d334493df026419',
        'size': 1221429539,
        'file_extension': '.bin'
    },
    'connl.da.wv': {
        'url': 'http://vectors.nlpl.eu/repository/11/38.zip',
        'vocab_size': 1655870,
        'dimensions': 100,
        'md5_checksum': 'cc324d04a429f80825fede0d6502543d',
        'size': 624863834,
        'file_extension': '.bin'
    },
    'news.da.wv': {
        'url': 'https://loar.kb.dk/bitstream/handle/1902/329/danish_newspapers_1880To2013.txt?sequence=4&isAllowed=y',
        'vocab_size': 2404836,
        'dimensions': 300,
        'size': 6869762980,
        'md5_checksum': 'e0766f997e04dddf65aec5e2691bf36d',
        'file_extension': '.bin'
    },
    'wiki.da.swv': {
        'url': 'https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.da.zip',
        'md5_checksum': '86e7875d880dc1f4d3e7600a6ce4952d',
        'size': 3283027968,
        'file_extension': '.bin'
    },
    'cc.da.swv': {
        'url': 'https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.da.300.bin.gz',
        'size': 4509731789,
        'md5_checksum': '562d7b49ab8ee45892f6e28b02db5f01',
        'file_extension': '.bin'
    },

    # CONTEXTUAL EMBEDDINGS
    'flair.fwd': {
        'url': DANLP_S3_URL+'/models/flair.fwd.zip',
        'md5_checksum': '8697e286048a4aa30acc62995397a0c8',
        'size': 18548086,
        'file_extension': '.pt'
    },
    'flair.bwd': {
        'url': DANLP_S3_URL+'/models/flair.bwd.zip',
        'md5_checksum': '11549f1dc28f92a7c37bf511b023b1f1',
        'size': 18551173,
        'file_extension': '.pt'
    },

    # POS MODELS
    'flair.pos': {
        'url': DANLP_S3_URL + '/models/flair.pos.zip',
        'md5_checksum': 'b9892d4c1c654503dff7e0094834d6ed',
        'size': 426404955,
        'file_extension': '.pt'
    }
}
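
To make the failure mode concrete, here is a minimal sketch of a registry lookup that fails for a missing key. It is not the actual danlp code: the trimmed-down MODELS dict and the helper check_model_registered are hypothetical stand-ins, and the real download_model may report the problem differently.

```python
# Hypothetical, trimmed-down stand-in for the MODELS registry shown above.
# Only the keys matter here; the real entries carry url, checksum and size.
MODELS = {
    'flair.fwd': {'file_extension': '.pt'},
    'flair.bwd': {'file_extension': '.pt'},
    'flair.pos': {'file_extension': '.pt'},
}


def check_model_registered(model_name, models=MODELS):
    """Return the registry entry, or raise a descriptive KeyError if it is missing.

    Assumption: this helper is illustrative and does not exist in danlp.
    """
    if model_name not in models:
        available = ', '.join(sorted(models))
        raise KeyError(f"'{model_name}' is not in MODELS; available: {available}")
    return models[model_name]


if __name__ == '__main__':
    # 'flair.pos' is registered, so the lookup succeeds.
    print(check_model_registered('flair.pos'))
    # 'flair.ner' is absent from this registry, so the lookup raises KeyError.
    try:
        check_model_registered('flair.ner')
    except KeyError as err:
        print('lookup failed:', err)
```

Listing the available names in the error message makes it easier to tell a typo in the model name apart from an outdated registry.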
tobiasmorville commented 5 years ago

I can see that both problems (#14 and this one) stem from the fact that the pip distribution has not been updated. Looking at your repo, the flair.ner model is there.

hvingelby commented 5 years ago

Hi @tomonodes

You are right, the problem was that the pip package was not updated. I have updated it to v0.0.4 which includes the Flair NER tagger.

https://github.com/alexandrainst/danlp/releases/tag/v0.0.4
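
For anyone hitting the same error, a quick way to confirm that the installed package is recent enough is to compare its version against v0.0.4. This is a sketch under assumptions: the helper names are hypothetical, the package is assumed to be distributed on pip under the name danlp, and the parser only handles plain dotted versions such as 0.0.4 (no pre-release suffixes).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


def parse_version(text):
    """Turn a plain dotted version such as '0.0.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split('.'))


def is_at_least(installed, required):
    """True if an installed version string is present and not older than required."""
    return installed is not None and parse_version(installed) >= parse_version(required)


if __name__ == '__main__':
    # Assumption: 'danlp' is the distribution name used on pip.
    current = installed_version('danlp')
    if is_at_least(current, '0.0.4'):
        print(f'danlp {current} should include the Flair NER tagger')
    else:
        print(f'danlp version is {current}; upgrading the pip package may be needed')
```

If the check reports an older version, running pip install --upgrade danlp should pull in the newer release.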

Does it work for you now? 🙂

tobiasmorville commented 5 years ago

👍