GRAAL-Research / deepparse

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning
https://deepparse.org/
GNU Lesser General Public License v3.0

[BUG] download_model fails to load bpemb model #221

Closed jarkkojarvinen closed 4 months ago

jarkkojarvinen commented 6 months ago

Describe the bug: The download_model tool fails to load the bpemb model.

To Reproduce

from deepparse import download_model
cache_dir = "./cache/"
download_model("bpemb-attention", cache_dir)

The result is:

downloading https://nlp.h-its.org/bpemb/multi/multi.wiki.bpe.vs100000.model
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Cell In[10], line 2
      1 cache_dir = "./cache/"
----> 2 download_model("bpemb-attention", cache_dir)

File /usr/local/lib/python3.11/site-packages/deepparse/download_tools.py:130, in download_model(model_type, saving_cache_path)
    128     download_fasttext_magnitude_embeddings(cache_dir=saving_cache_path)
    129 elif "bpemb" in model_type:
--> 130     BPEmb(
    131         lang="multi", vs=100000, dim=300, cache_dir=saving_cache_path
    132     )  # The class manage the download of the pretrained words embedding
    134 model_type_filename = MODEL_MAPPING_CHOICES[model_type]
    135 model_path = os.path.join(saving_cache_path, f"{model_type_filename}.ckpt")

File /usr/local/lib/python3.11/site-packages/bpemb/bpemb.py:173, in BPEmb.__init__(self, lang, vs, dim, cache_dir, preprocess, encode_extra_options, add_pad_emb, vs_fallback, segmentation_only, model_file, emb_file)
    171 else:
    172     model_file = self.model_tpl.format(lang=lang, vs=vs)
--> 173     self.model_file = self._load_file(model_file)
    174 self.spm = sentencepiece_load(self.model_file)
    175 self.vocab_size = self.vs = self.spm.get_piece_size()

File /usr/local/lib/python3.11/site-packages/bpemb/bpemb.py:228, in BPEmb._load_file(self, file, archive, cache_dir)
    226 file_url = self.base_url + file + suffix
    227 print("downloading", file_url)
...
   1018     )
   1020 if http_error_msg:
-> 1021     raise HTTPError(http_error_msg, response=self)

HTTPError: 404 Client Error: Not Found for url: https://bpemb.h-its.org/multi/multi.wiki.bpe.vs100000.model

Has the URL perhaps changed to https://bpemb.h-its.org/multi/multi/multi.wiki.bpe.vs1000000.model? (Note the extra "multi" in the URL.)

Expected behavior: The model is downloaded without error.


github-actions[bot] commented 6 months ago

Thank you for your interest in improving Deepparse.

gokdumano commented 6 months ago

I faced the same issue. I have not validated it, but I think changing base_url in the bpemb.py file might help you.

davebulaval commented 6 months ago

Yeah, it is a problem with the BPEmb package. I have opened an issue.

jarkkojarvinen commented 5 months ago

Thanks. It seems there are some bugs in the dependencies now :/

davebulaval commented 5 months ago

It seems to be a BPEmb problem. I will wait a couple of days before trying to fix it on our side, since I want to avoid having more packages to maintain.

davebulaval commented 5 months ago

I have written to the maintainers, but have not received any response so far.

davebulaval commented 4 months ago

Since the maintainers are taking too long to respond, I have pushed a hotfix with a BPEmb wrapper that changes the base URL. It will be released in version 0.9.10.
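
For readers stuck on an older version, here is a minimal sketch of the wrapper idea. It is not the actual hotfix. It assumes that the download URL is built as base_url + model_tpl, as the traceback suggests, and that overriding base_url in a subclass is enough to redirect the download. StandInBPEmb and BaseURLOverrideBPEmb are hypothetical names. The stand-in replaces bpemb.BPEmb so the example runs without the package or network access, and the new base URL is only the unverified guess from the original report.

```python
class StandInBPEmb:
    """Stand-in for bpemb.BPEmb so the sketch runs offline.

    Only the attribute names visible in the traceback (base_url, model_tpl)
    are mirrored; their values are inferred from the logged download URL.
    """

    base_url = "https://nlp.h-its.org/bpemb/"
    model_tpl = "{lang}/{lang}.wiki.bpe.vs{vs}.model"

    def __init__(self, lang: str, vs: int) -> None:
        model_file = self.model_tpl.format(lang=lang, vs=vs)
        # The real class would download this URL; here we only build it.
        self.model_url = self.base_url + model_file


class BaseURLOverrideBPEmb(StandInBPEmb):
    """Hypothetical wrapper: same behavior, different download root."""

    # Unverified guess taken from the issue report (note the extra "multi").
    base_url = "https://bpemb.h-its.org/multi/"


original = StandInBPEmb(lang="multi", vs=100000)
patched = BaseURLOverrideBPEmb(lang="multi", vs=100000)
print(original.model_url)
print(patched.model_url)
```

With the real package, the same pattern would mean subclassing bpemb.BPEmb instead of the stand-in and instantiating the subclass wherever BPEmb is constructed, provided the class still reads base_url the way the traceback shows.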