GRAAL-Research / deepparse

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning
https://deepparse.org/
GNU Lesser General Public License v3.0

bpemb model download fails #232

Closed: valentinschabschneider closed this issue 4 months ago

valentinschabschneider commented 4 months ago

Describe the bug

When using the new ghcr.io/graal-research/deepparse:0.9.10 Docker image (after fixing #231), the following error occurs:

app-1  | 2024-07-08 13:56:01,032; DEBUG: Starting new HTTPS connection (1): bpemb.h-its.org:443
app-1  | 2024-07-08 13:56:01,126; DEBUG: https://bpemb.h-its.org:443 "GET /multi/multi.wiki.bpe.vs100000.model HTTP/11" 404 196
app-1  | ERROR:    Traceback (most recent call last):
app-1  |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 677, in lifespan
app-1  |     async with self.lifespan_context(app) as maybe_state:
app-1  |   File "/usr/local/lib/python3.11/contextlib.py", line 204, in __aenter__
app-1  |     return await anext(self.gen)
app-1  |            ^^^^^^^^^^^^^^^^^^^^^
app-1  |   File "/deepparse/app/app.py", line 31, in lifespan
app-1  |     download_models()
app-1  |   File "/deepparse/download_tools.py", line 106, in download_models
app-1  |     download_model(model_type, saving_cache_path=saving_cache_path)
app-1  |   File "/deepparse/download_tools.py", line 130, in download_model
app-1  |     BPEmb(
app-1  |   File "/usr/local/lib/python3.11/site-packages/bpemb/bpemb.py", line 173, in __init__
app-1  |     self.model_file = self._load_file(model_file)
app-1  |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
app-1  |   File "/usr/local/lib/python3.11/site-packages/bpemb/bpemb.py", line 228, in _load_file
app-1  |     return http_get(file_url, cached_file, ignore_tardir=True)
app-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
app-1  |   File "/usr/local/lib/python3.11/site-packages/bpemb/util.py", line 48, in http_get
app-1  |     headers = http_get_temp(url, temp_file)
app-1  |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
app-1  |   File "/usr/local/lib/python3.11/site-packages/bpemb/util.py", line 25, in http_get_temp
app-1  |     req.raise_for_status()
app-1  |   File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
app-1  |     raise HTTPError(http_error_msg, response=self)
app-1  | requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://bpemb.h-its.org/multi/multi.wiki.bpe.vs100000.model
app-1  |
app-1  | ERROR:    Application startup failed. Exiting.
app-1 exited with code 3

To Reproduce

Run the provided docker-compose.yml with the ghcr.io/graal-research/deepparse:0.9.10 image.

Expected behavior

The model is downloaded and the container starts normally.


valentinschabschneider commented 4 months ago

I saw that a fix for this was attempted, but the request still doesn't point to the correct path: https://bpemb.h-its.org/multi/multi/multi.wiki.bpe.vs100000.model
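
For illustration, the difference between the URL in the traceback and the correct one is a single extra `multi/` path segment. The sketch below rebuilds both URLs from the strings quoted in this thread. The template and the helper name `build_model_url` are assumptions made for this example and are not taken from bpemb's or deepparse's code.

```python
# Hypothetical path template, inferred only from the URLs quoted in this thread.
MODEL_TEMPLATE = "{lang}/{lang}.wiki.bpe.vs{vs}.model"


def build_model_url(base_url: str, lang: str = "multi", vs: int = 100000) -> str:
    """Join a base URL and the model file path (illustrative helper)."""
    return base_url.rstrip("/") + "/" + MODEL_TEMPLATE.format(lang=lang, vs=vs)


# URL from the traceback, which returned 404.
failing_url = build_model_url("https://bpemb.h-its.org/")
# URL reported in this thread as the correct path.
working_url = build_model_url("https://bpemb.h-its.org/multi/")

print(failing_url)
print(working_url)
```

Under these assumptions, only the base URL differs between the two requests, which is consistent with a fix that overrides the base URL rather than the file name.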

davebulaval commented 4 months ago

It was fixed; the docker image seems to be the wrong one. It is fixed with 0.9.11.

valentinschabschneider commented 4 months ago

Unfortunately, the error still occurs.

valentinschabschneider commented 4 months ago

The fix applied in bpemb_embeddings_model.py with BPEmbBaseURLWrapperBugFix also needs to be applied in download_tools.py#L130.
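
The point here is that every call site has to construct the embeddings through the same patched class. A rough sketch of that pattern follows, using stand-in classes rather than the real bpemb or deepparse code: `StubBPEmb`, `PatchedStubBPEmb`, `load_embeddings_stub` and `download_model_stub` are all hypothetical names that only mimic the structure described in this thread, and the URL template is inferred from the URLs quoted above.

```python
class StubBPEmb:
    """Stand-in for the upstream embeddings class (not the real bpemb API)."""

    base_url = "https://bpemb.h-its.org/"
    model_tpl = "{lang}/{lang}.wiki.bpe.vs{vs}.model"

    def __init__(self, lang: str, vs: int):
        # A real class would download the file; this stub only builds the URL.
        self.model_url = self.base_url + self.model_tpl.format(lang=lang, vs=vs)


class PatchedStubBPEmb(StubBPEmb):
    """Stand-in for a wrapper subclass that corrects the base URL."""

    base_url = "https://bpemb.h-its.org/multi/"


def load_embeddings_stub() -> str:
    # Call site that already goes through the patched class.
    return PatchedStubBPEmb(lang="multi", vs=100000).model_url


def download_model_stub(patched: bool) -> str:
    # Call site that must also switch to the patched class.
    cls = PatchedStubBPEmb if patched else StubBPEmb
    return cls(lang="multi", vs=100000).model_url


print(download_model_stub(patched=False))
print(download_model_stub(patched=True))
```

In this sketch the unpatched download path produces the URL from the traceback, while the patched path produces the same URL as the embeddings loader.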

davebulaval commented 4 months ago

You are right. It is fixed in 0.9.12.