Error on using `load` with `format` argument

bagustris commented 2 months ago

I tried to use `load` with the `format` argument, but I get this (backend) error:

In [5]: import audb
In [6]: db = audb.load('emodb', format='wav', verbose=True)
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
File ~/.local/lib/python3.8/site-packages/audbackend/core/utils.py:27, in call_function_on_backend(function, suppress_backend_errors, fallback_return_value, *args, **kwargs)
     26 try:
---> 27     return function(*args, **kwargs)
     28 except Exception as ex:

File ~/.local/lib/python3.8/site-packages/audbackend/core/filesystem.py:41, in FileSystem._access(self)
     40 if not os.path.exists(self._root):
---> 41     utils.raise_file_not_found_error(self._root)

File ~/.local/lib/python3.8/site-packages/audbackend/core/utils.py:107, in raise_file_not_found_error(path)
    106 def raise_file_not_found_error(path: str):
--> 107     raise FileNotFoundError(
    108         errno.ENOENT,
    109         os.strerror(errno.ENOENT),
    110         path,
    111     )

FileNotFoundError: [Errno 2] No such file or directory: '/home/bagus/audb-host/data-local/'

During handling of the above exception, another exception occurred:

BackendError                              Traceback (most recent call last)
Cell In[6], line 1
----> 1 db = audb.load('emodb', format='wav', verbose=True)

File ~/.local/lib/python3.8/site-packages/audb/core/load.py:1019, in load(name, version, only_metadata, bit_depth, channels, format, mixdown, sampling_rate, attachments, tables, media, removed_media, full_path, cache_root, num_workers, timeout, verbose)
    919 r"""Load database.
    920 
    921 Loads meta and media files of a database to the local cache and returns
   (...)
   1016 
   1017 """
   1018 if version is None:
-> 1019     version = latest_version(name)
   1021 db = None
   1022 cached_versions = None

File ~/.local/lib/python3.8/site-packages/audb/core/api.py:454, in latest_version(name)
    435 def latest_version(
    436     name,
    437 ) -> str:
    438     r"""Latest version of database.
    439 
    440     Args:
   (...)
    452 
    453     """
--> 454     vs = versions(name)
    455     if not vs:
    456         raise RuntimeError(
    457             f"Cannot find a version for database '{name}'.",
    458         )

File ~/.local/lib/python3.8/site-packages/audb/core/api.py:607, in versions(name)
    605 vs = []
    606 for repository in config.REPOSITORIES:
--> 607     backend = utils.access_backend(repository)
    608     if isinstance(backend, audbackend.Artifactory):
    609         import artifactory

File ~/.local/lib/python3.8/site-packages/audb/core/utils.py:17, in access_backend(repository)
     13 def access_backend(
     14     repository: Repository,
     15 ) -> audbackend.Backend:
     16     r"""Helper function to access backend."""
---> 17     backend = audbackend.access(
     18         repository.backend,
     19         repository.host,
     20         repository.name,
     21     )
     22     if isinstance(backend, audbackend.Artifactory):
     23         backend._use_legacy_file_structure()

File ~/.local/lib/python3.8/site-packages/audbackend/core/api.py:87, in access(name, host, repository)
     48 r"""Access repository.
     49 
     50 Returns a backend instance
   (...)
     84 
     85 """
     86 backend = _backend(name, host, repository)
---> 87 utils.call_function_on_backend(backend._access)
     88 return backend

File ~/.local/lib/python3.8/site-packages/audbackend/core/utils.py:32, in call_function_on_backend(function, suppress_backend_errors, fallback_return_value, *args, **kwargs)
     30     return fallback_return_value
     31 else:
---> 32     raise BackendError(ex)

BackendError: An exception was raised by the backend, please see stack trace for further information.

I also tried it with crema-d and got the same error. Even if the original dataset is already in wav format, this should not raise an error, since the user only wants to ensure the correct audio format.

hagenw commented 2 months ago

Thanks for reporting this. The problem is that in the default config, we also provide a repository on your local machine, see https://github.com/audeering/audb/blob/069cc042341ef244df6068651b518c439680b7b8/audb/core/etc/audb.yaml#L7-L9

But if that repository does not exist, accessing it raises an error.

As a workaround, you can create the folder, e.g.

$ mkdir -p /home/bagus/audb-host/data-local/

Or specify a custom ~/audb.yaml config file, containing:

cache_root: ~/audb
shared_cache_root: /data/audb
repositories:
  - name: data-public
    backend: artifactory
    host: https://audeering.jfrog.io/artifactory

hagenw commented 2 months ago

This should be fixed with version 1.7.0 of audb.

Could you please run:

$ pip install --upgrade audb

and try again.

bagustris commented 2 months ago

@hagenw

I still see the same error in v1.7.0. With the cough-speech-sneeze database, loading reaches 6% (load media) before the error occurs.

In [1]: import audb

In [2]: audb.__version__
Out[2]: '1.7.0'
In [4]: db = audb.load('cough-speech-sneeze', format='wav', verbose=True)
Get:   cough-speech-sneeze v2.0.1
Cache: /home/bagus/audb/cough-speech-sneeze/2.0.1/5690b542
...
File ~/github/nkululeko/.env/lib/python3.8/site-packages/audbackend/core/utils.py:32, in call_function_on_backend(function, suppress_backend_errors, fallback_return_value, *args, **kwargs)
     30     return fallback_return_value
     31 else:
---> 32     raise BackendError(ex)

BackendError: An exception was raised by the backend, please see stack trace for further information.

Full log: https://pastebin.ubuntu.com/p/SyKXFK2mdR/

hagenw commented 2 months ago

Thanks for reporting again. Unfortunately, the error seems to be related to our public Artifactory instance, which hosts the data. I was able to reproduce it and created https://github.com/audeering/audb/issues/409 to track it as a separate issue.

I hope we can fix this in the near future or switch to a better server.

As a workaround, you can simply rerun your download command and it will continue where it stopped. For me, the download runs fine for around 8 minutes until the error is thrown. To speed things up, you can use several threads when downloading the data:

db = audb.load('cough-speech-sneeze', format='wav', verbose=True, num_workers=8)
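
Rerunning the command by hand can be automated with a small retry loop. This is a minimal sketch, assuming that a repeated `audb.load` call resumes from the cache as described above; `call_with_retries` is a hypothetical helper, not an audb function, and it catches any `Exception` for simplicity:

```python
import time


def call_with_retries(function, attempts=5, wait_seconds=10.0):
    """Call ``function`` until it succeeds or ``attempts`` calls have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return function()
        except Exception as error:  # assumption: any error is worth a retry
            last_error = error
            print(f"Attempt {attempt}/{attempts} failed: {error!r}")
            if attempt < attempts:
                time.sleep(wait_seconds)
    # All attempts failed, so re-raise the most recent error
    raise last_error


# Usage (hypothetical):
# import audb
# db = call_with_retries(
#     lambda: audb.load('cough-speech-sneeze', format='wav',
#                       verbose=True, num_workers=8),
# )
```

If the exception shown in the traceback can be imported as `audbackend.BackendError` in your installation, catching only that class would be a tighter choice than `Exception`.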

bagustris commented 2 months ago

@hagenw

Thanks for confirming that you can reproduce it. Downloading from that Artifactory server is also very slow (compared to wget or curl). For me, the fastest option is to host the dataset in audformat on Zenodo, which also supports multiple versions, and then download it directly instead of using audb (maybe audb could connect to Zenodo to avoid manually downloading the dataset in audformat).

hagenw commented 2 months ago

Thanks for the suggestion. I was also thinking about Zenodo some time ago, but the problem is that audb itself is responsible for managing the different versions of a dataset, and a version can depend on single files from other versions. This means we also need to be able to publish data with audb, which does not seem that easy with Zenodo.

At the moment, our favored alternative would be a plain web server to which we can upload via FTP and from which we can download via HTTPS. In general, the performance of Artifactory servers is OK: downloading from our internal one is very fast, but the public one, hosted at https://audeering.jfrog.io, has already caused us several problems and is indeed not very fast.

hagenw commented 2 months ago

As mentioned in https://github.com/audeering/audb/issues/409#issuecomment-2109643598, the download error also does not seem to occur with previous versions of audb and audbackend:

$ pip install "audb==1.6.5"
$ pip install "audbackend==1.0.2"
$ mkdir -p ~/audb-host/data-local/  # to avoid the error reported here at the very top of this issue
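
After pinning, it can help to confirm which versions are actually active in the environment. This is a minimal sketch using only the standard library (Python 3.8 or later); `installed_versions` is a hypothetical helper name:

```python
from importlib import metadata


def installed_versions(packages=("audb", "audbackend")):
    """Return a dict mapping each package name to its version, or None."""
    versions = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None  # package is not installed
    return versions


# Usage:
# print(installed_versions())
```

With the two pins above applied, the result should report 1.6.5 for audb and 1.0.2 for audbackend.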

audeering / audb

Error on using `load` with `format` argument #389