Status: Closed (bagustris closed this issue 2 months ago)
Thanks for reporting this. The problem is that the default config also includes a repository on your local machine, see https://github.com/audeering/audb/blob/069cc042341ef244df6068651b518c439680b7b8/audb/core/etc/audb.yaml#L7-L9
But if that repository does not exist, an error is raised.
As a workaround, you can create the folder, e.g.
$ mkdir -p /home/bagus/audb-host/data-local/
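The same workaround can also be applied from Python, for example at the top of a script that later calls audb.load. This is a minimal sketch using only the standard library; the helper name ensure_local_repository is hypothetical, and the path audb-host/data-local under the home directory is taken from the mkdir command above.

```python
from pathlib import Path
from typing import Optional


def ensure_local_repository(home: Optional[Path] = None) -> Path:
    """Create the folder backing the default local repository.

    Hypothetical helper: assumes the local repository lives at
    <home>/audb-host/data-local, as in the mkdir command above.
    """
    base = Path.home() if home is None else Path(home)
    folder = base / "audb-host" / "data-local"
    # parents=True mirrors `mkdir -p`; exist_ok=True makes reruns harmless
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Calling ensure_local_repository() without arguments creates the folder in your real home directory; passing a path is mainly useful for testing.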
Or specify a custom ~/audb.yaml config file containing:
cache_root: ~/audb
shared_cache_root: /data/audb
repositories:
  - name: data-public
    backend: artifactory
    host: https://audeering.jfrog.io/artifactory
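If you prefer to create that file from a script, here is a minimal standard-library sketch. The helper name write_audb_config is hypothetical; the file content is the custom config above, which lists only the public Artifactory repository and therefore, presumably, avoids the missing local repository. It refuses to overwrite an existing ~/audb.yaml so that other settings are not lost.

```python
from pathlib import Path

# The custom config from above: only the public Artifactory repository.
AUDB_CONFIG = """\
cache_root: ~/audb
shared_cache_root: /data/audb
repositories:
  - name: data-public
    backend: artifactory
    host: https://audeering.jfrog.io/artifactory
"""


def write_audb_config(path: Path) -> Path:
    """Write AUDB_CONFIG to path, refusing to overwrite an existing file."""
    path = Path(path).expanduser()  # allows passing "~/audb.yaml"
    if path.exists():
        raise FileExistsError(f"{path} already exists; edit it by hand")
    path.write_text(AUDB_CONFIG)
    return path
```

For example, write_audb_config(Path("~/audb.yaml")) creates the file in your home directory.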
This should be fixed with version 1.7.0 of audb.
Could you please upgrade with:
$ pip install --upgrade audb
and try again.
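To confirm which version is actually installed after upgrading, a small standard-library check can help. This is a minimal sketch: the helper names are hypothetical, the comparison only looks at the numeric release part (so 1.7.0rc1 counts as 1.7.0), and packaging.version is the more thorough choice if that package is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def release_tuple(text: str) -> Tuple[int, ...]:
    """Turn '1.7.0' into (1, 7, 0); ignores suffixes such as 'rc1' or '.dev0'."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed: str, required: str) -> bool:
    """Compare release numbers only, padding with zeros so '1.7' == '1.7.0'."""
    a, b = release_tuple(installed), release_tuple(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def audb_has_fix(required: str = "1.7.0") -> bool:
    """Check the installed audb version; False if audb is not installed."""
    try:
        return is_at_least(version("audb"), required)
    except PackageNotFoundError:
        return False
```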
@hagenw
I still see the same error in v1.7.0. With the cough-speech-sneeze database, loading reaches 6% (load media) before the error happens.
In [1]: import audb
In [2]: audb.__version__
Out[2]: '1.7.0'
In [4]: db = audb.load('cough-speech-sneeze', format='wav', verbose=True)
Get: cough-speech-sneeze v2.0.1
Cache: /home/bagus/audb/cough-speech-sneeze/2.0.1/5690b542
...
File ~/github/nkululeko/.env/lib/python3.8/site-packages/audbackend/core/utils.py:32, in call_function_on_backend(function, suppress_backend_errors, fallback_return_value, *args, **kwargs)
30 return fallback_return_value
31 else:
---> 32 raise BackendError(ex)
BackendError: An exception was raised by the backend, please see stack trace for further information.
Thanks for reporting again. Unfortunately, the error seems to be related to our public Artifactory instance, which hosts the data. I'm able to reproduce it and created https://github.com/audeering/audb/issues/409 to track it as a separate issue.
I hope we can fix this in the near future or switch to a better server.
As a workaround, you can simply rerun your download command and it will continue where it stopped. For me, the download runs fine for around 8 minutes before the error is thrown. To speed things up, you can use several threads when downloading the data:
db = audb.load('cough-speech-sneeze', format='wav', verbose=True, num_workers=8)
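Rerunning by hand can also be automated. This is a minimal sketch of a generic retry loop, assuming (as described above) that each rerun of audb.load resumes where the previous one stopped; the helper name retry and its attempts/delay defaults are hypothetical choices, not part of audb. It catches any Exception by default; narrowing it to the BackendError class from the traceback is possible, but the exact import location of that class is an assumption to check against your installed audbackend version.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    attempts: int = 5,
    delay: float = 10.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call func until it succeeds, at most `attempts` times.

    Hypothetical helper: only useful if rerunning func continues the
    previous partial download instead of starting from scratch.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as ex:
            if attempt == attempts:
                raise  # give up and re-raise the last error
            print(f"attempt {attempt} failed ({ex!r}), retrying in {delay} s")
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")


# Usage (requires audb):
# import audb
# db = retry(lambda: audb.load('cough-speech-sneeze', format='wav',
#                              verbose=True, num_workers=8))
```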
@hagenw
Thanks for confirming that you can reproduce it. Downloading from that Artifactory instance is also very slow (compared to wget or curl). For me, the fastest option would be to host the dataset in audformat on Zenodo, which also supports multiple versions. Users could then simply download it instead of using audb (maybe audb could connect to Zenodo, so that users do not have to download datasets in audformat manually).
Thanks for the suggestion. I was also thinking about Zenodo some time ago, but the problem is that audb itself is responsible for managing the different versions of a dataset, and a version can depend on single files from other versions. This means we also need to be able to publish data with audb, which does not seem easy with Zenodo.
At the moment, our favored alternative would be a plain web server, where we can upload via FTP and download via HTTPS. In general, the performance of Artifactory servers is OK: downloading from our internal one is very fast, but the public one, hosted at https://audeering.jfrog.io, has already caused us several problems and is indeed not very fast.
As mentioned in https://github.com/audeering/audb/issues/409#issuecomment-2109643598, the error during download also does not seem to occur with previous versions of audb and audbackend:
$ pip install "audb==1.6.5"
$ pip install "audbackend==1.0.2"
$ mkdir -p ~/audb-host/data-local/ # to avoid the error reported here at the very top of this issue
I tried to use load with a format argument but got this (backend) error. I also tried it with crema-d and got the same error. Although the original dataset may already be in wav format, this should not raise an error, since the user wants to ensure the correct audio format.