audeering / audb

Manage audio and video databases
https://audeering.github.io/audb/

Downloading datasets from public servers fails after some time #409

Open hagenw opened 1 month ago

hagenw commented 1 month ago

This was first reported in https://github.com/audeering/audb/issues/389#issuecomment-2109056287

When downloading a dataset with anonymous access to Artifactory, the download fails after some time:

>>> import audb
>>> audb.__version__
'1.7.0'
>>> import audbackend
>>> audbackend.backend.Artifactory.get_authentication("https://artifactory.audeering.com/artifactory")
('anonymous', '')
>>> db = audb.load('cough-speech-sneeze', format='wav', verbose=True)
...
ConnectionError: HTTPSConnectionPool(host='audeering.jfrog.io', port=443): Max retries exceeded with url: /artifactory/api/storage/data-public/cough-speech-sneeze/media/42324baf-fe27-7828-f7bb-cbdf688aa80a/2.0.1/42324baf-fe27-7828-f7bb-cbdf688aa80a-2.0.1.zip (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x750d9b3eaad0>: Failed to resolve 'audeering.jfrog.io' ([Errno -3] Temporary failure in name resolution)"))
...

When running the same download with my own credentials for authentication with Artifactory, the download fails after the same amount of time, but with a different error message:

HTTPError: 403 Client Error: Forbidden for url: https://jfrog-prod-euc1-shared-frankfurt-main.s3.amazonaws.com/aol-a0jkltxoy0gz0/filestore/95/95ed792104abf9dd0acdccce9c958d38838abffb?X-Artifactory-username=hwierstorf&X-Artifactory-repoType=local&X-Artifactory-repositoryKey=data-public&X-Artifactory-packageType=maven&X-Artifactory-artifactPath=cough-speech-sneeze%2Fmedia%2F993cb076-fc28-c7c0-0ed8-255f17a1c064%2F2.0.1%2F993cb076-fc28-c7c0-0ed8-255f17a1c064-2.0.1.zip&X-Artifactory-projectKey=default&x-jf-traceId=5e89fb5fd1cc876c&response-content-disposition=attachment%3Bfilename...

When downloading the same or a larger dataset from our internal Artifactory server, the download does not fail.

When restarting a failed download, it picks up where it left off and finishes the download. Using several workers also works in that case.
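
Since a restarted download resumes where it stopped, one possible stopgap is to re-run the load call until it succeeds. The following is only a minimal sketch of that idea, not something audb provides: `load_with_retries` and its parameters are hypothetical names, and it assumes the errors shown above are the `requests` exceptions, which derive from `OSError`.

```python
import time


def load_with_retries(load, attempts=5, delay=0.0, retry_on=(OSError,)):
    """Call `load()` until it succeeds or `attempts` is exhausted.

    Hypothetical helper, not part of audb or audbackend.
    `load` is any zero-argument callable that starts (or resumes) the download.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return load()
        except retry_on as error:
            # Keep the error so it can be re-raised if every attempt fails
            last_error = error
            if attempt < attempts:
                time.sleep(delay)
    raise last_error


# Possible usage with the call from this issue (requires audb to be installed):
#
# import audb
# db = load_with_retries(
#     lambda: audb.load('cough-speech-sneeze', format='wav', verbose=True),
#     delay=10,
# )
```

This only works around the symptom. It does not explain why the public server drops the connection in the first place.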

hagenw commented 1 month ago

@ChristianGeng any idea how we could track down what is wrong with the settings of the public Artifactory server, or if we need to change how we connect to it in order to avoid the error?

In general, it seems to me that we should try to get another solution for hosting public datasets.

hagenw commented 1 month ago

When re-running the same code, but using audb==1.6.5 and audbackend==1.0.2, the download does succeed. But it also takes 30 minutes, instead of 10 minutes. So the problem seems related to the changes we introduced in https://github.com/audeering/audbackend/pull/222, where we use a requests.Session object to authenticate only once.

ChristianGeng commented 1 month ago

> When re-running the same code, but using audb==1.6.5 and audbackend==1.0.2, the download does succeed. But it also takes 30 minutes, instead of 10 minutes. So the problem seems related to the changes we introduced in audeering/audbackend#222, where we use a requests.Session object to authenticate only once.

Could it be an async thingy, with _close getting called too early? requests.Session() is used with a context manager, so could it also be that the context manager insists on closing the session itself?
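
To make the hypothesis concrete, here is a small sketch of the failure mode in question: a shared object is closed by its context manager while a worker thread still needs it. This is not audbackend code. `FakeSession` is a hypothetical stand-in, and its behavior after closing is an assumption that may not match a real `requests.Session`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeSession:
    """Hypothetical stand-in for a shared connection object."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # The context manager always closes the object on exit
        self.closed = True
        return False

    def get(self, path):
        if self.closed:
            raise RuntimeError(f"session already closed, cannot fetch {path}")
        return f"content of {path}"


def fetch_after_signal(session, path, start):
    # Simulate a worker that only gets to its request later
    start.wait()
    return session.get(path)


def run(close_early):
    start = threading.Event()
    with ThreadPoolExecutor(max_workers=2) as pool:
        if close_early:
            with FakeSession() as session:
                future = pool.submit(fetch_after_signal, session, "a.zip", start)
            # The with-block has exited, so the session is closed
            # before the worker uses it
            start.set()
            try:
                return future.result()
            except RuntimeError as error:
                return f"error: {error}"
        with FakeSession() as session:
            future = pool.submit(fetch_after_signal, session, "a.zip", start)
            start.set()
            # Waiting inside the with-block keeps the session open
            return future.result()


print(run(close_early=True))
print(run(close_early=False))
```

If something like this were happening, keeping the session open until all workers have finished would be the thing to check. Whether it applies to audbackend is not established in this thread.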

hagenw commented 1 month ago

I have no clue. The only thing I can report is that those problems do not happen with our internal Artifactory server. So it must be a combination of the changes introduced in audbackend 2.0.0, e.g. requests.Session() and _close(), and how the Artifactory server at jfrog.io is configured.