audeering / audbackend

Manage file storage on different backends
https://audeering.github.io/audbackend/
Other
3 stars 0 forks source link

Better authentication handling in backends #215

Closed hagenw closed 2 months ago

hagenw commented 2 months ago

Closes #161, #162


* Add class method audbackend.backend.Artifactory.get_authentication()

This tackles #161 by providing a public class method to read authentication tokens from a config file or environment variables.

image


* Add authentication argument and attribute to audbackend.backend.Base

This tackles #162 by adding an authentication argument to __init__(), create(), and delete() of audbackend.backend.Base.

In audbackend.backend.FileSystem I then do not add authentication as an argument to __init__(). Don't know if it would be better to always have it, but from a user perspective I found it easier this way.


* Speedup audbackend.backend.Artifactory

audbackend.backend.Artifactory now only applies the authentication when calling open(). Before it did the authentication every time a path was requested on the backend, e.g. every time backend.exists(path) was called. Now, we store the path to the repo in self._repo, when calling open(), and can create all other paths based on top of that.

When asking if 4 files exists on the backend this reduced the execution time for me from 0.872 s to 0.178 s.

I also updated the code of the interface test fixture in tests/conftest.py to directly delete the repository on an Artifactory backend at the end of the test without the need to first re-authenticate.

Benchmark code ```python import time import audbackend backend = audbackend.backend.Artifactory("https://audeering.jfrog.io/artifactory", "data-public") backend.open() interface = audbackend.interface.Maven(backend) t0 = time.time() interface.exists("/emodb", "1.4.1") interface.exists("/emodb/db", "1.4.1") interface.exists("/emodb/meta", "1.4.1") interface.exists("/emodb/media", "1.4.1") t = time.time() print(f"{t - t0:.3f} s") ```

image

* Add audbackend.backend.Artifactory.path()

Before we had audbackend.backend.Artifactory._path(), which was returning an artifactory.ArtifactoryPath object that can be used to interact with the path on the backend. As this might also be helpful for a user (compare https://github.com/audeering/audb/pull/386), I decided to change it into a public method.

image

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.0%. Comparing base (215a098) to head (a38aba1).

Additional details and impacted files | [Files](https://app.codecov.io/gh/audeering/audbackend/pull/215?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=audeering) | Coverage Δ | | |---|---|---| | [audbackend/core/api.py](https://app.codecov.io/gh/audeering/audbackend/pull/215?src=pr&el=tree&filepath=audbackend%2Fcore%2Fapi.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=audeering#diff-YXVkYmFja2VuZC9jb3JlL2FwaS5weQ==) | `100.0% <100.0%> (ø)` | | | [audbackend/core/backend/artifactory.py](https://app.codecov.io/gh/audeering/audbackend/pull/215?src=pr&el=tree&filepath=audbackend%2Fcore%2Fbackend%2Fartifactory.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=audeering#diff-YXVkYmFja2VuZC9jb3JlL2JhY2tlbmQvYXJ0aWZhY3RvcnkucHk=) | `100.0% <100.0%> (ø)` | | | [audbackend/core/backend/base.py](https://app.codecov.io/gh/audeering/audbackend/pull/215?src=pr&el=tree&filepath=audbackend%2Fcore%2Fbackend%2Fbase.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=audeering#diff-YXVkYmFja2VuZC9jb3JlL2JhY2tlbmQvYmFzZS5weQ==) | `100.0% <100.0%> (ø)` | |
hagenw commented 2 months ago

I converted back to draft, as I would first like to solve one remaining problems with the tests.

hagenw commented 2 months ago

I was able to fix the test related problems, and the pull request is again ready for review.

hagenw commented 2 months ago

We could also speed up the tests on the Artifactory backend even further by just open one connection to the server and reuse it, but I would propose to not handle this in this pull request and created #218 to track it.

frankenjoe commented 2 months ago

audbackend.backend.Artifactory now only applies the authentication when calling open(). Before it did the authentication every time a path was requested on the backend, e.g. every time backend.exists(path) was called. Now, we store the path to the repo in self._repo, when calling open(), and can create all other paths based on top of that.

When asking if 4 files exists on the backend this reduced the execution time for me from 0.872 s to 0.178 s.

This is a great improvement and should lead to a significant speed up in audb when publishing / loading databases with many small files.

hagenw commented 2 months ago

Yes, this would also be my hope, maybe I should try to benchmark that already.

frankenjoe commented 2 months ago

In audbackend.backend.FileSystem I then do not add auth as an argument to __init__(). Don't know if it would be better to always have it, but from a user perspective I found it easier this way.

Seems ok to me, to only add it to classes where it is actually needed.