audeering / audbackend

Manage file storage on different backends
https://audeering.github.io/audbackend/

Use backend without versioning #155

Closed frankenjoe closed 8 months ago

frankenjoe commented 10 months ago

Although the main motivation for introducing this package was versioning, there might be use cases where you don't care about versioning and want to use it like a regular file system. At the moment this is not possible, since we enforce versioning. I wonder if it would make sense to support both?

hagenw commented 10 months ago

I'm afraid this will not be so easy to achieve.

You could try to hide the version from the user by making it a keyword argument with the default value version='1.0.0'. If you also want to keep the version from being visible on the actual backend (in how the files are stored), I guess you would need additional backend classes.

frankenjoe commented 10 months ago

Yes, what I did for now was create a new project in which I removed all the version-specific handling. The result is simplified code that still looks very similar to the original, except that a file /sub/file.txt is stored directly under /sub/file.txt and not translated to something like sub/1.0.0/file.txt.

If we wanted to support both, we could introduce a new base class that does not version and from which all backends derive. Our current Backend class would then derive from it, but before calling the functions of the base class it would transform /sub/file.txt to /sub/<version>/file.txt. As a side-effect this would also simplify the implementation of a backend, since the backend implementation does not have to deal with versioning at all, as this is only introduced by an intermediate layer.
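A minimal sketch of the layering idea described above, using an in-memory dictionary instead of a real storage backend. The class names UnversionedStore and VersionedStore and the helper _versioned_path are hypothetical and are not part of audbackend; the sketch only illustrates how a versioning layer could rewrite paths before delegating to an unversioned base class.

```python
import posixpath


class UnversionedStore:
    """Hypothetical base layer: stores files exactly under the given path."""

    def __init__(self):
        self._files = {}  # in-memory stand-in for a real storage backend

    def put_file(self, content, path):
        self._files[path] = content

    def ls(self):
        return sorted(self._files)


class VersionedStore(UnversionedStore):
    """Hypothetical versioning layer on top of the unversioned base."""

    @staticmethod
    def _versioned_path(path, version):
        # '/sub/file.txt' + '1.0.0' -> '/sub/1.0.0/file.txt'
        folder, name = posixpath.split(path)
        return posixpath.join(folder, version, name)

    def put_file(self, content, path, version):
        # Only the path is rewritten; storing is left to the base class.
        super().put_file(content, self._versioned_path(path, version))


store = VersionedStore()
store.put_file(b"data", "/file.txt", "1.0.0")
store.put_file(b"data", "/sub/file.txt", "1.0.0")
print(store.ls())  # ['/1.0.0/file.txt', '/sub/1.0.0/file.txt']
```

In this sketch the storage code never sees a version, which is the simplification the proposal is after.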

hagenw commented 10 months ago

Sounds good to me, especially:

As a side-effect this would also simplify the implementation of a backend, since the backend implementation does not have to deal with versioning at all

frankenjoe commented 8 months ago

As pointed out in https://github.com/audeering/audbackend/issues/166#issuecomment-1899049163, we cannot simply allow version=None, unless we restrict version to follow a certain format. Another solution is to introduce two interfaces - one with versioning and one without. I started with an implementation of this solution and here's a first usage example:

By default we use a versioned backend, which behaves exactly as our current backend:

import tempfile

import audeer

import audbackend

with tempfile.TemporaryDirectory(dir='.') as tmp:

    backend = audbackend.create('file-system', tmp, 'repo')
    file = audeer.touch(tmp, '~')

    backend.put_file(file, '/file.txt', '1.0.0')
    backend.put_file(file, '/sub/file.txt', '1.0.0')
    backend.put_file(file, '/sub/file.txt', '2.0.0')

    print(backend.ls())
    print(backend.ls('/file.txt'))
    print(backend.versions('/file.txt'))
    print(backend.ls('/sub/file.txt'))
    print(backend.versions('/sub/file.txt'))
    print(backend.latest_version('/sub/file.txt'))
[('/file.txt', '1.0.0'), ('/sub/file.txt', '1.0.0'), ('/sub/file.txt', '2.0.0')]
[('/file.txt', '1.0.0')]
['1.0.0']
[('/sub/file.txt', '1.0.0'), ('/sub/file.txt', '2.0.0')]
['1.0.0', '2.0.0']
2.0.0

But if we are not interested in versioning, we can do:

with tempfile.TemporaryDirectory(dir='.') as tmp:

    backend = audbackend.create('file-system', tmp, 'repo', versioned=False)
    file = audeer.touch(tmp, '~')

    backend.put_file(file, '/file.txt')
    backend.put_file(file, '/sub/file.txt')

    print(backend.ls())
    print(backend.ls('/file.txt'))
    print(backend.ls('/sub/file.txt'))
['/file.txt', '/sub/file.txt']
['/file.txt']
['/sub/file.txt']

In that case, the interface does not expose a version argument and some functions like versions() are not available.

Yet, both interfaces work on the same repository:

with tempfile.TemporaryDirectory(dir='.') as tmp:

    backend = audbackend.create('file-system', tmp, 'repo')
    file = audeer.touch(tmp, '~')

    backend.put_file(file, '/file.txt', '1.0.0')
    backend.put_file(file, '/sub/file.txt', '1.0.0')
    backend.put_file(file, '/sub/file.txt', '2.0.0')

    backend = audbackend.access('file-system', tmp, 'repo', versioned=False)

    print(backend.ls())
['/1.0.0/file.txt', '/sub/1.0.0/file.txt', '/sub/2.0.0/file.txt']

and

with tempfile.TemporaryDirectory(dir='.') as tmp:

    backend = audbackend.create('file-system', tmp, 'repo', versioned=False)
    file = audeer.touch(tmp, '~')

    backend.put_file(file, '/file.txt')
    backend.put_file(file, '/sub/file.txt')

    backend = audbackend.access('file-system', tmp, 'repo')

    print(backend.ls())
[('/file.txt', 'sub')]

(note that the top-level /file.txt is not returned, because we cannot determine a version for it; the stored /sub/file.txt is instead interpreted as /file.txt with version 'sub')
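The result above can be reproduced with a small sketch of how a versioned view might interpret stored paths. The helper split_versioned is hypothetical and assumes the layout <folder>/<version>/<name>; it is not taken from the audbackend code.

```python
import posixpath


def split_versioned(path):
    """Interpret a stored path as (file, version), or None if impossible.

    Hypothetical helper, assuming the layout '<folder>/<version>/<name>'.
    """
    folder, name = posixpath.split(path)
    if folder == "/":
        # A file directly under the root has no folder that could be a version.
        return None
    parent, version = posixpath.split(folder)
    return posixpath.join(parent, name), version


print(split_versioned("/sub/2.0.0/file.txt"))  # ('/sub/file.txt', '2.0.0')
print(split_versioned("/sub/file.txt"))        # ('/file.txt', 'sub')
print(split_versioned("/file.txt"))            # None
```

Without a restriction on what a version string may look like, the folder name sub cannot be told apart from a real version.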

One disadvantage of using a single create() and access() function is that intellisense cannot know which interface is returned. So maybe it would be better to have two functions like create_versioned() and create_unversioned() instead.

hagenw commented 8 months ago

Looks good to me.

cannot know which interface is returned

Did you have to implement two backends (e.g. FileSystem, VersionedFileSystem)? If yes, that would indeed not be good. The best solution would be if the individual backends don't need to do anything and versioning is just handled inside the abstract Backend. It should also be possible to change the type of backend on the fly, e.g. Backend.versioned = True as we do with Backend._legacy_file_structure, and one might add a check Backend.is_versioned.

frankenjoe commented 8 months ago

Did you have to implement two backends

No, there is a single abstract Backend class that needs to be implemented once for every backend type. But we now provide two ways to interact with it - one with and one without versioning. When we use a single create(...) / access(...) function, however, we need to use typing.Union[VersionedBackend, UnversionedBackend] as the return type. Intellisense therefore cannot know which of the two classes is returned, and auto-completion will e.g. always suggest the method versions() even though it is not provided by UnversionedBackend. For access() I would simply suggest we split it into two functions, access_versioned() and access_unversioned(). For create(), however, providing create_versioned() and create_unversioned() could be misleading, since the repository we create is exactly the same. One solution would be for create() to not return a backend object but only create the repository, so that the user always has to call access_*() to get a backend object.
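As a possible alternative to splitting the function, Python's typing.overload combined with typing.Literal lets a single function declare a different return type per value of a flag. The sketch below uses stand-in classes and a simplified access() signature that are assumptions for illustration, not the audbackend implementation. Static type checkers such as mypy and Pyright understand this pattern; whether a particular editor's completion follows it depends on the tool.

```python
from typing import Literal, overload


class UnversionedBackend:
    """Stand-in for the unversioned interface (no versions() method)."""

    def ls(self) -> list:
        return []


class VersionedBackend:
    """Stand-in for the versioned interface."""

    def ls(self) -> list:
        return []

    def versions(self, path: str) -> list:
        return []


@overload
def access(
    name: str, host: str, repository: str, *, versioned: Literal[True] = ...
) -> VersionedBackend: ...


@overload
def access(
    name: str, host: str, repository: str, *, versioned: Literal[False]
) -> UnversionedBackend: ...


def access(name, host, repository, *, versioned=True):
    # Runtime implementation; the overloads above only inform type checkers.
    return VersionedBackend() if versioned else UnversionedBackend()


backend = access("file-system", "host", "repo", versioned=False)
print(type(backend).__name__)  # UnversionedBackend
```

A call that passes a plain bool variable instead of a literal matches neither overload in this sketch, so a third overload returning the union would be needed for that case.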

hagenw commented 8 months ago

I would try to avoid breaking the API again and not rename access() and create(). If the only problem is an auto-completion bug in an external program, I would not bother about that. For me it would also be fine to implement everything in the same backend class. When we switch to unversioned, the version arguments of the methods do nothing and the versions() method always returns an empty list. We can also achieve this without breaking the current API, by just setting version=None in the current methods. If we use the versioned version of the backend, you will then get an error if version is set to None.
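A rough sketch of this single-class alternative, again with an in-memory dictionary in place of real storage. The class below, its versioned flag, and the _path helper are illustrative assumptions that follow the behavior described in the comment; they are not the audbackend API.

```python
import posixpath


class Backend:
    """Hypothetical single class covering both modes."""

    def __init__(self, versioned=True):
        self.versioned = versioned
        self._files = {}  # in-memory stand-in for a real storage backend

    def _path(self, path, version):
        if not self.versioned:
            return path  # the version argument does nothing
        if version is None:
            raise ValueError("version must be set on a versioned backend")
        folder, name = posixpath.split(path)
        return posixpath.join(folder, version, name)

    def put_file(self, content, path, version=None):
        self._files[self._path(path, version)] = content

    def versions(self, path):
        if not self.versioned:
            return []  # always empty in unversioned mode
        folder, name = posixpath.split(path)
        found = []
        for stored in self._files:
            stored_folder, stored_name = posixpath.split(stored)
            parent, version = posixpath.split(stored_folder)
            if stored_name == name and parent == folder:
                found.append(version)
        return sorted(found)  # lexicographic order, enough for this sketch


versioned = Backend(versioned=True)
versioned.put_file(b"data", "/sub/file.txt", "1.0.0")
versioned.put_file(b"data", "/sub/file.txt", "2.0.0")
print(versioned.versions("/sub/file.txt"))  # ['1.0.0', '2.0.0']

unversioned = Backend(versioned=False)
unversioned.put_file(b"data", "/sub/file.txt")
print(unversioned.versions("/sub/file.txt"))  # []
```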

frankenjoe commented 8 months ago

Ok, if we do not care about auto-completion then I will continue with the approach as demonstrated in https://github.com/audeering/audbackend/issues/155#issuecomment-1899152218.

Btw: a developer can still do the following to guide intellisense:

backend = audbackend.create('file-system', tmp, 'repo', versioned=False)
assert isinstance(backend, audbackend.UnversionedBackend)
# only methods implemented by UnversionedBackend are available