biigle / core

:large_blue_circle: Application core of BIIGLE
https://biigle.de
GNU General Public License v3.0

Rethink the data architecture #53

Closed mzur closed 2 months ago

mzur commented 7 years ago

I've already talked with Tim about this and he thought that the current data architecture of transects and projects is probably best, but I'd like to raise this issue once again and formulate my thoughts.

[Figure: new-project-volume-architecture]

So we plan to rename transects to volumes because the meaning of "transect" is too specialized and does not reflect all kinds of image collections that might occur in Biigle. Also, annotation sessions are defined on a per-volume basis because, for a user who belongs to multiple projects of a volume, it is impossible to say which project is currently "active".

These two statements have one thing in common: A possibly false conception of the function of a project and a volume.

The term volume suggests a storage entity for a collection of images, like a directory in a file system. By itself the term has nothing to do with annotations or annotation sessions.

A project might be thought of as a way to conduct an annotation study, a "project" by multiple people (= project members) to annotate a defined set of objects of interest (= label trees) in different volumes.

In contrast to that description of a volume, a volume currently is a collection of images, image labels and annotations that is independent of the projects it is attached to. All project members will see the same images, the same image labels and the same annotations, regardless of the project(s) the other users and creators of image labels/annotations belong to. So a volume currently is much more than just a collection of images. In fact, it is the most important entity in the whole data architecture. However, hierarchically and intuitively, projects should be the most important entity.

Likewise, annotation sessions intuitively should belong to a whole project instead of a volume. The "workaround" of annotation session users, which prevents a session from affecting the work of other users (from other projects), is a symptom of this problem. Users who do the same kind of work are already defined as members of a common project. But sharing volumes with all their annotations etc. between projects makes this grouping, and projects as a whole, almost irrelevant. This might be a fundamentally flawed design decision.

The alternative

An updated data architecture might look like this: Volumes are reduced to simple collections of images, just like label trees are collections of labels and nothing more. Similar to label trees, volumes are independent of projects and can be publicly visible or private. All volumes have a set of users (admins) who can modify the name and description etc. or add and remove images. Images with annotations cannot be removed (ref). Private volumes can only be accessed by volume admins and have a set of authorized projects which are allowed to use them. All this is very similar to label trees; public volumes may be browsed and used by all Biigle users.
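The volume rules described above can be sketched roughly as follows. This is only an illustration in Python, not BIIGLE code: the `Volume` class, its field names and the method names are all hypothetical, and annotation counts are stored on the volume purely to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical volume: a plain image collection, independent of projects."""
    name: str
    public: bool = True
    admin_ids: set[int] = field(default_factory=set)
    # Only relevant for private volumes: projects allowed to use the volume.
    authorized_project_ids: set[int] = field(default_factory=set)
    # image id -> number of annotations on that image (assumed bookkeeping)
    annotation_counts: dict[int, int] = field(default_factory=dict)

    def can_be_attached_to(self, project_id: int) -> bool:
        # Public volumes can be used by any project,
        # private ones only by explicitly authorized projects.
        return self.public or project_id in self.authorized_project_ids

    def remove_image(self, user_id: int, image_id: int) -> None:
        # Only volume admins may add or remove images.
        if user_id not in self.admin_ids:
            raise PermissionError("only volume admins can remove images")
        # Images with annotations cannot be removed.
        if self.annotation_counts.get(image_id, 0) > 0:
            raise ValueError("image has annotations and cannot be removed")
        self.annotation_counts.pop(image_id, None)
```

Under these assumptions, a private volume with `authorized_project_ids={10}` could be attached to project 10 but not to project 11, and an admin could only remove images that carry no annotations.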

Admins of a project can attach any public volume, or a private volume if the project is authorized for it. Unlike now, annotations and image labels will then belong to both the image of the volume and a single project. This means that volumes have one set of image labels and annotations for each project they are attached to. Different users from different projects can work on the same volume but only see their own annotations, i.e. the annotations belonging to the project the user is a member of.
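To make the "one set of annotations per project" idea concrete, here is a minimal Python sketch. It is not how BIIGLE stores annotations; `Annotation`, `AnnotationStore` and all field names are hypothetical. The only point is that annotations are keyed by the pair (image, project) instead of by the image alone.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation that belongs to an image AND a project."""
    image_id: int
    project_id: int
    user_id: int
    label: str


class AnnotationStore:
    """Hypothetical store: one annotation set per (image, project) pair."""

    def __init__(self) -> None:
        self._by_image_and_project: dict[tuple[int, int], list[Annotation]] = (
            defaultdict(list)
        )

    def add(self, annotation: Annotation) -> None:
        key = (annotation.image_id, annotation.project_id)
        self._by_image_and_project[key].append(annotation)

    def for_project(self, image_id: int, project_id: int) -> list[Annotation]:
        # Members of a project only get the annotations of that project,
        # even if other projects annotated the same image.
        return list(self._by_image_and_project.get((image_id, project_id), []))
```

With this layout, two projects attached to the same volume can annotate the same image without seeing or affecting each other's work.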

Implications:

mzur commented 7 years ago

It was suggested in the MIW 2017 that data should be made openly accessible.

It was agreed though that future marine imaging should be conducted with the aim of making data openly accessible.

This would play well with the idea of public projects mentioned in this issue.

mzur commented 6 years ago

Timm pointed out that one of the core ideas behind BIIGLE is to make annotations easily accessible. This means that a user should see annotations of other projects for the same image, even if they do not belong to the project.

This can still be done with the proposed architecture. Even though each project has its own set of annotations for an image, all of them can be displayed for all users. The proposed (more sane) permission system could be extended to the annotations. Even though the users can see the annotations of other projects (maybe displayed in a different style?), they cannot modify them if they do not have the right permissions for the project.
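A rough Python sketch of that split between "can see" and "can modify" might look like this. All names (`Annotation`, `can_modify`, `split_for_display`) are made up for illustration, and project membership is simplified to a set of project IDs per user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation owned by exactly one project."""
    id: int
    image_id: int
    project_id: int


def can_modify(annotation: Annotation, user_project_ids: set[int]) -> bool:
    # Assumption: membership in the owning project is enough to modify.
    return annotation.project_id in user_project_ids


def split_for_display(
    annotations: list[Annotation], image_id: int, user_project_ids: set[int]
) -> tuple[list[Annotation], list[Annotation]]:
    """Return (editable, read_only) annotations for one image.

    Every annotation on the image is visible; those of other projects
    are read-only and could be drawn in a different style.
    """
    editable: list[Annotation] = []
    read_only: list[Annotation] = []
    for annotation in annotations:
        if annotation.image_id != image_id:
            continue
        if can_modify(annotation, user_project_ids):
            editable.append(annotation)
        else:
            read_only.append(annotation)
    return editable, read_only
```

A user who is only a member of project 10 would then see the annotations of project 20 on the same image, but only as read-only items.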

mzur commented 6 years ago

The reorganization was accepted. We agreed on the following changes (original are bold):

Maybe put this on hold until we have merged #109. Else we'd stack up too many changes and possible conflicts.

mzur commented 6 years ago

Some progress on this has been made in the new-da branch. At that time the changes were almost finished for biigle-core. Next would have been the modules. But since then there have been many new changes to core which would need to be merged. I have other priorities right now but maybe I'll get back to this later.

When I do: Implement ProjectVolumes as pivot models.

mzur commented 2 months ago

This will not be implemented any more. I still think this would be a better data architecture but the change would just be too much work. The clone volume feature can now be used to reuse the images (and even annotations) of a volume in another project. And with #824 we will have a better way to handle annotation sessions, too.