mzur closed this issue 2 months ago.
It was suggested at MIW 2017 that data should be made openly accessible.
What was agreed on, though, is that future marine imaging should be conducted with the aim of making data openly accessible.
This would play well with the idea of public projects mentioned in this issue.
Timm pointed out that one of the core ideas behind BIIGLE is to make annotations easily accessible. This means that a user should see the annotations that other projects created on the same image, even if the user does not belong to those projects.
This can still be done with the proposed architecture. Even though each project has its own set of annotations for an image, all of them can be displayed to all users. The proposed (saner) permission system could be extended to annotations: users can see the annotations of other projects (maybe displayed in a different style?), but they cannot modify them unless they have the right permissions for the project the annotations belong to.
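A minimal sketch of the "see everything, edit only your own project's annotations" rule. The classes, the role names `editor`/`admin`/`guest` and the `display_style` helper are hypothetical and only illustrate the idea; they are not BIIGLE's actual models.

```python
from dataclasses import dataclass, field

# Hypothetical roles that grant write access; BIIGLE's real roles may differ.
EDIT_ROLES = {"editor", "admin"}


@dataclass(frozen=True)
class Annotation:
    id: int
    image_id: int
    project_id: int  # every annotation belongs to exactly one project


@dataclass
class User:
    name: str
    roles: dict[int, str] = field(default_factory=dict)  # project_id -> role


def can_edit(user: User, annotation: Annotation) -> bool:
    # Modifying requires an editing role in the annotation's own project.
    return user.roles.get(annotation.project_id) in EDIT_ROLES


def display_style(user: User, annotation: Annotation) -> str:
    # All annotations are shown; those of foreign projects could be styled differently.
    return "own" if annotation.project_id in user.roles else "foreign"
```

Visibility is deliberately not checked here, since every annotation on an accessible image is shown; only editing depends on the project membership and role.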
The reorganization was accepted. We agreed on the following changes (the originals are in bold):
Maybe put this on hold until we have merged #109. Otherwise we'd stack up too many changes and potential conflicts.
Some progress on this has been made in the new-da branch. At that time the changes were almost finished for biigle-core. The modules would have been next. But since then there have been many new changes to core which would need to be merged. I have other priorities right now but maybe I'll get back to this later.
When I do: Implement ProjectVolumes as pivot models.
This will not be implemented anymore. I still think this would be a better data architecture but the change would just be too much work. The clone volume feature can now be used to reuse the images (and even annotations) of a volume in another project. And with #824 we will have a better way to handle annotation sessions, too.
I've already talked with Tim about this and he thought that the current data architecture of transects and projects is probably best, but I'd like to raise this issue once again and formulate my thoughts.
So we plan to rename transects to volumes because the meaning of "transect" is too specialized and does not reflect all kinds of image collections that might occur in Biigle. Also, annotation sessions are defined on a per-volume basis because, for a user who belongs to multiple projects that share a volume, it is impossible to say which project is currently "active".
These two statements have one thing in common: A possibly false conception of the function of a project and a volume.
The term volume suggests a storage entity for a collection of images, like a directory in a file system. By itself the term has nothing to do with annotations or annotation sessions.
A project might be thought of as a way to conduct an annotation study, a "project" by multiple people (= project members) to annotate a defined set of objects of interest (= label trees) in different volumes.
In contrast to that description of a volume, a volume currently is a collection of images, image labels and annotations, which is independent of the projects it is attached to. All project members will see the same images, the same image labels and the same annotations, regardless of the project(s) the other users and creators of image labels/annotations belong to. So a volume currently is much more than just a collection of images. In fact it is the most important entity in the whole data architecture. However, both hierarchically and intuitively, projects should be the most important entity.
Likewise, annotation sessions intuitively should belong to a whole project instead of a volume. The "workaround" of restricting annotation sessions to a subset of users, so that a session does not affect the work of other users (from other projects), is a symptom of this problem. Users who do the same kind of work are already defined as members of a common project. But sharing volumes, with all their annotations etc., between projects makes this grouping, and projects as a whole, almost irrelevant. This might be a fundamentally flawed design decision.
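To make the current situation concrete, here is a simplified sketch, assuming a volume attached to two projects and annotations stored per image only. All names (`volume_projects`, `project_members`, `candidate_projects`, the sample users) are hypothetical and not taken from the BIIGLE code base.

```python
from collections import defaultdict

# Current model (simplified): one volume attached to two projects.
volume_projects = {"vol-1": {"project-a", "project-b"}}
project_members = {"project-a": {"alice", "bob"}, "project-b": {"bob", "carol"}}

# Annotations hang off the image alone: image_id -> list of (author, label).
annotations = defaultdict(list)


def annotate(image_id, author, label):
    annotations[image_id].append((author, label))


def visible_annotations(image_id):
    # Every member of every attached project sees the same list,
    # regardless of which project the author belongs to.
    return list(annotations[image_id])


def candidate_projects(user, volume_id):
    # Projects through which `user` can reach `volume_id`. With more than one
    # entry there is no single "active" project for an annotation session.
    return {p for p in volume_projects[volume_id] if user in project_members[p]}
```

For example, `candidate_projects("bob", "vol-1")` contains both projects, which is exactly why sessions currently have to be defined per volume.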
The alternative
An updated data architecture might look like this: Volumes are demoted to simple collections of images, just like label trees are collections of labels and nothing more. Similar to label trees, volumes are independent of projects and can be public or private. Every volume has a set of users (admins) who can modify its name, description etc. or add and remove images. Images with annotations cannot be removed (ref). Private volumes can only be accessed by volume admins and have a set of authorized projects which are allowed to use them. All this is very similar to label trees; public volumes may be browsed and used by all Biigle users.
Admins of a project can attach any public volume, or any private volume the project is authorized for. But unlike now, annotations and image labels will then belong to both the image of the volume and the single project. This means that a volume has one set of image labels and annotations for each project it is attached to. Different users from different projects can work on the same volume but only see the annotations of their own project, i.e. the annotations belonging to the project the user is a member of.
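The volume rules described above can be sketched roughly as follows, modeled on how label trees work. The `Volume` fields, `can_browse`, `can_attach`, `remove_image` and the `annotated` parameter (the set of image IDs that have annotations) are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str
    public: bool
    admins: set[str] = field(default_factory=set)
    authorized_projects: set[str] = field(default_factory=set)
    images: set[str] = field(default_factory=set)


def can_browse(user: str, volume: Volume) -> bool:
    # Public volumes are visible to everyone; private ones only to their admins.
    return volume.public or user in volume.admins


def can_attach(project: str, volume: Volume) -> bool:
    # Like label trees: a private volume needs an explicit project authorization.
    return volume.public or project in volume.authorized_projects


def remove_image(user: str, volume: Volume, image: str, annotated: set[str]) -> None:
    # Only volume admins may remove images, and never images that have annotations.
    if user not in volume.admins:
        raise PermissionError("only volume admins can remove images")
    if image in annotated:
        raise ValueError("images with annotations cannot be removed")
    volume.images.discard(image)
```

Because the volume no longer owns annotations, the `annotated` set would have to be collected across all projects that use the volume.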
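A sketch of the proposed storage, assuming annotations are keyed by the pair (project, image) and images are always reached through one project. `AnnotationStore`, its methods and the sample project and user names are hypothetical.

```python
from collections import defaultdict


class AnnotationStore:
    """Proposed model (sketch): one annotation set per (project, image) pair."""

    def __init__(self, project_members):
        self.project_members = project_members  # project_id -> set of users
        self._annotations = defaultdict(list)   # (project_id, image_id) -> labels

    def _check_member(self, user, project_id):
        if user not in self.project_members.get(project_id, set()):
            raise PermissionError(f"{user} is not a member of {project_id}")

    def add(self, user, project_id, image_id, label):
        self._check_member(user, project_id)
        self._annotations[(project_id, image_id)].append(label)

    def visible(self, user, project_id, image_id):
        # The user reaches the image through exactly one project and
        # sees only that project's annotations.
        self._check_member(user, project_id)
        return list(self._annotations.get((project_id, image_id), []))
```

Two projects can annotate the same image, but neither sees the other's labels through this path. Showing foreign annotations read-only, as discussed earlier in the thread, would be an extra query on top of this.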
Implications:
Annotation sessions will be defined per-project and not per-volume because the "path" a user takes to access images and create annotations can no longer point to multiple projects.
Annotation sessions no longer require the ability to restrict the session to a subset of users because this subset can be defined as the members of a new project.
The complexity of our authorization policies will be massively reduced because annotations, images, image labels, annotation sessions etc. always belong to a unique project that defines the access permissions (in contrast to multiple projects like it is now).
Projects can use just a subset of the images of a volume because the whole volume is independent from the projects. This will give us BiodataMiningGroup/dias-projects#9 for free.
Project settings like the label confidence opt-in can actually be defined on the project and not on the volume. This is really important because otherwise the volume settings might conflict with the work of users of other projects, and we are again forced to do a "restrict this only to these users" workaround.
Laserpoint detection results are shared between projects because they are fixed for each image. Is this desirable? There may be conflicts because users of other projects may affect a project's own (area) reports by changing the laserpoint distance.
Projects can be locked/finished once an annotation study is over. All users will then have read-only access to the project and the annotations/image labels are frozen (for publication).
In the long run, projects could be made public themselves, allowing all users (think "citizens") to access them, because they cannot break or affect anything important in other projects.
Probably some more things I haven't thought of yet...
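Several of these implications (per-project sessions without user subsets, settings on the project, locking a finished study) can be illustrated together in one small sketch. `Project`, `AnnotationSession`, the `label_confidence` setting key and the dates are hypothetical examples, not an existing API.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class AnnotationSession:
    # No user subset: the session simply applies to all project members.
    name: str
    starts_at: datetime
    ends_at: datetime

    def is_active(self, now: datetime) -> bool:
        return self.starts_at <= now < self.ends_at


@dataclass
class Project:
    name: str
    members: set[str]
    settings: dict[str, bool] = field(default_factory=dict)  # e.g. label confidence opt-in
    sessions: list[AnnotationSession] = field(default_factory=list)
    locked: bool = False
    annotations: list[tuple[str, str]] = field(default_factory=list)

    def active_session(self, now: datetime):
        # The project is unambiguous, so so is the active session.
        return next((s for s in self.sessions if s.is_active(now)), None)

    def annotate(self, user: str, label: str) -> None:
        if self.locked:
            raise PermissionError("project is locked (read-only)")
        if user not in self.members:
            raise PermissionError(f"{user} is not a member of {self.name}")
        self.annotations.append((user, label))
```

Locking only blocks writes here; reading `annotations` remains possible, which matches the idea of frozen, publishable results.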