fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License
14 stars 6 forks source link

Define table specifications V1 #529

Closed tcompa closed 10 months ago

tcompa commented 11 months ago
tcompa commented 10 months ago

Given the shape that #582 is taking, I wonder whether we should be more granular in the definition of versions. That is, should the zarr attributes of a table group be as follows?

// table with no additional specifications
{
 "fractal_table_version": "1",
}
// basic ROI table  (naming is WIP)
{
 "fractal_table_version": "1",
 "type": "basic_roi_table",
}
// advanced ROI table (naming is WIP)
{
 "fractal_table_version": "1",
 "type": "advanced_roi_table",
}
// feature table
{
 "fractal_table_version": "1",
 "type": "feature_table",
}

The benefit here is that based on a zarr attribute we immediately know whether some operations on the table make sense.

For instance we already use the type="ngff:region_table" information in a few places:

NOTE: if we start using our own table types (e.g. basic_roi_table, advanced_roi_table, feature_table -- all TBD), then we need to also support the legacy type ngff:region_table, to support tables of existing OME-Zarrs. My current understanding is that we used ngff:region_table simply as a placeholder for what would then become advanced_roi_table, and then we could easily maintain backwards compatibility by treating the two types as synonyms.

tcompa commented 10 months ago

Note: the proposal in the previous comment concerns the specs definition, but it's still acceptable to postpone a bit the actual implementation of spec-compliant tables. That is, we can now define (as part of #582) the V1 specs, and then refactor the internal tools and helper functions in the appropriate way in upcoming releases.


For instance we may start to encode part of the specs in the code, e.g. the required columns for a ROI tables could be stored in an upcoming lib_tables.py module.

And we could also take the opportunity to clean up the structure of the helper functions, to keep the ones referred to different parts of the specs separated. We can decide how far to push the modularity, but just to have a random example we could go as far as:

fractal_tasks_core
    tables
        v1
            lib_tables.py (which includes, e.g.
            lib_roi_tables.py
            lib_feature_tables.py
            lib_write_table.py

(or any other arrangement of the same kind of information.. e.g. the v1/v2 do not need to be specific subfolders)

jluethi commented 10 months ago

On the idea of having specific table types: I can see the motivation behind this and it does sound helpful. Given the fluid situation of NGFF table specification, I wouldn't want to set too many hard requirements from our side though. But I like that we'll have internal ways of calling (& validating) different table types!

As such, I think we shouldn't be too strict about these types for the time being. I don't mind having them and raising warnings when they aren't set. We may even have some processing that goes easier if the type is set. But we should (at least for quite a while) accept tables with no type set (/wrong type set or ngff:region_table type set) and have ways to check whether their content confirms to the table type we need.

The ngff:region_table type was meant to be our feature_table type. Different naming convention I guess. The concept of ROI tables hadn't made it into that spec proposal. We only started to adopt ngff:region_table to become closer to the spec in some places.


Regarding names, here is a proposal:

// table with no additional specifications
{
 "fractal_table_version": "1",
}
// basic ROI table
{
 "fractal_table_version": "1",
 "type": "roi_table",
}
// advanced ROI table (naming is WIP)
{
 "fractal_table_version": "1",
 "type": "masking_roi_table",
}
// feature table
{
 "fractal_table_version": "1",
 "type": "feature_table",
}

Are roi_tables just a subset of masking_roi_tables? The masking_roi_tables just contain additional info on labels, metadata on the label image and such, right? The only thing that makes the advanced ROI tables advanced is their masking info? Or are there other advanced info in there? I'm happy with feature_table & no-type tables