fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License

Define abstract `read_table` helper function #632

Open tcompa opened 9 months ago

tcompa commented 9 months ago

We now have a first version of write_table, which will be part of the upcoming v0.14.0. We should check whether a read_table function would also be useful. It could replace lines like

    import anndata as ad
    import zarr

    # Load the ROI table and its metadata attributes
    ROI_table = ad.read_zarr(ROI_table_path)
    attrs = zarr.group(ROI_table_path).attrs
    # Validate the attributes (MaskingROITableAttrs import not shown)
    MaskingROITableAttrs(**attrs.asdict())
    column_name = attrs["instance_key"]
    # Check that ROI_table.obs has the right column and extract label_value
    if column_name not in ROI_table.obs.columns:
        raise ValueError(
            f'In _preprocess_input, "{column_name}" '
            f" missing in {ROI_table.obs.columns=}"
        )

with lines which could look like

    from fractal_tasks_core.tables import read_table

    # Load the ROI table and its metadata attributes
    table, attrs, column_names = read_table(path, options={"validate_attrs": True})
    column_name = attrs["instance_key"]
    # Check that the table has the right column and extract label_value
    if column_name not in column_names:
        raise ValueError(
            f'In _preprocess_input, "{column_name}" '
            f" missing in {column_names=}"
        )

This is also partly relevant for #629, since it would force us to think more about which attributes a table must have; e.g. do all V1 tables have an obs attribute with some specific contents? TBD

jluethi commented 9 months ago

Big fan of the idea!

For read_table(path, options={"validate_attrs": True}), I'd rather go with something like:

    read_table(path, validate_attrs=True)

(with a potential default for validate_attrs)

Also, couldn't this check be part of the validation block?

    column_name = attrs["instance_key"]
    # Check that the table has the right column and extract label_value
    if column_name not in column_names:
        raise ValueError(
            f'In _preprocess_input, "{column_name}" '
            f" missing in {column_names=}"
        )
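
To make the suggestion concrete, here is a minimal sketch of how the column check could move into the validation step of read_table. This is not the fractal-tasks-core implementation: validate_table_attrs is a hypothetical helper name, the default validate_attrs=True is an assumption, and only the instance_key check is covered (the MaskingROITableAttrs validation is left out).

```python
from typing import Any, Iterable, Mapping


def validate_table_attrs(
    attrs: Mapping[str, Any], column_names: Iterable[str]
) -> None:
    """Hypothetical helper: check that attrs["instance_key"] is a column."""
    if "instance_key" not in attrs:
        raise ValueError(f'"instance_key" missing in {dict(attrs)=}')
    column_name = attrs["instance_key"]
    columns = list(column_names)
    if column_name not in columns:
        raise ValueError(f'"{column_name}" missing in {columns=}')


def read_table(path: str, *, validate_attrs: bool = True):
    """Sketch of read_table: load a table and its attributes from `path`."""
    # Imported lazily so that validate_table_attrs can be used (and tested)
    # without anndata or zarr being installed.
    import anndata as ad
    import zarr

    table = ad.read_zarr(path)
    attrs = zarr.open_group(path, mode="r").attrs.asdict()
    if validate_attrs:
        # The column check now lives here instead of in every task.
        validate_table_attrs(attrs, table.obs.columns)
    return table, attrs
```

With this shape a task would only call read_table(path) and could rely on attrs["instance_key"] naming an existing column of table.obs; passing validate_attrs=False would skip the check.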

jluethi commented 9 months ago

For the future, something like:

    table, attrs = read_table(path, validate_attrs=True)