Here are the typical ways we create tables. Let's see if/what we can abstract out of them.
```python
# Create tables zarr group for ROI tables
group_tables = group_image.create_group("tables/")  # noqa: F841
well_id = row + column
# Prepare AnnData tables for FOV/well ROIs
FOV_ROIs_table = prepare_FOV_ROI_table(site_metadata.loc[well_id])
well_ROIs_table = prepare_well_ROI_table(
    site_metadata.loc[well_id]
)
# Write AnnData tables in the tables zarr group
write_elem(group_tables, "FOV_ROI_table", FOV_ROIs_table)
write_elem(group_tables, "well_ROI_table", well_ROIs_table)
group_tables.attrs["tables"] = ["FOV_ROI_table", "well_ROI_table"]
```
```python
# Copy the tables in ROI_table_names
for ROI_table_name in ROI_table_names:
    logger.info(
        f"I will now read {ROI_table_name} from "
        f"{zarrurl_old=}, convert it to 2D, and "
        "write it back to the new zarr file."
    )
    ROI_table = ad.read_zarr(
        f"{zarrurl_old}/{well_path}/{image_path}/"
        f"tables/{ROI_table_name}"
    )
    # Convert 3D FOVs to 2D
    if project_to_2D:
        new_ROI_table = convert_ROIs_from_3D_to_2D(
            ROI_table, pxl_size_z
        )
    # Write new table
    write_elem(
        new_tables_group, ROI_table_name, new_ROI_table
    )
```
```python
if output_ROI_table:
    # Concatenate all ROI dataframes
    df_well = pd.concat(bbox_dataframe_list, axis=0, ignore_index=True)
    df_well.index = df_well.index.astype(str)
    # Extract labels and drop them from df_well
    labels = pd.DataFrame(df_well["label"].astype(str))
    df_well.drop(labels=["label"], axis=1, inplace=True)
    # Convert all to float (warning: some would be int, in principle)
    bbox_dtype = np.float32
    df_well = df_well.astype(bbox_dtype)
    # Convert to anndata
    bbox_table = ad.AnnData(df_well, dtype=bbox_dtype)
    bbox_table.obs = labels
    # Write to zarr group
    group_tables = zarr.group(f"{in_path}/{component}/tables/")
    write_elem(group_tables, output_ROI_table, bbox_table)
    logger.info(
        "Bounding box ROI table written to "
        f"{in_path}/{component}/tables/{output_ROI_table}"
    )
```
```python
# Output handling: "dataframe" type (for each output, concatenate ROI
# dataframes, clean up, and store in an AnnData table on-disk)
for (name, out_params) in dataframe_outputs:
    table_name = out_params.table_name
    # Concatenate all FOV dataframes
    list_dfs = output_dataframe_lists[name]
    if len(list_dfs) == 0:
        measurement_table = ad.AnnData()
    else:
        df_well = pd.concat(list_dfs, axis=0, ignore_index=True)
        # Extract labels and drop them from df_well
        labels = pd.DataFrame(df_well["label"].astype(str))
        df_well.drop(labels=["label"], axis=1, inplace=True)
        # Convert all to float (warning: some would be int, in principle)
        measurement_dtype = np.float32
        df_well = df_well.astype(measurement_dtype)
        df_well.index = df_well.index.map(str)
        # Convert to anndata
        measurement_table = ad.AnnData(df_well, dtype=measurement_dtype)
        measurement_table.obs = labels
    # Write to zarr group
    group_tables = zarr.group(f"{in_path}/{component}/tables/")
    write_elem(group_tables, table_name, measurement_table)
    # Update OME-NGFF metadata
    current_tables = group_tables.attrs.asdict().get("tables") or []
    if table_name in current_tables:
        # FIXME: move this check to an earlier stage of the task
        raise ValueError(
            f"{in_path}/{component}/tables/ already includes "
            f"{table_name=} in {current_tables=}"
        )
    new_tables = current_tables + [table_name]
    group_tables.attrs["tables"] = new_tables
    # FIXME: also include table metadata, see issue #333
```
Something that clearly can be moved into `write_table` is the creation of the `tables` group (if needed) and the check on whether the required subgroup already exists (to be handled via `overwrite`).

EDIT: also the `tables` metadata are always handled in the same way.
This means that the first version of this function could look like:

```python
def write_table(image_group, table_name, table, overwrite=False):
    # Create the tables group, if needed
    ...
    # Check whether the subgroup already exists, and proceed accordingly
    ...
    # If it's all OK, proceed and write the table
    ...
    # Update the `tables` metadata of the image group
    ...
```
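For concreteness, here is a minimal self-contained sketch of what such a helper could do. It assumes a `zarr.Group` for the image and `anndata.experimental.write_elem` for the actual I/O, and it reduces the overwrite handling to a simple existence check; it is not meant to reflect the actual PR #499 implementation.

```python
from anndata.experimental import write_elem


def write_table(image_group, table_name, table, overwrite=False):
    # Create the tables group, if needed
    tables_group = image_group.require_group("tables")
    # Check whether the subgroup already exists, and proceed accordingly
    current_tables = tables_group.attrs.asdict().get("tables", [])
    if table_name in current_tables and not overwrite:
        raise ValueError(
            f"Table {table_name!r} already exists; use overwrite=True to replace it."
        )
    # If it's all OK, proceed and write the table
    write_elem(tables_group, table_name, table)
    # Update the `tables` metadata of the image group
    if table_name not in current_tables:
        tables_group.attrs["tables"] = current_tables + [table_name]
    return tables_group
```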
Let's see if we can include more general features in this function.
The first prototype is available through https://github.com/fractal-analytics-platform/fractal-tasks-core/commit/f4f127e59de78a8aa0367e9628af306b817dbb19, comments on the scope are welcome.
I like the approach to `write_table` a lot, especially that handling of all the metadata attributes now goes into a wrapper rather than the task function. It will make it much easier to be consistent on those.
I don't fully understand yet how all of them are set, and whether we handle e.g. `well_ROI_table` correctly (see comments). But it's great that we'll have one place to handle that.
I improved the prototype function (including your suggestion) with 9e5b5e8 and 45b4d50fcb887d85d1eba21582a3d59742e0b88f; see the current status of PR #499. Let me know if something is still unclear or can be improved.
More changes may come from actually integrating this function into the tasks.
Looks good to me! We can then work more on this wrapper when we formalize ROIs more (e.g. what metadata should our default ROI tables have?). It's already a great first step towards it :)
On the `write_elem` wrapper: can we take this opportunity to make this a broader wrapper that handles more of the table writing? `write_elem` seems like a low-level wrapper function. It would be great if tasks could just call a `write_table` or something like that with some parameters. Currently, many tasks replicate similar handling of the table metadata, updating the list of tables, checking for existing tables, etc. This would be a good opportunity to generalize this and have a useful wrapper that sets the correct metadata for those tables, but that someone building a new task then doesn't have to think about.
Originally posted by @jluethi in https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/485#issuecomment-1697036561
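To make the requested simplification concrete, a task that currently does the `write_elem` plus attrs bookkeeping by hand could then collapse to something like the following. This is hypothetical usage of the sketch above, reusing variable names from the snippets at the top of this issue.

```python
import zarr

# Hypothetical task-side usage: tables-group creation, existence checks,
# and updates of the `tables` attribute all happen inside write_table().
image_group = zarr.group(f"{in_path}/{component}")
write_table(image_group, "well_ROI_table", well_ROIs_table)
write_table(image_group, output_ROI_table, bbox_table, overwrite=True)
```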