Optimize getting aggregated group/users data in API

We reviewed this with @mfranzon - see also #1745.

The bottom line is that we found some viable solutions, but we will only run full benchmarks once this becomes relevant for performance. Note that this is an admin-only endpoint, and that it will be some time before we reach large numbers of users and/or groups.

Option 1: `itertools.groupby`

See https://docs.python.org/3/library/itertools.html#itertools.groupby

Code like the following:

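    import itertools

    # Get all links, sorted by `group_id` (required by `itertools.groupby`,
    # which only merges consecutive items with the same key)
    stm_links = select(LinkUserGroup).order_by("group_id")
    res = await db.execute(stm_links)
    links = res.scalars().all()

    # Enrich group objects with `user_ids` attribute.
    # This assumes that `groups` is sorted by `id` and that every group has
    # at least one link; otherwise positions in `groups` and in the grouped
    # links go out of sync.
    for ind, (group_id, group_elements_iterator) in enumerate(
        itertools.groupby(links, key=lambda _link: _link.group_id)
    ):
        # Sanity check: the current block of links must match the group
        if group_id != groups[ind].id:
            raise HTTPException(
                status_code=500,
                detail=(
                    f"Error while creating `user_ids` for {group_id=}, "
                    f"with {ind=} and {groups[ind]=}."
                ),
            )
        groups[ind] = dict(
            groups[ind].model_dump(),
            user_ids=[link.user_id for link in group_elements_iterator],
        )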
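
Since `itertools.groupby` only merges consecutive items with equal keys, the approach above depends on the links being sorted by `group_id`, and the index check depends on every group having at least one link. Below is a minimal standalone sketch of a variant that looks groups up by ID rather than by position. The `Group` and `Link` dataclasses are hypothetical stand-ins for `UserGroup` and `LinkUserGroup`, and `enrich_groups` is an illustrative name, not something that exists in the codebase.

```python
import itertools
from dataclasses import asdict, dataclass


# Hypothetical stand-ins for the `UserGroup` and `LinkUserGroup` models
@dataclass
class Group:
    id: int
    name: str


@dataclass
class Link:
    group_id: int
    user_id: int


def enrich_groups(groups: list[Group], links: list[Link]) -> list[dict]:
    """Return one dict per group, with an extra `user_ids` key."""
    # `groupby` only merges consecutive items, so sort by the grouping key
    sorted_links = sorted(links, key=lambda link: link.group_id)
    user_ids_by_group = {
        group_id: [link.user_id for link in group_links]
        for group_id, group_links in itertools.groupby(
            sorted_links, key=lambda link: link.group_id
        )
    }
    # Groups without any link get an empty list
    return [
        dict(asdict(group), user_ids=user_ids_by_group.get(group.id, []))
        for group in groups
    ]


groups = [Group(1, "All"), Group(2, "admins"), Group(3, "empty")]
links = [Link(2, 10), Link(1, 10), Link(1, 11)]
enriched = enrich_groups(groups, links)
```

With this variant a group without users does not shift the positions of the following groups, at the cost of building an intermediate dictionary.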

Option 2: SQL query

`GROUP BY` should be used together with an aggregation function. Possible options include the ones listed in https://docs.sqlalchemy.org/en/20/core/functions.html#selected-known-functions:

`aggregate_strings` works fine with SQLite, as in the snippet below.
The same does not work in PostgreSQL, because IDs are integers and not strings, but we could probably use `array_agg`.

        from sqlalchemy import func
        from sqlalchemy.orm import join

        SEPARATOR = ","

        # One row per group: the group itself and a string with its user IDs.
        # Note that this is an inner join, so that groups without any user
        # do not appear in the result.
        stm = (
            select(
                UserGroup,
                func.aggregate_strings(LinkUserGroup.user_id, SEPARATOR),
            )
            .select_from(
                join(
                    LinkUserGroup,
                    UserGroup,
                    LinkUserGroup.group_id == UserGroup.id,
                )
            )
            .group_by(LinkUserGroup.group_id)
            .order_by(UserGroup.id)
        )
        res = await db.execute(stm)
        enriched_groups = []
        for row in res.all():  # loop over groups
            group, user_ids_string = row
            # Convert the aggregated string back into a list of integer IDs
            user_ids = [int(_id) for _id in user_ids_string.split(SEPARATOR)]
            enriched_groups.append(dict(group.model_dump(), user_ids=user_ids))
        return enriched_groups
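
The same aggregation can be tried out with the standard-library `sqlite3` module, which may help when comparing query shapes before running real benchmarks. This is only a sketch: the table and column names are hypothetical, and it calls SQLite's `group_concat` directly (which, as far as we understand, is what `aggregate_strings` renders to on SQLite). It uses a `LEFT JOIN`, so that groups without users are kept with an empty `user_ids` list.

```python
import sqlite3

SEPARATOR = ","

# In-memory database with hypothetical tables mimicking groups and links
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE usergroup (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE linkusergroup (group_id INTEGER, user_id INTEGER);
    INSERT INTO usergroup VALUES (1, 'All'), (2, 'admins'), (3, 'empty');
    INSERT INTO linkusergroup VALUES (1, 10), (1, 11), (2, 10);
    """
)

# One row per group; `group_concat` returns NULL for groups without links
rows = conn.execute(
    """
    SELECT g.id, g.name, group_concat(l.user_id, ?)
    FROM usergroup AS g
    LEFT JOIN linkusergroup AS l ON l.group_id = g.id
    GROUP BY g.id
    ORDER BY g.id
    """,
    (SEPARATOR,),
).fetchall()

enriched_groups = []
for group_id, name, user_ids_string in rows:
    # The order of concatenated values is not guaranteed, so sort explicitly
    user_ids = (
        sorted(int(_id) for _id in user_ids_string.split(SEPARATOR))
        if user_ids_string is not None
        else []
    )
    enriched_groups.append(dict(id=group_id, name=name, user_ids=user_ids))

conn.close()
```

Whether a `LEFT JOIN` is preferable to the inner join of the original snippet depends on whether groups without users should be listed at all.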

fractal-analytics-platform / fractal-server

Optimize getting aggregated group/users data in API #1742
