Open tcompa opened 1 week ago
We reviewed this with @mfranzon - see also #1745.
The bottom line is that we found some viable solutions, but we will only run full benchmarks when this becomes relevant in terms of performance. Notice that this is an admin-only endpoint, and that it will be some time before we hit large numbers of users and/or groups.
itertools.groupby
See https://docs.python.org/3/library/itertools.html#itertools.groupby
Code like
# Get all links, sorted by `group_id`
stm_links = select(LinkUserGroup).order_by("group_id")
res = await db.execute(stm_links)
links = res.scalars().all()
# Enrich group objects with `user_ids` attribute
for ind, (group_id, group_elements_iterator) in enumerate(
itertools.groupby(links, key=lambda _link: _link.group_id)
):
if group_id != groups[ind].id:
raise HTTPException(
status_code=500,
detail=(
f"Error while creating `user_ids` for {group_id=}, "
f"with {ind=} and {groups[ind]=}."
),
)
groups[ind] = dict(
groups[ind].model_dump(),
user_ids=[link.user_id for link in group_elements_iterator],
)
GROUP BY should be used together with an aggregation function. Possible options are e.g. the ones in https://docs.sqlalchemy.org/en/20/core/functions.html#selected-known-functions:
from sqlalchemy import func
from sqlalchemy.orm import join
SEPARATOR = ","
stm = (
select(
UserGroup,
func.aggregate_strings(LinkUserGroup.user_id, SEPARATOR),
)
.select_from(
join(
LinkUserGroup,
UserGroup,
LinkUserGroup.group_id == UserGroup.id,
)
)
.group_by(LinkUserGroup.group_id)
.order_by(UserGroup.id)
)
res = await db.execute(stm)
enriched_groups = []
for row in res.all(): # loop over groups
group, user_ids_string = row[:]
user_ids = [int(_id) for _id in user_ids_string.split(SEPARATOR)]
enriched_groups.append(dict(group.model_dump(), user_ids=user_ids))
return enriched_groups
Given the LinkUserGroup table, we sometimes have to aggregate its full content based on one of its two columns (either
group_id
oruser_id
). We currently do it "by hand", as inhttps://github.com/fractal-analytics-platform/fractal-server/blob/26a37067a5cbcb79d45f0540135ae17c8ef833aa/fractal_server/app/routes/auth/group.py#L29-L58
Can we implement this with groupby? See e.g. https://docs.sqlalchemy.org/en/20/core/selectable.html#sqlalchemy.sql.expression.Select.group_by