Closed nalimilan closed 4 years ago
I did some research and managed to beat data.table in some groupby cases.
Key areas to work on are string hashing and better parallelization. There may be a way to merge the work in FastGroupBy.jl into DataFrames.jl by rewriting the grouping backend to be radix-sort-based for bits types.
See post https://www.codementor.io/zhuojiadai/an-empirical-study-of-group-by-strategies-in-julia-dagnosell
See code here https://github.com/xiaodaigh/FastGroupBy.jl
Interesting. Let's discuss this elsewhere, as this issue was supposed to deal with APIs, not implementations.
A good solution to make `by` faster would be to add an `AbstractDataFrame` type which would encode column type information, and which would be used to pass sub-data frames to the user-provided function. That would allow specializing the function on the types of the inputs. I have done some experimentation by wrapping a `NamedTuple`.
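To make the idea concrete, here is a minimal sketch (the `typed_apply` name and mechanics are my invention, not the actual prototype): converting each sub-data frame to a `NamedTuple` of columns via Tables.jl lets the compiler specialize the user function on the concrete column types, unlike passing an untyped `SubDataFrame`.

```julia
using DataFrames, Tables, Statistics

# Sketch only: wrap each group as a NamedTuple of columns so that calling
# `f` on it compiles a method specialized on the column element types.
typed_apply(f, gd::GroupedDataFrame) =
    [f(Tables.columntable(sdf)) for sdf in gd]

df = DataFrame(x = [1.0, 2.0, 3.0, 4.0], g = [1, 1, 2, 2])
typed_apply(nt -> mean(nt.x), groupby(df, :g))  # one value per group
```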
Bumping this. I have been playing around with `aggregate` recently and found something that made my computer stall:

```julia
df = DataFrame(rand(50_000, 6000));
df[:id] = rand(1:25_000, 50_000); # 2 observations per group, on average
aggregate(df, :id, mean) # takes a very long time
```

I am not sure how contrived an example this is, but it's one we are more or less running into with our work in Stata, and Stata isn't handling it that well either. I think the problem is that the `GroupedDataFrame` operation isn't using all the type information it could be using. You can see an `Any[]` there.
@nalimilan what is left to be done on this issue?
I think several points still need to be addressed, e.g. regarding `aggregate`. I'd also like to check that the API is OK before closing this.
👍 Thank you!
Just leaving a note that it could make sense to replace `aggregate(gd, f)` with something like `combine(gd, All() => f)` or `combine(gd, : => f)`, with `All` or `:` a special selection object similar to JuliaDB selectors. That would make the vocabulary much easier to grasp than using arbitrary terms like `aggregate` (`combine` and `aggregate` mean the same thing). dplyr uses `summarize_all` for this. This could be extended using other kinds of selectors like in JuliaDB.
The only question is whether we also need a replacement for `aggregate(df, f)`. I guess we could support `combine(df, ...)` for all variants of `combine`, making it work as if `df` was the only group in a `GroupedDataFrame` (and without grouping columns).
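A sketch of what that could look like, assuming a recent DataFrames.jl where `combine` does accept a plain `DataFrame` (treated as a single group with no grouping columns):

```julia
using DataFrames, Statistics

df = DataFrame(g = [1, 1, 2], x = [1.0, 2.0, 4.0])

# On a GroupedDataFrame, combine produces one row per group:
combine(groupby(df, :g), :x => mean)

# On a plain DataFrame, it acts as if df were the only group,
# with no grouping columns in the output:
combine(df, :x => mean)
```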
I am in favor of removing `aggregate` for `df`. We have `mapcols`, which is cleaner in this respect (it does not cover all functionality, but for me everything above what `mapcols` does is too much magic). With `combine` I have a mental problem that the `=>` syntax is reserved for applying `f` to what is on the LHS.

So maybe: `combine(gd, All(collection) => f)`, where `collection` is a list of sets of columns to apply `f` over, and in particular it can be `All()`, which means all columns (`aggregate` and `combine` use different methods for this).

> I am in favor of removing `aggregate` for `df`. We have `mapcols` which is cleaner in this respect (does not cover all functionality, but for me everything above what `mapcols` does is too much magic).
That's tempting, but both dplyr's `summarize` and Pandas' `aggregate` support data frames and grouped data frames, so I think we also need a function working on both.
> With `combine` I have a mental problem that the `=>` syntax is reserved for applying `f` to what is on the LHS.
Good point, so indeed `: => f` is not really correct. Conceptually that would be `: .=> f` (along the lines of https://github.com/JuliaData/DataFrames.jl/pull/1620), which of course cannot work. `combine(gd, All(collection) => f)` would be OK. Though a problem is that `All(collection) => f` could also mean we want to pass all arguments in `collection` to `f`. That's the same thing as `collection => f` for `All`, but we could support other operations like `Between` and `Not` as in JuliaDB, in which case the meaning is less obvious. Of course it's not very likely that you would like `f` to take a list of variables that you cannot list one by one, but would it be OK to not even support that use case?
> so I think we also need a function working on both.

I am OK with going the way you want to go (I just wanted to hint that `mapcols` does a very similar thing).
By `All(collection)` I meant something like `All([:a, :b])` or `All([[:a,:b], [:c,:d]])` (to pass two columns two times). Now I actually see that it can be even simpler, like `All(:a, :b)` or `All([:a, :b], [:c, :d])`, and even (if we want to be extra fancy) `All(targetcolname1=[:a, :b], targetcolname2=[:c, :d])`. And then `All()` could still be a shorthand for all columns.

Edit: by `All(:a, :b)` I mean apply `f` to column `:a` and to column `:b`, producing two results.
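A sketch of the semantics being proposed here, using the broadcasted-pair syntax that recent DataFrames.jl versions adopted (each listed column gets its own application of `f`, producing one result column per input column):

```julia
using DataFrames, Statistics

df = DataFrame(a = 1:4, b = 5:8, id = [1, 1, 2, 2])

# Apply mean to :a and to :b separately, producing two result columns
# (roughly the role All(:a, :b) plays in the proposal above):
combine(groupby(df, :id), [:a, :b] .=> mean)
```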
The last step needed to close this issue is to deprecate `aggregate`. For `GroupedDataFrame`, it can be replaced with `combine(gdf, All() .=> f)` (thanks to https://github.com/JuliaData/DataAPI.jl/pull/10). But there's also a `DataFrame` method, for which `combine` isn't the best term. A more natural deprecation for it would be `select(df, All() .=> f)`. We could imagine supporting `combine` on `DataFrame` too, treating it as a single group, but that can be added later if it turns out to be a useful generalization.
Two comments:

- `All()` in `combine` should "drop" grouping columns, but if `All` is given some arguments it should work normally (this means that `All()` for `DataFrame` and `GroupedDataFrame` works consistently if we use the definition that `All()` captures all non-grouping columns, as in a `DataFrame` there are no grouping columns)
- for `select` we should coordinate with JuliaDB.jl (as it works elementwise there AFAICT).

@piever - regarding the second point. We need two kinds of transformation operations (row-wise ones, and ones operating on whole columns). How do you handle this distinction in JuliaDB.jl (it seems that `select` works rowwise, so the question is how do you handle transformations that take a whole vector and transform it)?
@piever This is my current conclusion from the discussion with @nalimilan on what would be best for DataFrames.jl. Please comment what you think of it (point 1 is probably the most relevant to JuliaDB.jl):

- `select` would work on columns of a `DataFrame`, deprecating `aggregate` in this case
- `map` for a `DataFrame` would work rowwise (also note that if a function works on vectors you can always work on rows with it using broadcasting)
- `combine` would deprecate `aggregate` for `GroupedDataFrame`

In this way `map` and `select` perform two different functions (and are naturally complementary). If they both worked rowwise they would overlap in functionality.
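For reference, DataFrames.jl later resolved the column-wise versus row-wise distinction with the `ByRow` wrapper rather than by splitting it across `select` and `map`; a sketch assuming a recent release:

```julia
using DataFrames

df = DataFrame(x = [1, 2, 3])

# Column-wise: the function receives the whole vector df.x
select(df, :x => (v -> v .+ 1) => :y)

# Row-wise: ByRow makes the function receive one element at a time
select(df, :x => ByRow(xi -> xi + 1) => :y)
```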
I think it would be great to settle this before DataFrames.jl 1.0 release (so hopefully soon).
> `select` would work on columns of a `DataFrame`, deprecating `aggregate` in this case
`select` in IndexedTables is a very complicated beast, in that `select(t, v::AbstractVector)` returns `v`, `select(t, i::Union{Int, Symbol, SpecialSelector})` selects columns, passing a column name and function maps the function row-wise, and `select(t, tup::Tuple)` returns a table with the various selected vectors for each element in the tuple (see here).

`select` is often used in combination with other parts of the API; for example, if I do `groupby(f, t, by, select = :a => f)` it first computes `select(t, :a => f)` and then uses that as data for `groupby`. Because of this, changing `select` would be very breaking (a lot of API functions have a `select` keyword that always calls the `select` machinery).
Among others, `transform` inherits the same API: one can do `transform(t, :a => v)` (replace a column by a vector) or `transform(t, :a => :b => f)` (replace a column by a function applied element-wise). When adding `transform` to `DataFrames`, do you plan on having it work column-wise like `select`, or row-wise? In my mind, it's important that the two functions are consistent.
In JuliaDB, working on the columns directly is IMO not as common as in DataFrames, as it doesn't work well with the distributed case. The main functions to do this are `groupby` and `summarize` (which is similar to `DataFrames.aggregate` and is just syntactic sugar on `groupby`). So actually, the use case you have in mind for colwise `select` is simply `groupby` with `by = ()` in JuliaDB.
I suspect it'd be nicer, just like there is `reduce` and `groupreduce`, to have `apply` and `groupapply` (where `apply` is DataFrames' colwise `select`) or, even better, `reduce(f, group(df, key))`, `apply(f, group(df, key))`.
Would `apply`, `groupapply` (or `apply(f, group(df))`) (for some name of `apply`, maybe this is not the best one) be a sensible choice instead of the proposed `select` and `by` for DataFrames, or is it too late in the game for this kind of change? It'd be great if we could also fix the inconsistencies in grouping terminology between DataFrames and JuliaDB.
I would add:

> `map` for a `DataFrame` would work rowwise (also note that if a function works on vectors you can always work on rows with it using broadcasting)
I'm not sure I'm particularly qualified to comment on this one. On the one hand, if DataFrames don't iterate rows (you need `eachrow`) and their size is 2D, then I'm not sure why `map` would work on rows; for consistency one should do `map(f, eachrow(df))` I guess, esp. since otherwise `map` would change the shape of the container. On the other hand, this feels useful in practice and it's odd to have `filter` but not `map`.
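For what it's worth, the `eachrow` spelling mentioned above works without any new API; each element is a `DataFrameRow`:

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [10, 20, 30])

# Explicit row iteration, leaving `map` on a DataFrame undefined:
map(r -> r.a + r.b, eachrow(df))  # [11, 22, 33]
```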
I would make:

> `combine` would deprecate `aggregate` for `GroupedDataFrame`
Even though JuliaDB has the syntactic sugar `summarize`, I think it's not a matter of having a new function but an appropriate "special selector". The `All() .=> f` idea is brilliant, esp. since one can have `Number .=> mean` and things like that (to only choose numeric columns). As discussed above though, I'd rather call this `apply`/`groupapply`.
Thank you for the extensive and insightful comment. I will add two short notes (probably @nalimilan will have more to say on this):

- the `apply` name is excellent (I hope @nalimilan agrees 😄); we could, for the time being, leave `select` and `map` untouched (and leave the decisions what to do with them for post 1.0 release) and for now rename `combine` to `apply` and remove `aggregate` (we have not considered `reduce` as I think it is mostly relevant in a distributed context so there is no high pressure to add it now to DataFrames.jl)
- `apply` would work on a `GroupedDataFrame` or a `DataFrame`, where in the latter case we would "virtually" treat it as a `GroupedDataFrame` with one group and no grouping columns
- … `Type` to select groups in `categorical`, and I agree that it can be naturally used also here (I have not thought about it before)

> (we have not considered `reduce` as I think it is mostly relevant in distributed context so there is no high pressure to add it now to DataFrames.jl)
One more tiny comment: I think that DataFrames actually uses something like `groupreduce` to optimize `groupby` for a set of commonly used operations, see for example here. Instead, IndexedTables accepts that `groupby(:a => sum, t, :b)` is slow and recommends instead `groupreduce(:a => +, t, :b)` (using OnlineStats for things that are a bit more complicated, like `mean` or higher moments). Still, I agree that this can be a post-1.0 thing.
Thanks. That's so hard! I think there are two different but related issues:

- whether `select` or a potential `mutate`/`transform` should operate row-wise or on whole vectors
- what to do with `aggregate`

Actually the former issue has much deeper implications than the latter, so I've filed a separate issue to continue the discussion: https://github.com/JuliaData/DataFrames.jl/issues/1952.

The discussion about `aggregate` is much less problematic: at worst, we can keep `aggregate`; it doesn't do much harm even though it is largely redundant with the more general API we want to introduce.
I must say I'm not really convinced by the proposal to use `apply` instead of `combine`: I find the name too general, as it doesn't even imply that grouping is dropped; and it's the name of an operation in Pandas that is equivalent to our `combine(f, df)` (see OP), which increases confusion. `summarize` is a good term in most cases, but since we also allow returning multiple rows it's not perfect either. `combine` is actually quite accurate; the only problem is that it doesn't make a lot of sense when there's a single `DataFrame`: it's only OK if you think of it as a special case of a single group. So basically I'd say it's OK if we don't deprecate `aggregate`. I don't have a great solution right now... :-)
Regarding the standardization of the API between DataFrames and JuliaDB, there are several issues:

- `groupby` in DataFrames has no equivalent in JuliaDB
- `by` in DataFrames is `groupby` in JuliaDB
- `select` and `transform`: currently we're consistent since DataFrames doesn't accept functions in `select`, and `transform` doesn't even exist yet

I think it would be fine to add `reduce` over `GroupedDataFrame` in the future, and maybe even `groupreduce` as a shorthand. But in exchange for this and for not defining `select` and `transform` with conflicting meanings, I would kindly ask JuliaDB to rename `groupby` to something else (why not `by`?), as this is an obvious clash in meanings (and our name is consistent with precedent in dplyr).
Sticking to DataFrames.jl issues 😄, I guess we stay with `aggregate` for now.
The discussion in #1952 can probably be concluded post 1.0.
So can we close this issue for now, or are there some outstanding things to be discussed that are not covered in #1952?
I think it would be wise to check that we have a rough idea of where we want to go before 1.0, just in case it requires deprecating something (apart from `aggregate`).
I agree, but what is the list of things to decide apart from https://github.com/JuliaData/DataFrames.jl/issues/1952 and the newly opened https://github.com/JuliaData/DataFrames.jl/issues/1953 (the latter does not have to go in by 1.0).
What I mean is that I would prefer to keep a list of atomic things to do for 1.0 with a milestone set for them, as then it is easier to track what is left to do.
> I think it would be fine to add `reduce` over `GroupedDataFrame` in the future, and maybe even `groupreduce` as a shorthand. But in exchange for this and for not defining `select` and `transform` with conflicting meanings, I would kindly ask JuliaDB to rename `groupby` to something else (why not `by`?), as this is an obvious clash in meanings (and our name is consistent with precedent in dplyr).
Of course there needs to be consensus with the other JuliaDB maintainers, but, as far as I'm concerned, I agree that the name `groupby` is unfortunate: it is inconsistent with DataFrames and with JuliaDB itself (`groupreduce` is `reduce` applied group-wise; `groupby` is not `by` applied group-wise).
To me, the only big thing that needs deciding soonish to settle this is whether there is a good name for the equivalent of `reduce` for a function that requires the whole vector and may give a vector or a scalar ("colwise `select`"). Basically, the function that's used internally so that `:a => mean` applied to a `DataFrame` gives the mean of column `:a`. The confusion seems to be that `DataFrames` has a concept of grouped data (which JuliaDB does not) and this function should merge groups.
The name `combine` makes sense here, but less so for JuliaDB (in some sense grouping there is given by primary keys, so there is no need to explicitly combine; the grouping will just be kept in the result: maybe this is more similar to dplyr). It also reads a bit odd in the ungrouped case.
I had proposed `apply`, but it seems inadequate in the grouped case. Maybe `apply(f, df; combine = true)` (not sure what the best default for the keyword is) could replace `combine` and also be used as a "colwise operation dropping columns" (as `select` is not ideal due to the name clash with JuliaDB)? Or even `apply(f, df; combine = true, keepcols = false)` to encompass `combine`, colwise `select` and colwise `transform`? This would solve the colwise issue in #1952 and make `combine` no longer necessary, as far as I understand.
Personally, I like that `apply(f, groupby(df, :a), combine = true)` is clearly the group-apply-combine strategy. On top of that, `apply(f, groupby(df, :a), combine = true, keepcols = true)` would be a nice way to add columns computed as groupwise window functions to a `DataFrame` (at the moment it's not super-easy, as far as I understand).
`by(df, cols, args...; kwargs...)` would then be exactly `apply(df, groupby(df, cols), args...; kwargs...)`. In JuliaDB land it would make more sense to call this operation `groupapply` (in analogy with `groupreduce`), but I guess `by` is also a fine name.
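As a side note on the window-function use case mentioned above: in current DataFrames.jl this role is played by `transform` on a `GroupedDataFrame`, which keeps all rows and columns; a sketch assuming a recent release:

```julia
using DataFrames, Statistics

df = DataFrame(g = [1, 1, 2, 2], x = [1.0, 3.0, 10.0, 30.0])

# Demean x within each group; the result has the same number of rows
# as df, with the group-wise result aligned back to the original rows.
transform(groupby(df, :g), :x => (v -> v .- mean(v)) => :x_demeaned)
```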
I'm afraid this is a very JuliaDB-esque proposal. :-) `DataFrame` objects don't store any grouping information, and `GroupedDataFrame` objects are completely different from `DataFrame`. So having a `combine` keyword argument would be a bit weird IMHO (besides being type-unstable). Also, `GroupedDataFrame` is a collection of `SubDataFrame`s representing groups, so we already have `map` to apply a function to each `SubDataFrame`. Actually, the canonical split-apply-combine in DataFrames (for which `by` is a shorthand) is `groupby`-`map`-`combine`; we support `combine` with functions/pairs because it's more efficient to avoid having `map` create an intermediate `GroupedDataFrame` before combining.
If we added a `combine` keyword argument, that would only be for interoperability with JuliaDB, not as an idiomatic DataFrames API AFAICT. But I guess that, conversely, JuliaDB could add `combine` for the same reason.
Also, as I said, I don't really like `apply`, as it doesn't indicate what it does: how does it differ from `map`, which sounds like a synonym? We could as well extend `mapcols` for this; at least it would be explicit that it operates over columns.
> On top of that, `apply(f, groupby(df, :a), combine = true, keepcols = true)` would be a nice way to add columns computed as groupwise window functions to a `DataFrame` (at the moment it's not super-easy as far as I understand).

What do you mean by "groupwise window functions"?
> Also as I said I don't really like `apply` as it doesn't indicate what it does: how does it differ from `map`, which sounds like a synonym? We could as well extend `mapcols` for this, at least it would be explicit that it operates over columns.
I'm not completely sold on `mapcols`, as it is different in the case with just one function, but I see that maybe we just need a good concept of grouped table in JuliaDB and then one would call `map` on that, so `by` is actually more of a `groupmap` (at least in JuliaDB, as vectors are not flattened together by default). In which case I guess it would be fine to repurpose `groupby` in JuliaDB to do the grouping and then just call `map` on it and have `by(f, df, cols)` be `map(f, groupby(df, cols))`.
> What do you mean by "groupwise window functions"?
I simply meant things like `by(iris, :Species, :SepalLengthRank => :SepalLength => ordinalrank)`, i.e. adding a new column of the same length as the old one in a `by` while keeping the other columns. But maybe my point does not hold, in that the `keepcols = true` might as well be added to `by`.
I'm not really sure `mapcols` is the right operation for this. I just find `apply` not explicit enough. Need to think about other possibilities.
> I simply meant things like `by(iris, :Species, :SepalLengthRank => :SepalLength => ordinalrank)`, i.e. adding a new column of the same length as the old one in a `by` while keeping other columns. But maybe my point does not hold, in that the `keepcols = true` might as well be added to `by`.
`by` and `combine` keep grouping columns by default. Actually there's no way to avoid them, but there's a PR to add an argument: https://github.com/JuliaLang/julia/issues/30901
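For reference, recent DataFrames.jl versions expose this as the `keepkeys` keyword argument of `combine`; a sketch:

```julia
using DataFrames, Statistics

gd = groupby(DataFrame(g = [1, 1, 2], x = [1.0, 2.0, 4.0]), :g)

combine(gd, :x => mean)                   # grouping column :g is kept
combine(gd, :x => mean; keepkeys = false) # only :x_mean remains
```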
> Comparing with other software, our API is quite similar to Pandas, which provides three different functions (see also this summary):
>
> - `groupby` is similar to our function. However, the result allows accessing its "columns" via indexing, and calling a reduction function like `mean` on them computes it for each group.
Let me add that pandas' `groupby` is slightly different from DataFrames.jl's, as it returns an iterable of key-group pairs rather than an iterable of groups, as `DataFrames.groupby` does.
```python
In [1]: import pandas
   ...:
   ...: df = pandas.DataFrame(
   ...:     {
   ...:         "a": [1, 1, 2, 2, 3, 3],
   ...:         "b": [1, 2, 3, 4, 5, 6],
   ...:     }
   ...: )
   ...:
   ...: for key, group in df.groupby("a"):
   ...:     print("key =", key)
   ...:     print("group =")
   ...:     print(group)
   ...:     print()
key = 1
group =
   a  b
0  1  1
1  1  2

key = 2
group =
   a  b
2  2  3
3  2  4

key = 3
group =
   a  b
4  3  5
5  3  6
```
I find this useful sometimes. Of course, with DataFrames.jl, I can pull out the `key` from a `SubDataFrame` very easily if I know the `cols` passed to `groupby`. But I feel it's worth considering having some API equivalent to the iterable returned by `pandas.DataFrame.groupby`, if there is nothing equivalent already.
(I don't know DataFrame.jl enough to say for sure there is no function that does this out-of-the-box. I hope my comment is not adding unnecessary noise.)
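For reference, current DataFrames.jl versions provide exactly this via `keys` and `pairs` on a `GroupedDataFrame`; a sketch assuming a recent release:

```julia
using DataFrames

gd = groupby(DataFrame(a = [1, 1, 2], b = [1, 2, 3]), :a)

for (key, group) in pairs(gd)
    # `key` is a GroupKey; the grouping value is available as key.a
    println("key = ", key.a)
    println(group)
end
```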
The Query.jl way of doing this is to have a `key` function that returns the key of a given group. I assume that API would also work here, if `key` was defined for `SubDataFrame`?
> if `key` was defined for `SubDataFrame`

Yes, that would be equivalent and nice to have.
@tkf do you mean the functionality implemented in https://github.com/JuliaData/DataFrames.jl/pull/1908? (It should be merged in a few days and included in release v0.20.)
CC @jlumpe
Yes, exactly that. Thanks!
> The Query.jl way of doing this is to have a `key` function that returns the key of a given group. I assume that API would also work here, if `key` was defined for `SubDataFrame`?
I found Query.jl's performance on large group-by problems to be slower. In fact, I would say much slower. As long as we can keep the performance at a good level…
@xiaodaigh Your work on extensive benchmarking of group-by is valuable to many people. But I believe this thread is about API, not implementation or performance. If you have a strong concern about Query.jl's `key` accessor function-based approach (though I don't see why), maybe #1908 is the right place to discuss it?
> about API, not implementation or performance

I am just not sure if a certain API will lead to a slower implementation, e.g. maybe an iteration-based API will be slower?

Having a `key` function (or something like it) is not tied to an iterator-based implementation at all.
Additional to-do / to-decide before 1.0 release:

- align `by` and `combine` syntax with the coming `select` syntax
- `select` and `transform` should work on grouped data frames

I am closing this as I do not see anything not covered elsewhere (or not fitting the current design). If you feel something is missing then please open a new issue (possibly giving a link to this one for past reference).
Our grouping API is currently lacking, as it does not allow performing most common operations in the most efficient way (see this discussion and this one), and because it is not very user-friendly for all cases.
- `groupby` is the essential building block, which returns a `GroupedDataFrame` object, i.e. a per-group view on the original `DataFrame`. Calling `map` on the resulting object gives a `GroupApplied` object, which can then be turned into a standard `DataFrame` using `combine`.
- `by` is a convenience wrapper around these operations. It supports any function taking a `DataFrame` and returning either a single value, a vector or a `DataFrame`, and combines the result into a `DataFrame`, with as many rows per group as the function returned. It is very flexible, but also cumbersome (since you need to write `df[:col]` inside the function). It is also quite slow, since we can make almost no assumptions about the kind of data which is going to be returned, and inference cannot really help since the input type is always `DataFrame` and does not include any information on the column types.
- `aggregate` is a more specific function which applies an operation independently on each column. It is also quite flexible, since the function can return either a single value or a vector (in which case one row is created for each entry). This makes it slower than it could be, since knowing that the function returns a single value would allow for optimizations (allocating in advance and looping efficiently over each column). However, here inference could be used since the function is applied separately to each column. See https://github.com/JuliaData/DataFrames.jl/pull/1246.

Comparing with other software, our API is quite similar to Pandas, which provides three different functions (see also this summary):
- `groupby` is similar to our function. However, the result allows accessing its "columns" via indexing, and calling a reduction function like `mean` on them computes it for each group.
- `apply` is for general transformations. This is similar to our `by`.
- `transform` returns a data frame of the same size as the original. The passed function is applied independently to each column, and must return either a vector of the same length as the original for each column, or a single value which is recycled/broadcast. This is the equivalent of our `aggregate`, except that the result is guaranteed to have as many rows as the input (while in ours the number of rows is arbitrary). In terms of performance, this restriction allows allocating the final data frame in advance since the number of rows is known, avoiding the need to store copies for all groups before concatenating them.
- `aggregate` returns a data frame with one row per group. It's a more restrictive variant of `transform` that requires the passed function to return a single value. Our `aggregate` is a bit more general since it also accepts vector returns, which can come with a performance impact unless we can find a way to use inference to distinguish the two cases (see above).

dplyr takes a quite different approach:
- `group_by` is similar to our `groupby`. However, the grouped data frame it returns behaves mostly like a standard data frame (contrary to our `GroupedDataFrame` type) except for aggregation operations, so it does not always need to be combined back to a data frame immediately.
- `mutate` applies to each group a function which can access any columns of a data frame, and returns a column with as many rows as the original (several such operations can be carried out at the same time by passing multiple arguments). It could be seen as a version of `by` with a restricted output type: since the number of rows must be equal to the original, operations simply add new columns to (a copy of) the original data frame. An essential difference is that a convenience syntax is provided so that one does not actually write a function accessing the data frame, but expressions like `x = mean(col)`. In terms of performance, this means the list of columns that each operation needs is known, which allows compiling specialized code (similar to what DataFramesMeta does with `@transform`). Finally, note that `mutate` returns a grouped data frame, but since its behavior is close to that of plain data frames, this has only limited consequences for usability (compared to what would happen with our current implementation).
- `summarise` is similar to `mutate`, but for functions returning a single value. It can benefit from the same performance advantages as `mutate`, but can be even more efficient since results can be saved directly into the final `DataFrame` without intermediate allocations.
- `mutate_all` and `summarise_all` are variants of the two previous functions which take a function and apply it to each column (similar to our `aggregate`). For the first one, the function must return either a single value or a vector of the same length as the input. For the second one, the function must return a single value. The result is respectively a grouped data frame or a data frame.

Overall, our API is quite similar to Pandas, though we do not provide an equivalent of their `aggregate` for the simplest case where a function always returns a scalar. It is not clear whether we need it for performance, or whether we could use inference in our `aggregate` to automatically use a fast path when possible (see above).

On the contrary, our API is quite different from dplyr, which provides convenient functions to create new columns based on per-group operations (aggregation or arbitrary transformations), using a syntax similar to what Query or DataFramesMeta provide. Of course our `by` allows this kind of operation, but it makes it hard or impossible to do this efficiently since inference and specialization are problematic (see above). This kind of problem can only be solved using macros in order to have access to the names of the columns which need to be used (and specialized on). So it is probably better left to Query and DataFramesMeta. So I'd say we should concentrate on ensuring our limited Pandas-like API is consistent and efficient, and point to other packages in our documentation for use cases which cannot be efficiently handled through it.

One big limitation we currently have compared with both Pandas and dplyr is that a simple case like computing a grouped mean for a single column is inconvenient and slow. Pandas allows accessing the pseudo-column of a grouped data frame and calling `mean()` on it. dplyr allows a similar operation by calling `mutate(grouped_df, m = mean(col))`. On the contrary, we require either using `by`, which is slow and not so convenient, or using `aggregate`, which applies the operation to all columns (and fails when some are not numeric, which should be fixed).

Ideas for possible improvements:
- Make `GroupedDataFrame` behave more like a plain `DataFrame` when possible. In particular, using a more compact printing would make it more useful on its own (showing grouping columns like normal ones, maybe highlighted).
- Allow selecting a subset of columns of a `GroupedDataFrame` object, for example by adding a `getindex` method to select specific columns before calling `colwise`. Allowing to call `mean` (or any other reduction) on the resulting object could be convenient, but that would only work for the special case when there is a single column, while `colwise` is more general. We probably don't want to allow calling `mean` on a `DataFrame` to apply it column-wise.
- … `aggregate`.
- Make `aggregate` silently skip (i.e. return as-is) columns for which the passed function does not apply (according to its signature). Pandas does this by default, but since this is potentially confusing, it could be enabled via an argument.
- Make `colwise` return a `DataFrame` so that when called on a `GroupedDataFrame` it would preserve grouping columns (currently `colwise` does not even report the names of columns to which numbers correspond).
- Make `aggregate` more efficient by using inference (https://github.com/JuliaData/DataTables.jl/pull/65). Unfortunately, this sounds more difficult for `by`, even though there may be room for improvement.