Closed lentinj closed 8 months ago
@bthe we talked about this a few months ago. I think it will be useful for my Math4Fish script output as a simpler way of grouping the output data, so I've gone ahead and done it.
This should be done with the above; however, a dplyr example would be much more intelligible:
ldist.lln.raw |> group_by(
  year = year, age = age,
  length = cut(length, breaks = seq(10, 100, by = 10), right = FALSE)
) |> summarise(number = sum(number))
To do this, we have to pull in dplyr as a Suggests dependency. That's a bit of a sledgehammer, but probably worth it.
Also:
What happens if you do by_year = c(1981, 1999:2004, 2010)? It would be cool if this were wrapped into:
if (year >= 1981 && year < 1999)
  par.1981
else if (year == 1999)
  par.1999
....
else if (year >= 2004 && year < 2010)
  par.2004
else
  par.2010
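One way the year-to-parameter mapping above could be expressed without a long if/else chain is with base R's findInterval(). This is only a sketch: param_for_year is a hypothetical helper, not an existing function in the package, and returning NA for years before the first break is an assumption (the if/else chain above would fall through to its final branch instead).

```r
# Hypothetical helper: map each year to the name of the parameter whose
# break is the latest one at or before that year.
by_year <- c(1981, 1999:2004, 2010)

param_for_year <- function(year, breaks = by_year, prefix = "par.") {
  # findInterval() returns the index of the last break <= year (0 if none)
  idx <- findInterval(year, breaks)
  idx[idx == 0] <- NA  # assumption: years before the first break get NA
  ifelse(is.na(idx), NA_character_, paste0(prefix, breaks[idx]))
}

param_for_year(c(1990, 1999, 2007, 2015))
```

Because findInterval() is vectorised, the same call works on a whole column of years at once.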
As noted in https://github.com/gadget-framework/modelwizard/issues/6, taking factors of cuts is problematic, as they don't sort alphanumerically. We need to preserve the initial order when generating a factor.
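The ordering problem can be reproduced in a few lines of base R. In this sketch the breaks and lengths are made-up illustration values; it shows that rebuilding a factor from cut() labels re-sorts them as strings, whereas passing levels = unique(...) keeps the order of first appearance.

```r
# Illustration values only: three lengths, one per interval
breaks <- c(0, 50, 100, 150)
groups <- cut(c(25, 75, 125), breaks = breaks, right = FALSE)

# Once the labels are plain strings, the interval order is lost ...
labels <- as.character(groups)
resorted <- factor(labels)  # levels sorted as strings: "[100,150)" before "[50,100)"

# ... unless the levels are given explicitly, in order of first appearance
preserved <- factor(labels, levels = unique(labels))

levels(resorted)
levels(preserved)
```

Note that unique() only recovers the interval order when the input is already sorted by interval, as it is here.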
The main reason for the MFDB attributes is to specify the groupings that aren't represented in the data, i.e. you're trying to group lengths into seq(0, 100, by=10), but there's only data for 30..50. In this case we should be comparing e.g. [60, 70) to 0. The various MFDB aggregates will put this into attributes, but the more R way of doing this would be cut:
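As a rough illustration of why cut() covers this case: the factor it returns carries every interval as a level, including those with no observations, so the empty groups can be filled with 0. The data below is invented for the example, and tapply() stands in for whatever aggregation is actually used.

```r
# Invented data: observations only fall between 30 and 50
lengths <- c(32, 35, 41, 48)
number  <- c(10, 5, 7, 3)

# cut() keeps all ten intervals as factor levels, observed or not
length_group <- cut(lengths, breaks = seq(0, 100, by = 10), right = FALSE)

# tapply() yields NA for empty levels; treat those as zero counts
totals <- tapply(number, length_group, sum)
totals[is.na(totals)] <- 0

totals
```

Here the [60,70) group is present in the result with a total of 0, even though no row in the data falls inside it.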
Parse factor strings as generated by cut() in likelihood_data, so one can use dplyr group_by(cut(...)) instead of MFDB.
NB: dbplyr doesn't return groupings as factors, so this won't work there.
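For readers unfamiliar with the label format, parsing cut() level strings might look roughly like the following. parse_cut_labels is a hypothetical helper written for this illustration, not the code in likelihood_data, and it assumes plain numeric labels of the form "[lower,upper)" or "(lower,upper]".

```r
# Hypothetical helper: split labels such as "[10,20)" into numeric bounds
parse_cut_labels <- function(labels) {
  n <- nchar(labels)
  open  <- substr(labels, 1, 1)   # "[" or "("
  close <- substr(labels, n, n)   # "]" or ")"
  # Drop the brackets, then split "lower,upper" on the comma
  bounds <- strsplit(substr(labels, 2, n - 1), ",", fixed = TRUE)
  data.frame(
    label = labels,
    lower = as.numeric(vapply(bounds, `[`, character(1), 1)),
    upper = as.numeric(vapply(bounds, `[`, character(1), 2)),
    lower_closed = open == "[",
    upper_closed = close == "]",
    stringsAsFactors = FALSE
  )
}

groups <- cut(35, breaks = seq(10, 100, by = 10), right = FALSE)
parsed <- parse_cut_labels(levels(groups))
```

One caveat: cut() formats its labels with dig.lab = 3 significant digits by default, so bounds recovered this way can be less precise than the original breaks.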