Parse cut-derived factors in likelihood data.frames

gadget-framework / gadget3

TMB-based gadget implementation

GNU General Public License v2.0

Parse cut-derived factors in likelihood data.frames #112

Closed by lentinj 8 months ago

lentinj commented 11 months ago

The main reason for the MFDB attributes is to specify the groupings that aren't represented in the data, i.e. you're trying to group lengths into seq(0, 100, by=10), but there's only data for 30..50. In this case we should be comparing e.g. [60, 70) to 0.

The various MFDB aggregates put this information into attributes, but the more idiomatic R way of doing this would be cut():

> cut(c(10, 15, 25, 38), seq(10, 50, by = 10), right = FALSE)
[1] [10,20) [10,20) [20,30) [30,40)
Levels: [10,20) [20,30) [30,40) [40,50)
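To illustrate why cut() covers the "groupings not represented in the data" case, here is a small sketch using the same values as above. The factor keeps every level defined by the breaks, so tabulating it gives an explicit zero for the empty [40,50) bin. The variable names are only illustrative.

```r
# Same values and breaks as the cut() call above
lengths <- c(10, 15, 25, 38)
bins <- cut(lengths, seq(10, 50, by = 10), right = FALSE)

# table() counts every level of the factor, including levels with no data
counts <- table(bins)
print(counts)
# [10,20): 2, [20,30): 1, [30,40): 1, [40,50): 0
```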

Parse factor strings as generated by cut() in likelihood_data, so that one can use dplyr group_by(cut(...)) instead of MFDB.

NB: dbplyr doesn't return groupings as factors, so this won't work there.
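As a rough idea of what parsing cut()-style labels involves, the sketch below recovers the bounds from the levels of a factor. parse_cut_levels is a hypothetical helper written for this illustration, not gadget3's actual implementation, and it assumes default labels of the form "[a,b)" or "(a,b]". Note that cut() formats labels to a limited number of digits (see its dig.lab argument), so bounds recovered this way may be rounded.

```r
# Hypothetical helper: recover interval bounds from cut()-style factor levels
parse_cut_levels <- function(f) {
  lv <- levels(f)
  # Drop the opening and closing bracket, leaving e.g. "10,20"
  inner <- substr(lv, 2, nchar(lv) - 1)
  parts <- strsplit(inner, ",", fixed = TRUE)
  data.frame(
    level = lv,
    lower = as.numeric(vapply(parts, `[`, character(1), 1)),
    upper = as.numeric(vapply(parts, `[`, character(1), 2)),
    # "[" / "]" mark a closed end, "(" / ")" an open one
    lower_closed = substr(lv, 1, 1) == "[",
    upper_closed = substr(lv, nchar(lv), nchar(lv)) == "]",
    stringsAsFactors = FALSE
  )
}

f <- cut(c(10, 15, 25, 38), seq(10, 50, by = 10), right = FALSE)
bounds <- parse_cut_levels(f)
print(bounds)
```

Because the bounds come from levels(f) rather than from the observed values, the empty [40,50) group is still present in the result.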

lentinj commented 11 months ago

@bthe we talked about this a few months ago. I think it will be useful for my Math4Fish script output as a simpler way of grouping the output data, so I've gone ahead and done it.

lentinj commented 11 months ago

This should be done with the above; however, a dplyr example would be much more intelligible:

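library(dplyr)

# Bin lengths into 10-unit groups, then total the counts per year / age / length bin
ldist.lln.raw |> group_by(
  year = year, age = age,
  length = cut(length, breaks = seq(10, 100, by = 10), right = FALSE)
) |> summarise(number = sum(number))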

To do this, we have to pull in dplyr as a Suggests dependency, which is a bit of a sledgehammer but probably worth it.

lentinj commented 9 months ago

Also:

What happens if you do by_year = c(1981, 1999:2004, 2010)? It would be cool if this were wrapped into:

if(year >= 1981 && year < 1999) 
par.1981
else if(year==1999)
par.1999
....
else if(year >=2004 && year < 2010)
par.2004
else
par.2010
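For comparison, the same year-to-parameter lookup can be sketched in plain R with findInterval(). This only illustrates the mapping, not how gadget3 would generate model code; year_to_par is a hypothetical helper and the "par.<year>" names simply follow the pseudocode above. Unlike the final else branch there, this sketch returns NA for years before the first break.

```r
by_year <- c(1981, 1999:2004, 2010)

# Hypothetical helper: map each year to the parameter of the most recent break
year_to_par <- function(year, breaks = by_year) {
  # Index of the last break that is <= year (0 if year is before all breaks)
  idx <- findInterval(year, breaks)
  ifelse(idx == 0, NA_character_, paste0("par.", breaks[pmax(idx, 1)]))
}

pars <- year_to_par(c(1985, 1999, 2003, 2005, 2015))
print(pars)
# "par.1981" "par.1999" "par.2003" "par.2004" "par.2010"
```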

lentinj commented 8 months ago

As noted in https://github.com/gadget-framework/modelwizard/issues/6, taking factors of cuts is problematic, as the labels don't sort alphanumerically (for example, "[100,110)" sorts before "[20,30)" as a string). We need to preserve the initial order when generating a factor.
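A minimal sketch of the ordering problem and one way around it, assuming the labels arrive as plain strings in their original interval order (the variable names are illustrative): factor() sorts its levels as strings by default, whereas passing levels = unique(x) keeps the order of first appearance.

```r
breaks <- c(0, 10, 20, 30, 100, 110)
f <- cut(c(5, 25, 105), breaks, right = FALSE)

# Labels as plain strings, still in interval order
labels_in_order <- levels(f)

# Default factor(): levels are sorted as strings, so interval order is lost
resorted <- factor(labels_in_order)

# Passing the levels explicitly preserves the order of first appearance
preserved <- factor(labels_in_order, levels = unique(labels_in_order))

print(levels(resorted))
print(levels(preserved))
```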