Closed sbfnk closed 1 month ago
Hmm. The naming did make sense to me at the time in the context of this flowchart:
But I do see your point.
Not sure how many people are using the gh version of scoringutils, but I suspect by now it might be a few. So changing names might not be completely free (but easier now than later - and we could even think about deprecation even though we tried to avoid this by breaking everything once).
What do others think?
@seabbs @nickreich @elray1
What do people think about this? We should aim to resolve this as soon as possible, as we have to do a CRAN update until October 31.
In terms of names I tend to agree with @sbfnk I think
I don't think our group has anything in place that would be deeply affected by a name change here.
Thanks @elray1. Do you have a preference either way?
Seb's point seems reasonable to me
I don't have a strong view either way and happy to drop get_
. Noting I think this would be effectively all uses aside from get_forecast_unit
and get_forecast_type
as I think everything else is doing some kind of compute and so following "but not to perform operations / calculations. With this in mind I would expect e.g." would be expected to not be referred to as a get.
Also noting that one of the benefits of get_
is that it avoids naming collisions as it acts as a package prefix here. I don't have good insights into how much of a problem that is here.
Renaming suggestions (i.e. visualising the proposal a bit)
get_duplicate_forecasts()
--> duplicate_forecasts()
get_pit()
--> pit_histogram()
(this is the data.table version, pit_histogram_sample()
is then the metric equivalent)get_coverage()
--> coverage()
get_correlations()
--> correlations()
get_pairwise_comparisons()
--> pairwise_comparisons()
get_forecast_counts()
--> forecast_counts
get_forecast_unit()
and get_forecast_type()
I guess make sense as they are. I'm slightly on the fence with get_duplicate_forecasts()
(function shows you duplicates that cause validation to fail).
Good point about the naming collisions
Also what I like about get_
is that you can start typing get_
and you'll see the suggestions in your IDE
which is really a benefit talk about for prefixes more generally (so one option is to pick a different prefix). I find that one a little weak though as what I usually do when exploring a package is do scoringutils::
and then look through the suggestions (as it works in the same way)
get_duplicate_forecasts()
-->duplicate_forecasts()
With this one I think we should probably call it find_duplicate_forecasts()
or similar, as it's not a function that duplicates forecasts.
Also noting a point from #949 that without a prefix it may be hard to distinguish these doing functions from metrics (which also don't have a clear prefix).
One way to get around this would be to make the metrics function operating on matrices/vectors internal (see https://github.com/epiforecasts/scoringutils/pull/949#issuecomment-2424969048).
Also noting a point from https://github.com/epiforecasts/scoringutils/pull/949 that without a prefix it may be hard to distinguish these doing functions from metrics (which also don't have a clear prefix).
I understood Sam's comment as a broader comment on all functions. We have some mixing at the moment as well, not sure how much of an issue this is going to be in reality. But I appreciate it's definitely not zero.
Ideally we would resolve this asap as about to try and write a workshop for scoringutils and ideally it won't all get cycle broken.
I don't have a strong view and happy to go with whatever. I would note though that this is another instance of a late revamp of already discussed choices which does feel inefficient so reflecting on how we can do better in initial dev conversations seems like a good idea.
which does feel efficient
inefficient you mean?
ha yes
How strongly do you feel, @sbfnk ? If we're all basically indifferent then I'd lean towards just leaving everything as is. If you do feel strongly (and I'm all in favour of strong feelings related to scoringutils!), then I'm happy to make the change tonight
If I haven't managed to convinced people that this matters then clearly it's just my own pet grievance, and I will cope.
If I haven't managed to convinced people that this matters
I think it's more a general scoringutils
-related decision paralysis :) I genuinely find it difficult at this point to put myself into an ordinary user's shoes.... (also let's not forget that I was the original mastermind behind naming choices like eval_forecasts()
and range_wide_to_long()
...)
But I'm genuinely happy to keep thinking and talking about this. Even at a later point we can always decide to just go through a proper deprecation cycle.
yes agree
yes agree
especially with the mastermind part, I suppose :-D
Sorry if I missed a previous discussion about this (which I suspect there has been) but I typically think of
get_
functions to provide access to some properties of an object (as is the case forget_forecast_unit()
) but not to perform operations / calculations. With this in mind I would expect e.g.get_coverage()
to be calledcoverage()
get_pit()
to be calledpit()
get_pairwise_comparison()
to be calledpairwise_comparison()
Otherwise shouldscore()
not be calledget_scores()
etc.?