epiforecasts / scoringutils

Utilities for Scoring and Assessing Predictions
https://epiforecasts.io/scoringutils/
Other
48 stars 21 forks source link

`get_` function names #948

Closed sbfnk closed 1 month ago

sbfnk commented 1 month ago

Sorry if I missed a previous discussion about this (which I suspect there has been) but I typically think of get_ functions to provide access to some properties of an object (as is the case for get_forecast_unit()) but not to perform operations / calculations. With this in mind I would expect e.g.

nikosbosse commented 1 month ago

Hmm. The naming did make sense to me at the time in the context of this flowchart:

image

But I do see your point.

Not sure how many people are using the gh version of scoringutils, but I suspect by now it might be a few. So changing names might not be completely free (but easier now than later - and we could even think about deprecation even though we tried to avoid this by breaking everything once).

What do others think?

@seabbs @nickreich @elray1

nikosbosse commented 1 month ago

What do people think about this? We should aim to resolve this as soon as possible, as we have to do a CRAN update until October 31.

nikosbosse commented 1 month ago

In terms of names I tend to agree with @sbfnk I think

elray1 commented 1 month ago

I don't think our group has anything in place that would be deeply affected by a name change here.

nikosbosse commented 1 month ago

Thanks @elray1. Do you have a preference either way?

elray1 commented 1 month ago

Seb's point seems reasonable to me

seabbs commented 1 month ago

I don't have a strong view either way and happy to drop get_. Noting I think this would be effectively all uses aside from get_forecast_unit and get_forecast_type as I think everything else is doing some kind of compute and so following "but not to perform operations / calculations. With this in mind I would expect e.g." would be expected to not be referred to as a get.

Also noting that one of the benefits of get_ is that it avoids naming collisions as it acts as a package prefix here. I don't have good insights into how much of a problem that is here.

nikosbosse commented 1 month ago

Renaming suggestions (i.e. visualising the proposal a bit)

  1. get_duplicate_forecasts() --> duplicate_forecasts()
  2. get_pit() --> pit_histogram() (this is the data.table version, pit_histogram_sample() is then the metric equivalent)
  3. get_coverage() --> coverage()
  4. get_correlations() --> correlations()
  5. get_pairwise_comparisons() --> pairwise_comparisons()
  6. get_forecast_counts() --> forecast_counts

get_forecast_unit() and get_forecast_type() I guess make sense as they are. I'm slightly on the fence with get_duplicate_forecasts() (function shows you duplicates that cause validation to fail).

nikosbosse commented 1 month ago

Good point about the naming collisions

nikosbosse commented 1 month ago

Also what I like about get_ is that you can start typing get_ and you'll see the suggestions in your IDE

seabbs commented 1 month ago

which is really a benefit talk about for prefixes more generally (so one option is to pick a different prefix). I find that one a little weak though as what I usually do when exploring a package is do scoringutils:: and then look through the suggestions (as it works in the same way)

sbfnk commented 1 month ago
  1. get_duplicate_forecasts() --> duplicate_forecasts()

With this one I think we should probably call it find_duplicate_forecasts() or similar, as it's not a function that duplicates forecasts.

seabbs commented 1 month ago

Also noting a point from #949 that without a prefix it may be hard to distinguish these doing functions from metrics (which also don't have a clear prefix).

sbfnk commented 1 month ago

One way to get around this would be to make the metrics function operating on matrices/vectors internal (see https://github.com/epiforecasts/scoringutils/pull/949#issuecomment-2424969048).

nikosbosse commented 1 month ago

Also noting a point from https://github.com/epiforecasts/scoringutils/pull/949 that without a prefix it may be hard to distinguish these doing functions from metrics (which also don't have a clear prefix).

I understood Sam's comment as a broader comment on all functions. We have some mixing at the moment as well, not sure how much of an issue this is going to be in reality. But I appreciate it's definitely not zero.

seabbs commented 1 month ago

Ideally we would resolve this asap as about to try and write a workshop for scoringutils and ideally it won't all get cycle broken.

I don't have a strong view and happy to go with whatever. I would note though that this is another instance of a late revamp of already discussed choices which does feel inefficient so reflecting on how we can do better in initial dev conversations seems like a good idea.

nikosbosse commented 1 month ago

which does feel efficient

inefficient you mean?

seabbs commented 1 month ago

ha yes

nikosbosse commented 1 month ago

How strongly do you feel, @sbfnk ? If we're all basically indifferent then I'd lean towards just leaving everything as is. If you do feel strongly (and I'm all in favour of strong feelings related to scoringutils!), then I'm happy to make the change tonight

sbfnk commented 1 month ago

If I haven't managed to convinced people that this matters then clearly it's just my own pet grievance, and I will cope.

nikosbosse commented 1 month ago

If I haven't managed to convinced people that this matters

I think it's more a general scoringutils-related decision paralysis :) I genuinely find it difficult at this point to put myself into an ordinary user's shoes.... (also let's not forget that I was the original mastermind behind naming choices like eval_forecasts() and range_wide_to_long()...)

But I'm genuinely happy to keep thinking and talking about this. Even at a later point we can always decide to just go through a proper deprecation cycle.

seabbs commented 1 month ago

yes agree

nikosbosse commented 1 month ago

yes agree

especially with the mastermind part, I suppose :-D