Closed Aeilert closed 6 months ago
Slack differs quite significantly from Benchmarking on some of the datasets I have available. Using the Electric_plants data frame from deaR, slack differs with the following parameters:
Inputs: Labor, Fuel, Capital. Outputs: Output. RTS: Variable. Orientation: Input.
Some firms differ with RTS set to NIRS, and they differ quite significantly with NDRS.
Interestingly, the pear package gives the same results as the Benchmarking package, so the difference lies somewhere in the implementation in pioneeR.
Here is a reproducible example. The `compute_slack()` function in pioneeR (and in pear) will give the same results as Benchmarking, but the implementation within the app itself is flawed. It is possible to call the slack function with model arguments that differ from the original DEA model. This is what happens in the app, as the default values for CRS and input orientation will always be selected.
```r
df <- deaR::Electric_plants
X <- cbind(df$Labor, df$Fuel, df$Capital)
Y <- as.matrix(df$Output)

vrs_pioneer <- pioneeR::compute_efficiency(X, Y, type = 'vrs', orientation = 'in', digits = NULL)
vrs_pear <- pear::compute_efficiency(X, Y, type = 'vrs', orientation = 'in', digits = NULL)
vrs_benchmarking <- Benchmarking::dea(X, Y, RTS = 'vrs', ORIENTATION = 'in')

slack_pioneer <- pioneeR::compute_slack(X, Y, vrs_pioneer$unadj_values, type = 'vrs', orientation = 'in', digits = NULL)
slack_pear <- pear::compute_slack(X, Y, vrs_pear$unadj_values, type = 'vrs', orientation = 'in', digits = NULL)
slack_benchmarking <- Benchmarking::slack(X, Y, vrs_benchmarking)

# Same results in pioneeR, pear and Benchmarking
slack_pioneer$values
slack_pear$values
slack_benchmarking$sum
```
Reproduce incorrect results in the current PR:
```r
slack_pear2 <- pear::compute_slack(X, Y, vrs_pear$unadj_values, type = 'crs', orientation = 'in', digits = NULL)
# Gives wrong results as returns to scale is CRS
slack_pear2$values
```
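As a quick sanity check, the correct and incorrect slack results can be compared directly (this reuses the objects from the example above, and assumes the return shapes are comparable numeric vectors):

```r
# The VRS call and the mismatched CRS call should not agree
all.equal(slack_pear$values, slack_pear2$values)

# The correct VRS call should match Benchmarking's slack sums
all.equal(unname(slack_pear$values), unname(slack_benchmarking$sum))
```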
@ohjakobsen : The specific bug in the UI should now be fixed.
This fixes the bug, but the implementation is still problematic, and these are exported functions that will be available to the user. I will do a full review and request changes to the implementation.
Sounds good. I agree there is no explicit need to export these functions. I think the simplest solution is to use `@keywords internal` in the documentation, but other options and code changes could also be valid.
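For reference, a minimal roxygen sketch of the `@keywords internal` approach (the documentation text here is illustrative, not taken from the package):

```r
#' Compute slack for a DEA model
#'
#' @param x A matrix of input values
#' @param y A matrix of output values
#' @param model A DEA model object
#'
#' @keywords internal
compute_slack <- function(x, y, model) {
  # ... implementation ...
}
```

With `@keywords internal` the help page is still generated, but the function is hidden from the package index; `@noRd` suppresses the help page entirely.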
Several changes should be done to this PR:
- Reduce the number of exported functions for now. The user needs a function that returns the DEA model, slack and super efficiency (non-exported functions should have `@noRd`)
- The `compute_efficiency()` function should return an S3 object that has a custom print method (with a `digits` argument). This object can be passed to further functions
- `inst/app/R/fct_dea_boostrap.R` must be updated as well (and change the rather embarrassing typo in the filename)
- The code includes a lot of commented code that should be removed before we merge
- Adhere to the style guide for functions, in particular on the use of `return`
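A minimal sketch of the suggested S3 return object with a `digits`-aware print method (the class name `pioneer_dea` and the field names are illustrative, not the final API):

```r
compute_efficiency <- function(x, y, type, orientation) {
  scores <- rep(NA_real_, nrow(x))  # placeholder for the LP solution
  # ... solve the DEA LPs and fill in `scores` ...
  structure(
    list(values = scores, info = list(type = type, orientation = orientation)),
    class = 'pioneer_dea'
  )
}

print.pioneer_dea <- function(x, digits = 4L, ...) {
  cat('DEA model (', x$info$type, ', ', x$info$orientation, '-oriented)\n', sep = '')
  print(round(x$values, digits = digits))
  invisible(x)
}
```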
Hi @ohjakobsen,
Thanks for the review. As discussed offline, I have added several improvements and changes to this PR. The nature of the suggested changes required a more comprehensive approach. The good news is that the end-result is quite a lot better. :)
Important changes include:
I have added a new (exported) `compute_dea()` function. This is based on the one from {pear}, but includes several important changes. It now includes boolean parameters that can be used to specify whether or not to include super eff., scale eff., slack or peers in the computations/output. It also has a new object structure that is hopefully more suitable for a model object (it is sort of a compromise between a "data.frame" and a "model" object). I have added an S3 method for the response to use for printing and model summary of eff. scores.
See below for suggestions on how this can be used in the report.
`compute_efficiency()`, `compute_super_efficiency()`, `compute_slack()` and `get_peers()` have been converted to internal functions (with `@noRd` tags). As mentioned, these were written as internal functions and I always intended them to be so. The reason they were exported in the first commit was to follow the structure of `create_matrix()` and `compute_scale_efficiency()`.
The `digits` parameter in `compute_efficiency()`, `compute_super_efficiency()` and `compute_slack()` has been removed.
`compute_efficiency()`, `compute_super_efficiency()` and `compute_slack()` now return an `info` object as part of their response. This includes information on the model orientation, type and dimensions, and can easily be modified to include additional information if needed. I do, however, not think it is a good idea to include the model inputs (e.g. `x`, `y`). I also don't think there is any need to create a custom S3 object for this, since these are internal functions.
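The `info` element could take a shape like the following (a plain list; the field names below are my illustration of "orientation, type and dimensions", not the actual structure):

```r
# Illustrative only: a plain list carrying model metadata,
# assuming input matrix x and output matrix y are in scope
info <- list(
  type = 'vrs',        # returns to scale
  orientation = 'in',  # model orientation
  n_units = nrow(x),   # number of DMUs
  n_inputs = ncol(x),  # model dimensions
  n_outputs = ncol(y)
)
```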
`compute_efficiency()` has gained a new argument: `values_only = TRUE|FALSE`. This can be used in application code when only the efficiency scores are needed as inputs to other functions. The typical use case here is when efficiency scores are needed to calculate productivity (Malmquist), scale efficiency or bootstrapped values. I don't think the name `values_only` is genius, so other suggestions are welcome. I'm also not sure about the overall need for this parameter, but large(r) return objects can in rare cases reduce performance (proved in testing), and this is an easy way around that issue.
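Hypothetical usage of the new argument (the exact signature is sketched from the description above):

```r
# Skip building the full return object when only the scores are needed,
# e.g. as input to a bootstrap or Malmquist calculation
eff <- compute_efficiency(x, y, type = 'vrs', orientation = 'in', values_only = TRUE)
```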
`compute_efficiency()` and `compute_super_efficiency()` no longer have default arguments for `type` and `orientation`. This will make it harder for developers to accidentally introduce bugs like the one you found for slack.
`compute_slack()` no longer takes `values`, `type` and `orientation` as input parameters. Instead, it relies on the result from `compute_efficiency()`, supplied as a `model` parameter. This is fairly similar to the approach in {Benchmarking}.
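Conceptually, the slack computation now reads RTS and orientation from the model object itself, so they can no longer drift apart from the original DEA run. A sketch of the idea (the internals shown are hypothetical):

```r
compute_slack <- function(x, y, model) {
  # Take the settings from the model object instead of free parameters,
  # so slack is always computed under the same assumptions as the DEA run
  type <- model$info$type
  orientation <- model$info$orientation
  values <- model$unadj_values
  # ... set up and solve the slack LPs using these settings ...
}
```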
`get_peers()` has been modified to rely on the same `model` parameter, and not directly on the `lambda` values.
Unit tests for `compute_efficiency()`, `compute_super_efficiency()` and `compute_slack()` have been rewritten to not rely on "on-the-fly" calculations from {Benchmarking}. Instead, the necessary values are pre-computed and stored as part of `/testdata`. This further reduces our reliance on {Benchmarking}, and makes it easier to remove it as a soft dependency in the future (i.e. remove it from Suggests). The script to create the testing data is included under `/tests/scripts`. This folder has been added to `.Rbuildignore`.
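A pre-computation script of this kind might look as follows (file paths and object names are illustrative):

```r
# tests/scripts/create_testdata.R (illustrative sketch)
df <- deaR::Electric_plants
X <- cbind(df$Labor, df$Fuel, df$Capital)
Y <- as.matrix(df$Output)

model <- Benchmarking::dea(X, Y, RTS = 'vrs', ORIENTATION = 'in')
slack <- Benchmarking::slack(X, Y, model)

# Store reference values so the unit tests no longer need Benchmarking
saveRDS(
  list(eff = model$eff, slack = slack$sum),
  file = 'tests/testthat/testdata/electric_plants_vrs_in.rds'
)
```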
I have expanded the unit tests to include tests for two additional datasets (`hospitals` from {rDEA} and `Electric_plants` from {deaR}) to further ensure that our implementation is consistent with {Benchmarking}. This means we have tests for three well-known and published DEA datasets, as well as our own district courts dataset.
I have added deprecation warnings to `create_matrix()` and `compute_scale_efficiency()`. The first should be removed, and the latter should be re-written in the future to align with either `compute_efficiency()` (as an internal function) or `compute_dea()` (as an exported function). EDIT: Created a new issue for this instead (see #86).
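With {lifecycle} in Imports, such a warning is typically raised like this (the version number and suggested replacement are illustrative). Note that `deprecate_warn()` rate-limits itself, repeating a given warning at most once every 8 hours by default:

```r
create_matrix <- function(...) {
  # Soft-deprecate: lifecycle shows this warning at most once every 8 hours
  lifecycle::deprecate_warn('0.4.0', 'create_matrix()', 'compute_dea()')
  # ... existing implementation ...
}
```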
I have added fixes for a few bugs where user-supplied rounding was not reflected in the UI (i.e. in `server.R`). See #84. This is out of scope for the main issue here, but in general I think it is good practice to fix "broken windows" when spotted.
Note also some dependency changes: {cli} is moved to Imports, as it is used in some of the utility functions; {lifecycle} is added to Imports for deprecation handling; {withr} is added to Suggests.
TO DO: `compute_dea()` needs its own set of unit tests, but let's maybe agree on the functionality and structure first. Suggestion for the report:
```r
# Estimate the DEA model
model <- compute_dea(
  df,
  id = idvar,
  input = inputs,
  output = outputs,
  type = rts,
  orientation = orientation,
  super = TRUE,
  slack = TRUE
)

# Print out the summary of the DEA model
summary_tbl_dea(model)

# Summary statistics for the efficiency scores
summary(model$results$efficiency)

# Get table for scale efficiency
compute_scale_efficiency(inputs, outputs, orientation, digits = 4L)
```
An additional comment on dep. warnings for old functions. I think it is correct to add a dep. warning for `create_matrix()`, since it basically is no longer in use at all (when also removed from the report). But when running `pioneeR::run_pioneer()`, this might look weird.
It's just every 8 hours though, so I think we can live with this.
Actually, we should rethink this. `create_matrix()` was only included as a helper function for 0.3.0. We should not confuse the user with irrelevant warnings that the user cannot do anything about. We should either postpone the deprecation, or make sure that the warning is not shown when the user runs `run_pioneer()`.
Edit: I will remove the deprecation. It cannot be deprecated before we remove it from `server.R`, and it should not be removed from there yet. This PR already does too much, so I will fix it later on.
Updated this PR with the following changes:
- Renamed `type` to `rts` for all DEA and LP functions

Thanks @ohjakobsen
I reviewed the recent changes. They look good!
Add custom functions for efficiency, super efficiency, slack and peer calculations. This removes Benchmarking as a hard dependency.
The commit implies some minor changes in the slack table shown in the Analyse tab, but should otherwise not affect the UI.
Includes:
Closes #80