Riksrevisjonen / pioneeR

R package for running a Shiny app for DEA analysis
https://riksrevisjonen.github.io/pioneeR/
GNU General Public License v3.0
6 stars 2 forks source link

Update bootstrap functions and export main function #93

Closed ohjakobsen closed 4 months ago

ohjakobsen commented 4 months ago

The bootstrap functions should be reworked. They were created as app internal functions, but are now accessible as internal functions within the pioneeR package. In addition, we should export a function to perform bootstrapping. This function should mimic the existing compute_dea() function in style and arguments. However, it's a question if we should already accept an S3-object of class pioneer_dea as the data object. From the S3-object we can extract model parameters and orientation in addition to the original efficiency scores which will eliminate the need to run a base DEA model, giving a (quite modest) performance boost. Thoughts on this @Aeilert?

Aeilert commented 4 months ago

Hi @ohjakobsen

Fully agree that we should add an exported function for the apps bootstrapping functionality. Let's connect offline and discuss the structure. I think bootstrap_dea(x = ) or bootstrap_efficiency(x = ) where x is pioneer_dea object could work well in this case. But I would like to review the logic together with how we think compute_dea(), compute_scale_efficiency() and a potential compute_malmquist() should work.

ohjakobsen commented 4 months ago

@Aeilert: I'm struggling a bit with the workflow for the bootstrap function. Ideally the user should be able to compute a DEA model and then perform a bootstrap on the model. The user could also opt to perform the bootstrap directly. That was the purpose of the data argument. However, the workflow for the user gets a bit confusing since the exported compute_dea() function does not return the inputs and outputs. If the user wants to fit a DEA model and then bootstrap, the user must provide a matrix for inputs and outputs:

# Default input oriented model with CRS
mod <- compute_dea(fare89, id = "Plant", input = c("Labor", "Fuel", "Capital"), outputs = "Output")
# The user could also use the create_matrix functions that are planned for possible deprecation
X <- cbind(fare89$Labor, fare89$Fuel, fare89$Capital)
Y <- as.matrix(fare89$Output)
# This is the new function under development
# With an object of type pioneer_data, we can infer RTS and orientation, but not x and y
boot <- bootstrap_dea(model, x = X, y = Y)

An alternative would be to require a full specification from the user where the bootstrap_dea() function takes the same arguments as the compute_dea() function:

# Default input oriented model with CRS
mod <- compute_dea(fare89, id = "Plant", input = c("Labor", "Fuel", "Capital"), outputs = "Output")
# And a new specification for the bootstrap
boot <- bootstrap_dea(fare89, id = "Plant", input = c("Labor", "Fuel", "Capital"), outputs = "Output")

This will lead to a slight performance hit since we need to first compute a DEA model and then run the bootstrap on this model. The second alternative is for the compute_dea() function to return both inputs and outputs in the pioneer_dea object. This can be achieved in several ways, with one alternative returning a data.frame with the ID variable, input and output variables, efficiency scores (and possible other columns when slack and/or super is set to TRUE), with attributes to the data.frame indicating the which columns are inputs and which are outputs. You have mentioned that there might be performance issues with returning inputs and outputs. What are the conditions where this will be an issue, and will it not become an issue if the user still needs to create new input and output objects outside of the model object? Also, we could alleviate the performance issue with the values_only argument to compute_dea() that will only return the efficiency scores.

Note that I hoped we had bought ourselves a bit more time by only exporting compute_dea() for the next release. However, we need to settle on a more long time solution if we want to export the bootstrap function. If we want to postpone the problem, we can opt to not export the bootstrap function for now (it is not exported in the current release), but this will probably have the consequence that the compute_dea() function will change in future releases and that code based on the current pioneer_dea object will break when we do introduce changes.