duncantl / RTypeInference

Tools for inferring the types of inputs and outputs for functions and expressions

use case #5

Open clarkfitzg opened 6 years ago

clarkfitzg commented 6 years ago

In the code below I would like to infer that npbin() produces a data frame with columns c("station", "right_end_occ", "mean_flow", "sd_flow", "number_observed") and column types c("integer", "numeric", "numeric", "numeric", "integer").

I can get the column names by analyzing the call to data.frame() that produces the output. But to get the column classes in general, I think I need a full-blown type inference system.

Below you can assume that the argument x will be a data frame with columns known ahead of time, and dyncut() outputs a numeric vector.

# Non parametric binned means
npbin = function(x)
{
    breaks = dyncut(x$occupancy2, pts_per_bin = 200)
    binned = cut(x$occupancy2, breaks, right = FALSE)
    groups = split(x$flow2, binned)

    out = data.frame(station = rep(x[1, "station"], length(groups))
        , right_end_occ = breaks[-1]
        , mean_flow = sapply(groups, mean)
        , sd_flow = sapply(groups, sd) 
        , number_observed = sapply(groups, length)
    )   
    out 
}

Here's the context for this code: https://github.com/clarkfitzg/pems_fd/blob/master/nonparametric/base_r.R

clarkfitzg commented 6 years ago

We can use this to actually try running it:

n = 10
dyncut = function(x, ...) c(-Inf, 0, Inf)
local_pems = data.frame(station = rep(1:3, length.out = n)
    , flow2 = rnorm(n), occupancy2 = rnorm(n))

npbin(local_pems)
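
As a quick check of the expected answer, the column classes can be read off the result directly. This is only a sketch: it repeats npbin() so that it runs on its own, uses the stub dyncut() from this comment rather than the real one, and adds a set.seed() call (not in the original) so the run is repeatable.

```r
# Stub standing in for the real dyncut(); it only needs to return numeric breaks.
dyncut = function(x, ...) c(-Inf, 0, Inf)

# npbin() as posted above, repeated so this block is self-contained.
npbin = function(x)
{
    breaks = dyncut(x$occupancy2, pts_per_bin = 200)
    binned = cut(x$occupancy2, breaks, right = FALSE)
    groups = split(x$flow2, binned)

    out = data.frame(station = rep(x[1, "station"], length(groups))
        , right_end_occ = breaks[-1]
        , mean_flow = sapply(groups, mean)
        , sd_flow = sapply(groups, sd)
        , number_observed = sapply(groups, length)
    )
    out
}

set.seed(1)  # assumption: added only to make the example repeatable
n = 10
local_pems = data.frame(station = rep(1:3, length.out = n)
    , flow2 = rnorm(n), occupancy2 = rnorm(n))

out = npbin(local_pems)

# One class per column; this is the answer the inference should reproduce.
col_classes = sapply(out, class)
print(col_classes)
```

The classes come out as integer, numeric, numeric, numeric, integer, matching the types listed in the first comment.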

duncantl commented 6 years ago

If you know the return type of mean, sd, length and cut() - which we do - this seems entirely knowable if you also know the types of the columns of x. In other words, you don't need a "full-blown type inference system". It will help, but you can get a long way pretty quickly.
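
To make the "which we do" part concrete, here is a small sketch that checks those return classes on a plain numeric vector. The vector v and the break points are made up for illustration, and the result only holds for inputs like these (mean() on a Date, for example, returns a Date).

```r
# Assumed input: an ordinary numeric vector, like the flow2 and occupancy2 columns.
v = c(1.5, 2.5, 4)

# Class of the value each function returns for this kind of input.
classes = c(mean = class(mean(v)),
            sd = class(sd(v)),
            length = class(length(v)),
            cut = class(cut(v, c(-Inf, 2, Inf), right = FALSE)))
print(classes)
```

For numeric input, mean() and sd() give "numeric", length() gives "integer", and cut() gives "factor", so a small lookup table of return types covers every call in npbin().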

clarkfitzg commented 6 years ago

For this particular use case it seems that we would have to:

  1. Know function return types, as Duncan said.
  2. Propagate known type information, i.e. from breaks and the columns of x.
  3. Know that indexing into a vector preserves its type, e.g. breaks[-1].
  4. Know that sapply(x, f) returns a vector of the same type as the return type of f, at least when f returns a single value for each element.
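
The four steps above can be sketched as a tiny static walker over the body of npbin(). This is a hand-rolled illustration, not the RTypeInference API: known_returns, infer_type() and infer_function() are hypothetical names, a data frame type is represented as a named character vector of column types, and only the calls that appear in npbin() are handled.

```r
# Step 1: hypothetical lookup table of return types for the functions used here.
known_returns = c(dyncut = "numeric", cut = "factor", split = "list",
                  mean = "numeric", sd = "numeric", length = "integer")

# Infer the type of one expression, given the types of the variables in env.
infer_type = function(e, env)
{
    if (is.character(e)) return("character")
    if (is.numeric(e)) return(class(e))
    if (is.name(e)) return(env[[as.character(e)]])      # step 2: propagate
    fn = as.character(e[[1]])
    if (fn == "$")                                      # df$col
        return(unname(infer_type(e[[2]], env)[as.character(e[[3]])]))
    if (fn == "[") {
        obj = infer_type(e[[2]], env)
        if (is.null(names(obj))) return(obj)            # step 3: v[i] keeps its type
        return(unname(obj[e[[4]]]))                     # df[i, "col"]
    }
    if (fn == "rep") return(infer_type(e[[2]], env))
    if (fn == "sapply")                                 # step 4: return type of f
        return(unname(known_returns[as.character(e[[3]])]))
    if (fn == "data.frame")                             # one type per named column
        return(vapply(as.list(e)[-1], infer_type, character(1), env = env))
    unname(known_returns[fn])                           # NA if the function is unknown
}

# Walk the statements of a function body, recording the type of each assignment.
infer_function = function(f, arg_types)
{
    env = arg_types
    result = NULL
    for (stmt in as.list(body(f))[-1]) {
        if (is.call(stmt) && as.character(stmt[[1]]) %in% c("=", "<-")) {
            result = infer_type(stmt[[3]], env)
            env[[as.character(stmt[[2]])]] = result
        } else {
            result = infer_type(stmt, env)
        }
    }
    result
}

# npbin() is never evaluated here, so dyncut() does not need to exist.
npbin = function(x)
{
    breaks = dyncut(x$occupancy2, pts_per_bin = 200)
    binned = cut(x$occupancy2, breaks, right = FALSE)
    groups = split(x$flow2, binned)
    out = data.frame(station = rep(x[1, "station"], length(groups))
        , right_end_occ = breaks[-1]
        , mean_flow = sapply(groups, mean)
        , sd_flow = sapply(groups, sd)
        , number_observed = sapply(groups, length)
    )
    out
}

x_type = c(station = "integer", flow2 = "numeric", occupancy2 = "numeric")
inferred = infer_function(npbin, list(x = x_type))
print(inferred)
```

The walker only inspects the code, so it needs the column types of x and the return-type table as inputs. Anything outside that table comes back as NA, which is where a more general inference system would have to take over.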

Can this package help then?