fslaborg / RProvider

Access R packages from F#
http://fslab.org/RProvider/

Passing input to R.glmboost #135

Open tinoswe opened 9 years ago

tinoswe commented 9 years ago

Hi all, I can't find the right way to pass arguments to R.glmboost from F#. Below is a complete example that can be run as an .fsx script, for anyone who has a few minutes to look into it.

The call to R.glmboost is at the end of the body of the "r_ml" function, which takes the vectors y, x1, x2, x3, x4, and x5 as inputs. Basically, I am building a model of y as a function of x1, x2, x3, x4, and x5 using glmboost.

Note that the call to R.lm (one line above the call to R.glmboost, commented out in the script below) works fine. For the glmboost input parameters, see: http://rgm3.lab.nig.ac.jp/RGM/R_rdfile?f=mboost/man/glmboost.Rd&d=R_CC The error I get is about a missing "x" argument. That argument is not expected by the "S3 method for class 'formula'", which is the one I want to use (see the "Usage" section at the top of the linked page); it is required only by the "method for class matrix".

It seems I am passing the arguments the wrong way, and I need some help with this. Thanks.


#r @"packages\R.NET.Community.1.5.15\lib\net40\RDotNet.dll"
#r @"packages\R.NET.Community.1.5.15\lib\net40\RDotNet.NativeLibrary.dll"
#r @"packages\R.NET.Community.FSharp.0.1.8\lib\net40\RDotNet.FSharp.dll"
#r @"packages\RProvider.1.0.13\lib\net40\RProvider.dll"
#r @"packages\RProvider.1.0.13\lib\net40\RProvider.Runtime.dll"
#r @"packages\RProvider.1.0.13\lib\net40\RProvider.DesignTime.dll"

open RDotNet
open RProvider
open RProvider.``base``
open RProvider.stats 
open System
open System.Collections.Generic
open System.Data
open System.Windows.Forms
open System.Drawing
open RProvider.mboost

#I "packages/FSharp.Data.2.0.9/lib/net40"
#r "FSharp.Data.dll"
open FSharp.Data

let y = [|1391.47; 1398.31; 1319.65; 1385.41; 1376.9; 1175.89; 1191.41; 1198.86;
    1209.61; 1197.23; 1328.33; 1348.88; 1355.42; 1346.91; 1362.67; 1197.19;
    1178.95; 1173.32; 1175.28; 1177.33; 1358.06; 1365.61; 1382.16; 1375.94;
    1375.98; 1177.01; 1187.15; 1182.75; 1170.6; 1357.31; 1336.09; 1276.13;
    1232.96; 1176.75; 1181.46; 1194.49; 1190.19; 1176.66; 1220.65; 1212.49;
    1200.88; 1186.1; 1187.23; 1165.8; 1171.97; 1184.53; 1190.76; 1191.46;
    1194.18; 1203.51; 1210.83; 1182.5; 1184.07; 1177.63; 1178.29; 1166.06;
    1202.71; 1203.52; 1197.53; 1196.07; 1169.11; 1137.97; 1122.29; 1105.01;
    1100.37; 1104.17; 1106.84; 1108.37; 1111.08; 1105.7; 1104.42; 1110.01;
    1104.13; 1110.89; 1107.5; 1110.61; 1104.2; 1097.01; 1096.82; 1101.0;
    1097.09; 1097.28; 1099.19; 1111.55; 1110.78; 1120.52; 1125.91; 1118.66;
    1113.57; 1117.54; 1109.09; 1098.79; 1098.86; 1190.84; 1157.06; 1130.08;
    1118.83; 1117.62; 1113.19; 1111.12 |]

let x1 =  [|33.0; 28.0; 28.0; 28.0; 28.0; 31.0; 31.0; 31.0; 31.0; 31.0; 31.0; 31.0;
    31.0; 31.0; 31.0; 41.0; 41.0; 41.0; 41.0; 41.0; 41.0; 41.0; 41.0; 41.0;
    41.0; 32.0; 32.0; 32.0; 32.0; 32.0; 32.0; 32.0; 32.0; 15.0; 15.0; 15.0;
    15.0; 15.0; 15.0; 15.0; 15.0; 15.0; 15.0; 40.0; 40.0; 40.0; 40.0; 40.0;
    40.0; 40.0; 40.0; 41.0; 41.0; 41.0; 41.0; 41.0; 41.0; 41.0; 41.0; 41.0;
    0.0; 0.0; 0.0; 0.0; 0.0; 0.0; 0.0; 0.0; 0.0; 0.0; 0.0; 22.0; 22.0; 22.0;
    22.0; 22.0; 22.0; 0.0; 0.0; 0.0; 0.0; 0.0; 0.0; 20.0; 20.0; 20.0; 20.0;
    20.0; 20.0; 14.0; 14.0; 14.0; 14.0; 14.0; 14.0; 14.0; 14.0; 14.0; 14.0;
    14.0 |]

let x2 = [|0.6; 0.602; 0.552; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6;
    0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.599; 0.6; 0.599;
    0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6;
    0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6; 0.6;
    0.6; 0.6; 0.5; 0.5; 0.5; 0.501; 0.5; 0.5; 0.5; 0.5; 0.501; 0.5; 0.5; 0.501;
    0.501; 0.501; 0.5; 0.5; 0.5; 0.5; 0.5; 0.501; 0.5; 0.5; 0.501; 0.5; 0.5;
    0.501; 0.5; 0.5; 0.501; 0.5; 0.501; 0.5; 0.5; 0.5; 0.5; 0.5; 0.5; 0.5; 0.5;
    0.5 |]

let x3 = [|0.34; 0.36; 0.36; 0.36; 0.36; 0.327; 0.327; 0.327; 0.327; 0.327; 0.327;
    0.327; 0.327; 0.327; 0.327; 0.331; 0.331; 0.331; 0.331; 0.331; 0.331;
    0.331; 0.331; 0.331; 0.331; 0.337; 0.337; 0.337; 0.337; 0.337; 0.337;
    0.337; 0.337; 0.325; 0.325; 0.325; 0.325; 0.325; 0.325; 0.325; 0.325;
    0.325; 0.325; 0.349; 0.349; 0.349; 0.349; 0.349; 0.349; 0.349; 0.349;
    0.338; 0.338; 0.338; 0.338; 0.338; 0.338; 0.338; 0.338; 0.338; 1.032;
    1.032; 1.032; 1.032; 1.032; 1.032; 1.032; 1.032; 1.032; 1.032; 1.032; 1.03;
    1.03; 1.03; 1.03; 1.03; 1.03; 1.038; 1.038; 1.038; 1.038; 1.038; 1.038;
    1.037; 1.037; 1.037; 1.037; 1.037; 1.037; 1.037; 1.037; 1.037; 1.037;
    1.037; 1.037; 1.037; 1.037; 1.037; 1.037; 1.037 |]

let x4 = [|3300.55; 3302.99; 3299.89; 3302.2; 3302.1; 3300.59; 3300.74; 3300.98;
    3298.82; 3300.58; 3301.33; 3303.0; 3302.46; 3301.35; 3301.96; 3299.24;
    3301.64; 3300.22; 3299.85; 3302.54; 3301.53; 3300.82; 3303.19; 3302.23;
    3301.02; 3298.53; 3301.82; 3300.31; 3299.57; 3300.71; 3299.86; 3298.41;
    3301.94; 3299.23; 3299.97; 3303.75; 3302.18; 3301.63; 3301.41; 3299.36;
    3301.6; 3301.71; 3301.8; 3302.25; 3300.79; 3301.87; 3302.15; 3301.26;
    3302.44; 3301.08; 3302.12; 3300.37; 3300.09; 3301.53; 3299.99; 3299.4;
    3302.78; 3302.79; 3301.48; 3302.29; 3301.48; 3301.64; 3299.53; 3300.66;
    3301.95; 3301.12; 3299.88; 3301.08; 3303.02; 3302.37; 3300.44; 3299.26;
    3301.23; 3301.16; 3301.2; 3298.9; 3298.98; 3299.65; 3301.72; 3298.29;
    3300.86; 3301.02; 3299.22; 3299.88; 3300.84; 3300.6; 3299.89; 3299.56;
    3302.58; 3300.02; 3302.48; 3297.8; 3301.06; 3301.35; 3301.84; 3301.69;
    3302.27; 3301.19; 3301.89; 3300.97 |]

let x5 = [|4.4; 4.5; 4.2; 4.4; 4.4; 3.8; 3.8; 3.8; 3.9; 3.8; 4.2; 4.3; 4.3; 4.3; 4.4;
    3.8; 3.8; 3.8; 3.8; 3.8; 4.3; 4.4; 4.4; 4.4; 4.4; 3.8; 3.8; 3.8; 3.7; 4.3;
    4.3; 4.1; 3.9; 3.8; 3.8; 3.8; 3.8; 3.8; 3.9; 3.9; 3.8; 3.8; 3.8; 3.7; 3.7;
    3.8; 3.8; 3.8; 3.8; 3.8; 3.9; 3.8; 3.8; 3.8; 3.8; 3.7; 3.8; 3.8; 3.8; 3.8;
    3.7; 3.6; 3.6; 3.5; 3.5; 3.5; 3.5; 3.5; 3.5; 3.5; 3.5; 3.6; 3.5; 3.6; 3.5;
    3.6; 3.5; 3.5; 3.5; 3.5; 3.5; 3.5; 3.5; 3.6; 3.6; 3.6; 3.6; 3.6; 3.6; 3.6;
    3.5; 3.5; 3.5; 3.8; 3.7; 3.6; 3.6; 3.6; 3.6; 3.6 |]

type public heatflux_int_type = { Name:string; Values:float []; }

let r_ml(y_arr:float[],
         n1:string,     //variable name
         arr1:float[],  //array
         n2:string,
         arr2:float[],
         n3:string,
         arr3:float[],
         n4:string,
         arr4:float[],
         n5:string,
         arr5:float[]
         )  =

        let records = [ { Name = "Y"; Values = y_arr }
                        { Name = n1;  Values = arr1 }
                        { Name = n2;  Values = arr2 }
                        { Name = n3;  Values = arr3 } 
                        { Name = n4;  Values = arr4 } 
                        { Name = n5;  Values = arr5 } 
                        ] 

        let dataset = namedParams [ records.[0].Name.Replace(" ",""), box records.[0].Values;
                                    records.[1].Name.Replace(" ",""), box records.[1].Values;
                                    records.[2].Name.Replace(" ",""), box records.[2].Values;
                                    records.[3].Name.Replace(" ",""), box records.[3].Values;
                                    records.[4].Name.Replace(" ",""), box records.[4].Values;
                                    records.[5].Name.Replace(" ",""), box records.[5].Values;
                                    ] |> R.data_frame 

        let coef_names = R.names(dataset).GetValue<string []>()
        let debug_coef_names = coef_names

        let custom_formula = R.paste( namedParams [ "A", box coef_names.[0];
                                                    "B", box "~"; 
                                                    "C", box coef_names.[1];
                                                    "D", box "+"; 
                                                    "E", box coef_names.[2];
                                                    "F", box "+"; 
                                                    "G", box coef_names.[3];
                                                    "H", box "+"; 
                                                    "I", box coef_names.[4];
                                                    "L", box "+"; 
                                                    "M", box coef_names.[5]];
                                                    ).GetValue<string>()

        let debug_custom_formula = custom_formula

        //let result = R.lm(formula = custom_formula, data = dataset)
        let result = R.glmboost(namedParams ["formula", box custom_formula;
                                             "dataset", box dataset] )

        result

let result = r_ml(y,"X1",x1,"X2",x2,"X3",x3,"X4",x4,"X5",x5)

let result_summary = R.summary(result)
let residuals = result_summary.AsList().["residuals"].AsNumeric().GetValue<float[]>()
let result_fitted_values = R.fitted(result)
let fitted_values = result_fitted_values.AsNumeric().GetValue<float[]>()
let parameters = R.coef(result).AsNumeric().GetValue<float[]>()

tinoswe commented 9 years ago

By the way: I have mboost 2.3-0 installed (the latest release is 2.4-1), but even with that version everything works fine in pure R.

tpetricek commented 9 years ago

I'd be happy to have a look - but could you please try simplifying the example a bit?

tinoswe commented 9 years ago

Hi Tomas, yes, sure. I'll post it here when it's ready. Thanks.

tinoswe commented 9 years ago

Hi again. I am building a linear model of y as a function of x1, x2, and x3. These are passed as vectors to my custom function (called "r_lm" in the code below).

Everything works fine when I call "R.lm" (the line that is commented out in my code). However, I would like to use "R.glmboost" instead of "R.lm", and I can't find the right way to pass my dataset and my custom formula to the algorithm. It gives me an error I don't understand.

The first post above links to the documentation of the R "mboost" package (where the glmboost method I want to use is defined), in case you need to check which arguments can be passed.

Thanks for looking into this.

#r @"packages\R.NET.Community.1.5.15\lib\net40\RDotNet.dll"
#r @"packages\R.NET.Community.1.5.15\lib\net40\RDotNet.NativeLibrary.dll"
#r @"packages\R.NET.Community.FSharp.0.1.8\lib\net40\RDotNet.FSharp.dll"
#r @"packages\RProvider.1.0.13\lib\net40\RProvider.dll"
#r @"packages\RProvider.1.0.13\lib\net40\RProvider.Runtime.dll"
#r @"packages\RProvider.1.0.13\lib\net40\RProvider.DesignTime.dll"

open System
open System.Data
open RDotNet
open RProvider
open RProvider.``base``
open RProvider.stats 
open RProvider.mboost

#I "packages/FSharp.Data.2.0.9/lib/net40"
#r "FSharp.Data.dll"
open FSharp.Data

let y  = [|13.47; 13.31; 13.65; 13.41; 13.9; 11.89; 11.41; 11.86 |]
let x1 = [|33.0;  28.0;  28.0;  28.0;  28.0; 31.0;  31.0;  31.0  |]
let x2 = [|0.61;  0.62;  0.55;  0.6;   0.6;  0.6;   0.6;   0.6   |]
let x3 = [|0.34;  0.36;  0.36;  0.36;  0.36; 0.327; 0.327; 0.327 |]

let r_lm(y  : float [],
         x1 : float [],
         x2 : float [],
         x3 : float [])  =

        let dataset = namedParams [ "Y",  box y;
                                    "X1", box x1;
                                    "X2", box x2;
                                    "X3", box x3;
                                   ] |> R.data_frame 

        let custom_formula = "Y ~ X1 + X2 + X3"
        //let result = R.lm(formula = custom_formula, data = dataset)
        let result = R.glmboost(namedParams ["formula", box custom_formula;"dataset", box dataset] )

        R.fitted(result).AsNumeric().GetValue<float[]>()

let fitted_values = r_lm(y,x1,x2,x3)

tpetricek commented 9 years ago

I'm not quite sure how we should fix this in R provider (it seems to be related to #8 - because the glmboost function is an S3 function and I suspect we are calling it in a wrong way...).

In any case, you can assign the data set to a temporary R variable and call the function directly by passing a string to the R engine:

let dataset = 
    [ "Y",  box y; "X1", box x1; "X2", box x2;
      "X3", box x3 ] |> namedParams |> R.data_frame 

// Assign the 'dataset' to 'df' variable
R.assign("df", dataset)
// Run the command using 'eval' function
R.eval(R.parse(text="require(mboost)"))
let result = R.eval(R.parse(text="glmboost(Y ~ X1 + X2 + X3, data=df)"))
R.fitted(result).AsNumeric().GetValue<float[]>()

I suspect that the R provider thinks that glmboost always takes named parameter x (because one of the S3 overloads does that?) or maybe it somehow explicitly calls a wrong version of the function (?). I guess it should do runtime dispatch based on the type of the first argument - which would be formula. This also requires calling R.formula, but even then it does not work:

let custom_formula = R.formula("Y ~ X1 + X2 + X3")
let result = R.glmboost(namedParams ["formula", box custom_formula;"data", box dataset] )

So, the above is a workaround, but I'll leave this open as it has some additional info for #8. Thanks for reporting the issue!

tinoswe commented 9 years ago

Many thanks to you.
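
The eval-based workaround above hard-codes the formula inside the command string. A minimal sketch of how that string could be assembled from column names, as the first post does with R.paste, is shown below. The helpers `buildFormula` and `glmboostCommand` are hypothetical names introduced only for illustration and are not part of RProvider; the sketch assumes the data frame has already been assigned to an R variable (here "df") with R.assign.

```fsharp
// Hypothetical helper (not part of RProvider): builds a formula string
// such as "Y ~ X1 + X2 + X3" from a response name and predictor names.
// Spaces are stripped from names, as in the first post.
let buildFormula (response: string) (predictors: string list) =
    let clean (name: string) = name.Replace(" ", "")
    let rhs = predictors |> List.map clean |> String.concat " + "
    sprintf "%s ~ %s" (clean response) rhs

// Hypothetical helper: builds the R command text that the workaround
// passes to R.parse(text = ...). 'dataVar' is the name of the R variable
// that the data frame was assigned to.
let glmboostCommand (formula: string) (dataVar: string) =
    sprintf "glmboost(%s, data=%s)" formula dataVar

let formula = buildFormula "Y" [ "X1"; "X2"; "X3" ]
let command = glmboostCommand formula "df"
printfn "%s" command
```

The resulting `command` string would then take the place of the literal text in `R.eval(R.parse(text=...))`. This part is plain F# string handling, so it can be checked in F# Interactive without R installed.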