I took a quick look at the benchmarking pipeline to better understand how the comparison is performed. During that review I noticed that scaling is always applied while reading the data files, using a `RobustScaler` from sklearn:

https://github.com/EpistasisLab/srbench/blob/1ad633974c9126a8eb6ce936873d2e9b3d40294c/experiment/read_file.py#L32

The actual model is built in the evaluate model script, which additionally has the parameters `scale_x` and `scale_y` that determine whether the input data `X` and the target `y` should be scaled:

https://github.com/EpistasisLab/srbench/blob/1ad633974c9126a8eb6ce936873d2e9b3d40294c/experiment/evaluate_model.py#L46-L61

This means that if `scale_x` is set to `true`, the input data is scaled twice when using the benchmarking pipeline. I don't know whether this behavior is intended, but I suspect the `RobustScaler` is an artifact of earlier experimentation and should be removed. Otherwise, even though I set the `scale_x` parameter to `false`, scaling is still performed while reading the data.
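To make the consequence concrete, here is a minimal sketch (numpy only, with the robust-scaling formula written out by hand rather than calling sklearn, so the example is self-contained). It shows that because the scaling happens unconditionally at read time, a model run with `scale_x=False` still never sees the raw feature values:

```python
import numpy as np

def robust_scale(X):
    # Mimics sklearn's RobustScaler with default settings:
    # center by the median, scale by the IQR (75th - 25th percentile).
    med = np.median(X, axis=0)
    iqr = np.percentile(X, 75, axis=0) - np.percentile(X, 25, axis=0)
    return (X - med) / iqr

# Hypothetical raw feature column with an outlier.
X_raw = np.array([[1.0], [2.0], [4.0], [100.0]])

# read_file.py applies the scaler unconditionally, so even when the
# user passes scale_x=False to evaluate_model, the model receives
# transformed features rather than the data on disk:
X_read = robust_scale(X_raw)
assert not np.allclose(X_read, X_raw)
```

A straightforward fix, if the behavior is unintended, would be to drop the scaler from the read step (or gate it behind the same flag), so that `scale_x`/`scale_y` alone control whether scaling happens.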