RfastOfficial / Rfast

A collection of Rfast functions for data analysis. Note 1: The vast majority of the functions accept matrices only, not data.frames. Note 2: Do not have matrices or vectors with have missing data (i.e NAs). We do no check about them and C++ internally transforms them into zeros (0), so you may get wrong results. Note 3: In general, make sure you give the correct input, in order to get the correct output. We do no checks and this is one of the many reasons we are fast.
139 stars 19 forks source link

update standardise #21

Closed RubD closed 4 years ago

RubD commented 4 years ago

fixed the situation where both scale and center are FALSE

statlink commented 4 years ago

Dear RubD, this function is meant to do something. You call this function and ask it not to do what it meant to do and then you say it does not work.

RubD commented 4 years ago

Ok, I see your point, but I should have explained the problem better. What if this function is part of a larger function? Where the user perhaps does not wish to scale and center their matrix, but wants to execute the rest of the analysis?

Right now, this function gives unusual errors and can even have dangerous behaviour as I will explain below:

problem 1: x = matrix(rnorm(100), 10, 10) test1 = standardise(x, F, F) This will result in an error that most people will not expect: Error in standardise(x, F, F) : object 'y' not found

problem 2: y = matrix(rnorm(100, mean = 10), 10, 10) # unrelated matrix test2 = standardise(x, F, F) identical(y, test2)

This is potentially a bigger problem, because now you don't get an error but your function has silently replaced the outcome with a previously declared y object.

So by just making a small modification (at not cost) there wouldn't be any problems to begin with, however my intention was just to help and I'm not going to argue any further, so you can just close this again.

Thanks for your time, Ruben

statlink commented 4 years ago

In that case, you do not want to standardise the data. So, if the argeuments are "F" and "F" just put an "if" condition right before calling our function. What we offer is bare bone functions. We cannot add inside every case or every request each person has. Do not take this offensively, and if you think I am offensive I apologize. Truth be told though, that we focus on speed, hence we expect the user to have some knowledge of what he wants. We squeeze the orange a give its orange juice, we do not peel off the orange. Again, no offence.

Michail

RubD commented 4 years ago

Michael's standardise does exactly what R's scale. You can check it on your own even for the case you suggest us.

I just want to quickly comment, because this statement is not correct or at least has the caveats that I mentioned before. You can put an if statement before standardise like statlink suggested, but you can not simply replace base::scale with Rfast::standardise, for exactly the problems I have demonstrated above and below (R-3.6.3 and Rfast 1.9.9):

start a new r-session & clear all previous objects

library(Rfast) x = matrix(rnorm(100), 10, 10)

problem 1: object 'y' not found

Rfast method

test1 = standardise(x, F, F) # object 'y' not found error identical(x, test1) # does not work, because test1 is not found

base method

test1sc = scale(x, F, F) identical(x, test1sc) # TRUE, scale returns original matrix x as expected

problem 2: y from parent environment is used

Rfast method

y = matrix(rnorm(100, mean = 10), 10, 10) # unrelated matrix test2 = standardise(x, F, F) identical(y, test2) # TRUE, and this is potentially dangerous if people are not aware of this issue

base method

test2sc = scale(x, F, F) identical(y, test2sc) # FALSE identical(x, test2sc) # TRUE

Bottomline, this is some quirky behavior and this function can not simply replace base::scale without providing additional if-statements. In this case a simple additional else statement at the end of your code would fix these issues without any added cost in terms of time or efficiency. However, as I said before, this was just a mere suggestion and if you don't agree then I'm also fine with that.

Keep up the great work, Ruben

ManosPapadakis95 commented 4 years ago

Ruben we cannot check for all the possibilities. For example I never check if an argument is NULL. If we remove these checks, even these small checks, the functions can be faster because we are in an interpreted environment.