centering_io python package

ExaScience / smurff

Bayesian Factorization with Side Information in C++ with Python wrapper

MIT License


centering_io python package #102

Status: Closed (tvandera closed this issue 6 years ago)

tvandera commented 6 years ago

Create the package from Felipe's code and from the Python code in python/test/center.py.

tvandera commented 6 years ago

This might also serve as inspiration: https://github.com/cran/GFA/blob/master/R/normalizeData.R

ipasechnikov commented 6 years ago

Created the centering_io Python package from the center.py script. It is available in commit 2ce280efc7e146cb283a68b43926c226f7274d2b. I also added a few tests, and they all currently pass.

Should we remove the writing of the mean values from the mean function? https://github.com/ExaScience/smurff/blob/2ce280efc7e146cb283a68b43926c226f7274d2b/python/centering_io/centering_io/__init__.py#L50-L52

tvandera commented 6 years ago

Please add:

std(m, mode): computes the standard deviation globally (mode == 0), per row (mode == 1), or per column (mode == 2)
scale(m, mode, s): scales m globally, per row, or per column by s
center_and_scale(m, mode): centers and scales m, and returns the centered and scaled m together with the mean and the standard deviation

( centered_and_scaled_m, mean_m, std_m ) = center_and_scale(m, 1)

mean_m = mean(m, 1)
std_m = std(m, 1)
centered_and_scaled_m = center_and_scale(m, 1, mean_m, std_m)
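
For illustration, the functions requested above could be sketched with NumPy roughly as follows. This is only a sketch under stated assumptions, not the code in centering_io: it assumes dense arrays, uses the population standard deviation (NumPy's default ddof=0), does not guard against a standard deviation of zero, and always returns a tuple. The center function and the _as_broadcastable helper are hypothetical names introduced here.

```python
import numpy as np

# Assumed mapping of the integer mode to a NumPy axis:
# 0 = global (axis=None), 1 = per row (axis=1), 2 = per column (axis=0).
_AXIS = {0: None, 1: 1, 2: 0}


def _as_broadcastable(v, mode):
    """Hypothetical helper: reshape a statistic so it broadcasts against m."""
    v = np.asarray(v, dtype=float)
    if mode == 1:
        return v.reshape(-1, 1)  # one value per row
    if mode == 2:
        return v.reshape(1, -1)  # one value per column
    return v.reshape(())         # a single global value


def mean(m, mode):
    """Mean of m globally, per row, or per column."""
    return np.mean(m, axis=_AXIS[mode])


def std(m, mode):
    """Population standard deviation (ddof=0); zeros are not handled."""
    return np.std(m, axis=_AXIS[mode])


def center(m, mode, mean_m):
    """Hypothetical helper: subtract mean_m globally, per row, or per column."""
    return np.asarray(m, dtype=float) - _as_broadcastable(mean_m, mode)


def scale(m, mode, s):
    """Divide m by s globally, per row, or per column."""
    return np.asarray(m, dtype=float) / _as_broadcastable(s, mode)


def center_and_scale(m, mode, mean_m=None, std_m=None):
    """Center and scale m, computing any statistic that is not supplied."""
    if mean_m is None:
        mean_m = mean(m, mode)
    if std_m is None:
        std_m = std(m, mode)
    out = scale(center(m, mode, mean_m), mode, std_m)
    return out, mean_m, std_m


# Usage: per-row centering and scaling of a small dense matrix.
m = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
centered_and_scaled_m, mean_m, std_m = center_and_scale(m, 1)
```

Passing precomputed statistics, as in the four-argument call above, makes it possible to apply the statistics of one matrix to another, for example to normalize test data with the mean and standard deviation of the training data.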

ipasechnikov commented 6 years ago

Implemented a center_and_scale function with nearly the same signature as sklearn.preprocessing.scale, except that it has no copy parameter.

Our version of the function returns a tuple (centered_and_scaled_m, mean_m, std_m).

If you specify mode as 'none', both mean_m and std_m are None.
If you set with_mean to False, mean_m is None.
If you set with_std to False, std_m is None.

I'm not sure whether this is a good interface, but it seems fine to me. Feel free to propose other ideas.
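
To make the return convention above concrete, here is a minimal NumPy sketch of such an interface. It is an illustration under assumptions, not the committed implementation: only the 'none' mode and the with_mean and with_std flags come from the description above, while the other mode names ('global', 'rows', 'cols'), the default mode, and the use of dense arrays are hypothetical. A standard deviation of zero is not handled.

```python
import numpy as np

# Hypothetical mode names mapped to NumPy axes; only 'none' is taken
# from the description in this thread.
_AXIS = {"global": None, "rows": 1, "cols": 0}


def center_and_scale(m, mode="cols", with_mean=True, with_std=True):
    """Return (centered_and_scaled_m, mean_m, std_m); unused statistics are None."""
    m = np.asarray(m, dtype=float)
    if mode == "none":
        # Nothing is computed, so both statistics are None.
        return m, None, None

    axis = _AXIS[mode]
    mean_m = None
    std_m = None
    out = m
    if with_mean:
        # keepdims=True keeps the result broadcastable against m.
        mean_m = np.mean(m, axis=axis, keepdims=True)
        out = out - mean_m
    if with_std:
        # Population standard deviation (ddof=0), unaffected by centering.
        std_m = np.std(m, axis=axis, keepdims=True)
        out = out / std_m
    return out, mean_m, std_m


# Usage: scale the columns without centering, so mean_m comes back as None.
m = np.array([[1.0, 2.0], [3.0, 6.0]])
scaled_m, mean_m, std_m = center_and_scale(m, mode="cols", with_mean=False)
```

One consequence of this design is that callers must check for None before reusing mean_m or std_m, which is the trade-off of returning a fixed three-element tuple regardless of the flags.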