dpilger26 / NumCpp

C++ implementation of the Python Numpy library
https://dpilger26.github.io/NumCpp
MIT License

One Hot Encoding #217

Open brccabral opened 2 months ago

brccabral commented 2 months ago

I am trying to do "one hot encoding". If a value is 4, only column 4 of that row should be set to 1; the other columns remain 0.

In Python I can do this. Each value in y selects the column for its own row only.

y = np.array([5, 4, 3, 0, 7, 6, 5, 1, 3, 5])
one_hot = np.zeros((10,10))
one_hot[np.arange(y.size), y] = 1
print(one_hot)

Prints

[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]  # 5
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]  # 4
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]  # 3
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]  # 0
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]  # 7
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]  # 6
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]  # 5
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]  # 1
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]  # 3
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]] # 5

But in NumCpp every value in y is applied to every row, so each row gets a 1 in every column that appears anywhere in y.

nc::NdArray<int> y = {5, 4, 3, 0, 7, 6, 5, 1, 3, 5};
auto one_hot = nc::zeros<int>(10,10);
one_hot.put(nc::arange(y.size()), y, 1);
one_hot.print();

Prints

[[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]
[1, 1, 0, 1, 1, 1, 1, 1, 0, 0, ]]

dpilger26 commented 2 months ago

Hmm, yeah, those are different behaviors. I'll have to add an additional put overload to support this.
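
Until such an overload exists, the difference between the two behaviors can be illustrated with a minimal sketch in plain C++ using only the standard library. The names `Matrix`, `oneHotPairwise`, and `putCrossProduct` are hypothetical and are not part of NumCpp; the sketch only assumes that the NumPy expression pairs each row index with one column index, while the `put` call in the issue combines every row index with every column index.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper type for this sketch, not a NumCpp type.
using Matrix = std::vector<std::vector<int>>;

// Pairwise indexing (what the NumPy snippet does):
// row i receives a single 1, in column labels[i].
Matrix oneHotPairwise(const std::vector<int>& labels, std::size_t numClasses)
{
    Matrix out(labels.size(), std::vector<int>(numClasses, 0));
    for (std::size_t i = 0; i < labels.size(); ++i)
    {
        if (labels[i] < 0 || static_cast<std::size_t>(labels[i]) >= numClasses)
        {
            throw std::out_of_range("label outside [0, numClasses)");
        }
        out[i][static_cast<std::size_t>(labels[i])] = 1;
    }
    return out;
}

// Cross-product indexing (the behavior reported in the issue):
// every listed row receives a 1 in every listed column.
Matrix putCrossProduct(const std::vector<std::size_t>& rows,
                       const std::vector<int>& cols,
                       std::size_t numRows,
                       std::size_t numCols)
{
    Matrix out(numRows, std::vector<int>(numCols, 0));
    for (const std::size_t row : rows)
    {
        for (const int col : cols)
        {
            if (row >= numRows || col < 0 || static_cast<std::size_t>(col) >= numCols)
            {
                throw std::out_of_range("index outside the matrix");
            }
            out[row][static_cast<std::size_t>(col)] = 1;
        }
    }
    return out;
}
```

Calling `oneHotPairwise(y, 10)` with the `y` from the issue gives one 1 per row, as in the NumPy output, while `putCrossProduct` with rows 0 to 9 and the same `y` fills columns 0, 1, 3, 4, 5, 6 and 7 in every row, as in the NumCpp output. With NumCpp itself, the same per-row loop should be expressible by assigning through the element accessor, for example `one_hot(i, y[i]) = 1` inside a loop over `i`, though that is a workaround sketch rather than the overload the maintainer describes.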