cornelltech / snack

Stochastic Neighbor and Crowd Kernel (SNaCK) embeddings: Quick and dirty visualization of large-scale datasets via concept embeddings

Syntax issue #1

Open tharun2011 opened 8 years ago

tharun2011 commented 8 years ago

What does the statement 'cdef double[:, ::1] Y = Y_np' mean? Why can't we use Y_np directly? Line 338 has Y_np -= np.mean(Y_np, 0). Shouldn't this be Y instead of Y_np?

gcr commented 8 years ago

Ah, Y_np is a NumPy array and Y is a special Typed Memoryview object. The type double[:, ::1] declares a two-dimensional, C-contiguous view of doubles, and the statement defines Y as a new Typed Memoryview that refers to the same data that Y_np refers to; nothing is copied. Cython uses Memoryviews to refer to a structured block of memory. They can be backed by NumPy arrays, or by anything else that implements the buffer interface (for example, I think you can get a byte memoryview from a bytes object).
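The sharing described above can be tried in plain Python, without compiling anything. This is only a rough sketch: it uses Python's built-in memoryview, which is a different type from Cython's Typed Memoryview but sits on the same buffer interface, and it uses an array.array named backing as a hypothetical stand-in for Y_np.

```python
from array import array

# Hypothetical stand-in for Y_np: any object exposing the buffer interface.
backing = array("d", [1.0, 2.0, 3.0])

# Built-in memoryview (not Cython's typed memoryview) over the same memory.
view = memoryview(backing)

view[0] = 10.0      # write through the view
backing[2] = 30.0   # write through the backing object

# Both writes are visible under either name, because no data was copied.
print(backing.tolist())
print(view.tolist())

# bytes objects expose the buffer interface too, as a read-only view.
byte_view = memoryview(b"abc")
print(byte_view[0], byte_view.readonly)
```

Neither name owns a private copy, which is why code can freely mix the two handles on the same data.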

We do it this way because indexing individual elements of a Typed Memoryview is much faster than indexing a NumPy array. Memoryview indexing compiles down to pointer arithmetic, but NumPy indexing requires the slow Python interpreter to dispatch to NumPy's ndarray.__getitem__() Python call, then to the NumPy C implementation, then back to the Python interpreter, then back to the Cython program.

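However, Memoryviews don't define arithmetic operations. If you want to subtract something from all the elements, you have to write the explicit loop yourself. So we sometimes operate on the NumPy array instead, as in Y_np -= np.mean(...). Because that update happens in place and Y refers to the same memory, the change is visible through Y as well.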
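For comparison, here is roughly what the explicit loop looks like when the column means are subtracted through a view rather than with NumPy. Again this is a sketch in plain Python using the built-in memoryview as a stand-in; the names flat, n_rows and n_cols are made up for illustration, and the real code in the repository may differ.

```python
from array import array

n_rows, n_cols = 3, 2

# Hypothetical row-major stand-in for Y_np (3 rows, 2 columns).
flat = array("d", [1.0, 10.0, 2.0, 20.0, 3.0, 30.0])

# Reinterpret the same memory as a 2-D view of doubles; cast() must go via bytes.
Y = memoryview(flat).cast("B").cast("d", shape=[n_rows, n_cols])

# Views do not define arithmetic, so "Y - 1.0" is rejected.
try:
    Y - 1.0
    supports_arithmetic = True
except TypeError:
    supports_arithmetic = False

# The explicit loop: subtract each column's mean, element by element.
for j in range(n_cols):
    col_mean = sum(Y[i, j] for i in range(n_rows)) / n_rows
    for i in range(n_rows):
        Y[i, j] -= col_mean

print(supports_arithmetic)
print(Y.tolist())     # centered values seen through the view
print(flat.tolist())  # the backing object sees the same centered values
```

In compiled Cython a loop like this is cheap, but the one-line NumPy form is shorter to write when the step is not performance critical.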