EconForge / interpolation.py

BSD 2-Clause "Simplified" License

Change data order for multilinear routines. #1

albop closed 8 years ago

albop commented 10 years ago

Currently each observation is assumed to take one column. We want to transition to one observation per row instead.

sglyon commented 10 years ago

This is an important distinction, especially when thinking about writing routines for both numpy and Julia as they have different array orderings.

albop commented 10 years ago

Yes, but this issue is actually more subtle than that. When I first wrote the multilinear routines, I reversed all conventions, but I no longer think that is a good idea.

We probably want to keep the same user interface as Julia in numpy, i.e. one row per observation, even though the memory ordering is not the same. C-order memory layout combined with one observation per row in the user interface ensures that each observation is contiguous in memory, which is a desirable feature for big data sets. On a small workstation, both layouts seem to give similar performance anyway.
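To make the contiguity point concrete, here is a minimal sketch. It assumes a plain float64 array with hypothetical sizes `n_obs` and `n_dim`; none of these names come from the library itself. It checks that, in C order, a single observation (row) is contiguous while a single coordinate taken across observations (column) is strided.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_obs, n_dim = 4, 3

# One observation per row, stored in NumPy's default C (row-major) order.
points = np.arange(n_obs * n_dim, dtype=np.float64).reshape(n_obs, n_dim)

row = points[1]      # a single observation
col = points[:, 1]   # one coordinate across all observations

# A row is a contiguous run of n_dim floats; a column skips n_dim floats per step.
row_contiguous = bool(row.flags['C_CONTIGUOUS'])
col_contiguous = bool(col.flags['C_CONTIGUOUS'])
strides = points.strides  # bytes to step along (rows, columns)

print(row_contiguous, col_contiguous, strides)
```

The strides show the layout directly: moving to the next observation jumps `n_dim * 8` bytes, while moving within an observation jumps 8 bytes.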

Another argument for keeping one observation per row is that pandas uses this convention, and so does Numba, even though I never fully understood why (https://groups.google.com/a/continuum.io/forum/#!msg/numba-users/sGnTwU56f7c/gUhhko6ZDjAJ).

The unintuitive result of these considerations is that we should keep the same ordering as in Julia with a different memory layout, knowing that it is probably suboptimal in one of the two implementations (possibly Julia).
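As a rough illustration of why the same one-observation-per-row interface may be suboptimal under a column-major layout, the sketch below uses NumPy's Fortran order as a stand-in for Julia's memory layout. This is an assumption made for illustration, not a benchmark of either implementation, and the array sizes are hypothetical.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_obs, n_dim = 4, 3

# Same user-facing shape (one observation per row) in two memory layouts.
c_points = np.arange(n_obs * n_dim, dtype=np.float64).reshape(n_obs, n_dim)
f_points = np.asfortranarray(c_points)  # column-major copy, as a stand-in for Julia

# The interface is identical: same shape, same values.
same_values = bool(np.array_equal(c_points, f_points))

# But only the C-order array stores each observation contiguously.
c_row_contiguous = bool(c_points[0].flags['C_CONTIGUOUS'])
f_row_contiguous = bool(f_points[0].flags['C_CONTIGUOUS'])
f_strides = f_points.strides  # bytes to step along (rows, columns)

print(same_values, c_row_contiguous, f_row_contiguous, f_strides)
```

In the column-major copy, consecutive coordinates of one observation are `n_obs * 8` bytes apart, so reading a single observation touches scattered memory.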