FZJ-JSC / JuML


DataSet column/row ordering #16

Closed mricherzhagen closed 8 years ago

mricherzhagen commented 9 years ago

I think the DataSet class might create a very bad memory layout for the samples. Armadillo uses Fortran-style column-major ordering, but we store each sample as a row of features:

These lines in Dataset.h:

```cpp
virtual inline size_t n_features() const { return this->data_.n_cols; }
virtual inline size_t n_samples() const { return this->data_.n_rows; }
```

indicate that we store the features in columns and the samples in rows, so the matrix logically looks like this:

| column 1 | column 2 | column 3 | ... | column n |
|---|---|---|---|---|
| feature 1 for sample 1 | feature 2 for sample 1 | feature 3 for sample 1 | ... | feature n for sample 1 |
| feature 1 for sample 2 | feature 2 for sample 2 | feature 3 for sample 2 | ... | feature n for sample 2 |

But Armadillo uses column-major ordering, so the data is actually stored in memory like this:

feature 1 for sample 1, feature 1 for sample 2, ..., feature 1 for sample N,
feature 2 for sample 1, feature 2 for sample 2, ...
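To make the storage order above concrete, here is a small standard-library sketch (no Armadillo involved) that lists the (sample, feature) pairs in the order a column-major matrix with samples as rows would hold them. It assumes the same orientation as `n_samples() == n_rows` and `n_features() == n_cols`. The function name `storage_order` is hypothetical, and indices are 0-based, unlike the 1-based wording in the list.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: lists (sample, feature) pairs in the order a
// column-major matrix with samples as rows (n_samples x n_features)
// stores them. Indices are 0-based.
std::vector<std::pair<std::size_t, std::size_t>>
storage_order(std::size_t n_samples, std::size_t n_features) {
    std::vector<std::pair<std::size_t, std::size_t>> order;
    order.reserve(n_samples * n_features);
    for (std::size_t col = 0; col < n_features; ++col) {     // outer loop: columns (features)
        for (std::size_t row = 0; row < n_samples; ++row) {  // inner loop: rows (samples)
            order.emplace_back(row, col);  // linear offset = row + col * n_samples
        }
    }
    return order;
}
```

For 3 samples and 2 features, this yields (0,0), (1,0), (2,0), (0,1), (1,1), (2,1). All samples' values for feature 0 come first, then feature 1, which matches the list above.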

So consecutive features of the same sample are n_samples elements apart in memory, leaving huge gaps between the features of a sample.
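The gap can be illustrated by gathering one sample's features from a plain column-major buffer in both possible orientations. This is a sketch on `std::vector<double>`, not on Armadillo's own types, and both function names are hypothetical. With samples as rows, every step jumps `n_samples` elements. With samples as columns, the features form one contiguous block.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: gathers one sample's features from a column-major
// buffer that stores samples as rows (n_samples x n_features).
// Each step jumps n_samples elements (strided access).
std::vector<double> gather_sample_rows_layout(const std::vector<double>& buf,
                                              std::size_t sample,
                                              std::size_t n_samples,
                                              std::size_t n_features) {
    std::vector<double> out;
    out.reserve(n_features);
    for (std::size_t f = 0; f < n_features; ++f) {
        out.push_back(buf[sample + f * n_samples]);
    }
    return out;
}

// Hypothetical helper: same sample from a column-major buffer that stores
// samples as columns (n_features x n_samples). The features are contiguous.
std::vector<double> gather_sample_cols_layout(const std::vector<double>& buf,
                                              std::size_t sample,
                                              std::size_t n_features) {
    auto first = buf.begin() + static_cast<std::ptrdiff_t>(sample * n_features);
    return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(n_features));
}
```

For two samples {1, 2, 3} and {4, 5, 6}, the samples-as-rows buffer is {1, 4, 2, 5, 3, 6} and the samples-as-columns buffer is {1, 2, 3, 4, 5, 6}. Both functions return {4, 5, 6} for sample 1, but only the second reads adjacent elements.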

At this line the Armadillo matrix is transposed, but according to the documentation, .t() returns a transposed copy. Wouldn't that copy then also have the wrong row/column ordering?
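One way to check the question is to model what an explicit transposed copy looks like in memory. The sketch below assumes the copy is itself stored column-major, as Armadillo matrices are, and works on plain buffers rather than calling `.t()`. The name `transpose_col_major` is hypothetical. Under that assumption, the copy's columns are the original rows. Whether its ordering is "wrong" therefore depends on which orientation the code indexes it with afterwards.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: explicit transpose of a column-major n_rows x n_cols
// buffer into a new column-major n_cols x n_rows buffer (a copy).
std::vector<double> transpose_col_major(const std::vector<double>& src,
                                        std::size_t n_rows, std::size_t n_cols) {
    std::vector<double> dst(src.size());
    for (std::size_t c = 0; c < n_cols; ++c) {
        for (std::size_t r = 0; r < n_rows; ++r) {
            // src(r, c) lives at r + c * n_rows; dst(c, r) lives at c + r * n_cols.
            dst[c + r * n_cols] = src[r + c * n_rows];
        }
    }
    return dst;
}
```

Transposing the 2 x 3 samples-as-rows buffer {1, 4, 2, 5, 3, 6} gives {1, 2, 3, 4, 5, 6}. In the copy, each sample's features sit next to each other.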

Or am I missing something here? Can someone explain?