TheDigitalFrontier / parallel-decision-trees

Semester project in CS205 Computing Foundations for Computational Science at Harvard School of Engineering and Applied Sciences, spring 2020.
MIT License

Data structure for tabular classification data #1

Closed by johannes-kk 4 years ago

johannes-kk commented 4 years ago

Probably easiest to keep it simple: a dataset is a nested array of a single data type, all double. Rows (first index) = observations, columns (second index) = variables.

Also likely easiest to include the response as a column in the dataset rather than as a separate vector. By convention, the response is always the rightmost (last) column.
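
A rough sketch of what that layout could look like in C++, assuming the nested array is a std::vector<std::vector<double>>. The names Table, response_of, features_of and make_example_table are made up for illustration and are not taken from the repo, and the numbers are invented:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alias for the layout described above: data[row][col], all double.
using Table = std::vector<std::vector<double>>;

// Response (class label) of one observation: by convention the last column.
double response_of(const std::vector<double>& row) {
    assert(!row.empty());
    return row.back();
}

// Feature values of one observation: every column except the last.
std::vector<double> features_of(const std::vector<double>& row) {
    assert(!row.empty());
    return std::vector<double>(row.begin(), row.end() - 1);
}

// Small made-up table: two features plus a 0/1 class label per row.
Table make_example_table() {
    return {
        {5.1, 3.5, 0.0},
        {6.2, 2.9, 1.0},
        {4.7, 3.2, 0.0},
    };
}
```

Keeping the response in the same table means a split only has to move whole rows around; the label travels with its observation.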

johannes-kk commented 4 years ago

I keep having to define variables of type std::vector<std::vector<double>> every time a new dataset is used or created. Maybe "dataset" should be a class?
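
One possible shape for such a class, as a minimal sketch only. Dataset, n_rows, n_cols, at and response are hypothetical names, not the interface the project ended up with. A plain type alias (using ... = std::vector<std::vector<double>>) would already remove the repetition; a class additionally lets the row-length check live in one place:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical wrapper; the names here are illustrative, not the project's API.
class Dataset {
public:
    using Row = std::vector<double>;

    Dataset() = default;

    // Takes ownership of the rows and rejects ragged input.
    explicit Dataset(std::vector<Row> rows) : rows_(std::move(rows)) {
        for (const Row& r : rows_) {
            if (r.size() != rows_.front().size()) {
                throw std::invalid_argument("all rows must have the same length");
            }
        }
    }

    std::size_t n_rows() const { return rows_.size(); }
    std::size_t n_cols() const { return rows_.empty() ? 0 : rows_.front().size(); }

    // Element access: row = observation, col = column.
    double at(std::size_t row, std::size_t col) const {
        assert(row < n_rows() && col < n_cols());
        return rows_[row][col];
    }

    // Response of a row: the last column, per the convention in this issue.
    double response(std::size_t row) const {
        assert(row < n_rows() && n_cols() > 0);
        return rows_[row].back();
    }

private:
    std::vector<Row> rows_;
};
```

Functions can then take a const Dataset& instead of spelling out the nested vector type each time.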

johannes-kk commented 4 years ago

This is pretty solidified at this point, see e.g. #29