akielaries / openGPMP

Hardware Accelerated General Purpose Mathematics Package
https://akielaries.github.io/openGPMP/
MIT License

DataFrame/DataTable related functionality #29

Closed akielaries closed 1 year ago

akielaries commented 1 year ago

The ultimate goal of the DataTable structure is a data structure similar to numpy arrays and the pandas DataFrame. It allows data to be parsed and displayed in tabular form, which is often how data arrives. It is common to pair the pandas.read_csv or pandas.read_json functions with Machine Learning related code, and the DataFrame object makes it easy to select subsets of the data, for example specific columns and rows. The Matrix/Vector portion of the Linear Algebra module offers a similar representation of tabular data, but current development is focused in the /modules/structs/ directory, which contains several attempts at this type of data structure.

akielaries commented 1 year ago

assigning to @eeddgg and myself

akielaries commented 1 year ago

update 04/12/2023

The DataTable class is in development (header / src) with 3 specific types:

// alias for the pair type of strings
typedef std::pair<std::vector<std::string>,
                  std::vector<std::vector<std::string>>>
    DataTableStr;
// alias for pair type of 64 bit integers
typedef std::pair<std::vector<int64_t>, std::vector<std::vector<int64_t>>>
    DataTableInt;

// alias for pair type of long doubles
typedef std::pair<std::vector<long double>,
                  std::vector<std::vector<long double>>>
    DataTableDouble;
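
As a rough illustration of how these aliases might be used, the sketch below fills a DataTableStr by hand and looks up a column. It assumes that `.first` holds the column names and `.second` holds the rows, which the aliases themselves do not state. `make_example_table` and `column_index` are hypothetical helpers for this example only, not part of the class.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Same alias as in the update above.
typedef std::pair<std::vector<std::string>,
                  std::vector<std::vector<std::string>>>
    DataTableStr;

// Hypothetical helper: builds a small two-column table by hand.
// Assumption: .first = column names, .second = rows of cells.
DataTableStr make_example_table() {
    DataTableStr table;
    table.first = {"year", "value"};
    table.second.push_back({"2021", "10"});
    table.second.push_back({"2022", "12"});
    return table;
}

// Hypothetical helper: returns the position of a column name in the
// header, or -1 if the name is not present.
long column_index(const DataTableStr &table, const std::string &name) {
    for (std::size_t i = 0; i < table.first.size(); ++i) {
        if (table.first[i] == name) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

With this layout, a single cell is reached as `table.second[row][column_index(table, "value")]`. Keeping every cell as a string defers type decisions to a later conversion step.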

with 4 semi-working functions

    // similar to pandas.read_csv, parses CSV files
    DataTableStr csv_read(std::string filename, std::vector<std::string> columns = {});
    // converts DataTableStr -> DataTableInt 
    DataTableInt str_to_int(DataTableStr src);
    // converts DataTableStr -> DataTableDouble
    DataTableDouble str_to_double(DataTableStr src);
    // function to display the DataTable neatly
    template <typename T>
    void display(std::pair<std::vector<T>, std::vector<std::vector<T>>> data, bool display_all = false);
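
To make the intended behaviour of the reader and the display function more concrete, here is a minimal sketch written as free functions. It is not the project's implementation: `split_csv_line`, `csv_read_sketch` and `display_sketch` are hypothetical names, the reader takes a `std::istream` instead of a filename so it can be tried without a file, and the parser assumes plain comma-separated cells with no quoting or escaping.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::vector<std::string>,
                  std::vector<std::vector<std::string>>>
    DataTableStr;

// Splits one line on commas. Simplification: no quoted fields, and a
// trailing empty cell is dropped.
std::vector<std::string> split_csv_line(const std::string &line) {
    std::vector<std::string> cells;
    std::stringstream stream(line);
    std::string cell;
    while (std::getline(stream, cell, ',')) {
        cells.push_back(cell);
    }
    return cells;
}

// Reads a header line plus data rows. If `columns` is non-empty, only
// the named columns are kept (assumed analogue of the `columns` argument).
DataTableStr csv_read_sketch(std::istream &in,
                             const std::vector<std::string> &columns = {}) {
    DataTableStr table;
    std::string line;
    if (!std::getline(in, line)) {
        return table; // empty input gives an empty table
    }
    std::vector<std::string> header = split_csv_line(line);

    // Record which header positions to keep.
    std::vector<std::size_t> keep;
    for (std::size_t i = 0; i < header.size(); ++i) {
        bool wanted = columns.empty() ||
                      std::find(columns.begin(), columns.end(), header[i]) !=
                          columns.end();
        if (wanted) {
            keep.push_back(i);
            table.first.push_back(header[i]);
        }
    }

    while (std::getline(in, line)) {
        if (line.empty()) {
            continue; // skip blank lines
        }
        std::vector<std::string> cells = split_csv_line(line);
        std::vector<std::string> row;
        for (std::size_t idx : keep) {
            // Short rows are padded with empty strings.
            row.push_back(idx < cells.size() ? cells[idx] : std::string());
        }
        table.second.push_back(row);
    }
    return table;
}

// Prints the header and then every row, tab-separated.
template <typename T>
void display_sketch(
    const std::pair<std::vector<T>, std::vector<std::vector<T>>> &data,
    std::ostream &out = std::cout) {
    for (const T &name : data.first) {
        out << name << '\t';
    }
    out << '\n';
    for (const std::vector<T> &row : data.second) {
        for (const T &cell : row) {
            out << cell << '\t';
        }
        out << '\n';
    }
}
```

A file-based version could open a `std::ifstream` and pass it to `csv_read_sketch`. Taking a stream keeps the parsing logic independent of where the data comes from.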

By default, CSVs are read in and their elements are stored as std::string.
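
Since cells start out as strings, numeric work needs an explicit conversion step. The sketch below shows one way that step could look using `std::stoll`. The issue does not say what `.first` holds in DataTableInt, so this example converts only the rows. `parse_int_cell` and `rows_to_int` are hypothetical names, and throwing on malformed cells is an assumed policy, not the project's documented behaviour.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Parses one cell as a 64-bit integer. std::stoll throws
// std::invalid_argument for non-numeric text and std::out_of_range
// for values that do not fit; trailing characters are rejected here.
std::int64_t parse_int_cell(const std::string &cell) {
    std::size_t used = 0;
    long long value = std::stoll(cell, &used);
    if (used != cell.size()) {
        throw std::invalid_argument("trailing characters in cell: " + cell);
    }
    return static_cast<std::int64_t>(value);
}

// Converts every cell of every row, preserving the row layout.
std::vector<std::vector<std::int64_t>>
rows_to_int(const std::vector<std::vector<std::string>> &rows) {
    std::vector<std::vector<std::int64_t>> result;
    result.reserve(rows.size());
    for (const std::vector<std::string> &row : rows) {
        std::vector<std::int64_t> converted;
        converted.reserve(row.size());
        for (const std::string &cell : row) {
            converted.push_back(parse_int_cell(cell));
        }
        result.push_back(converted);
    }
    return result;
}
```

A conversion to long double could follow the same shape with `std::stold` in place of `std::stoll`.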

An early-stage implementation of this will suffice for the PR and merge.