giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs

Support for pandas dataframe input #125

Open ulupo opened 4 years ago

ulupo commented 4 years ago

Description

It is worth discussing whether we want to always ensure that our MapperPipelines can take pandas dataframes as inputs directly. There are potentially many benefits, ranging from less preprocessing by the user to added functionality for displaying summary information (colour, histograms, etc.) when the quantity of interest is a specific column, which is more easily accessed by name than by index location.

Note that a first iteration of this should not add pandas to the requirements files for giotto-learn, just as scikit-learn does not list pandas as a requirement.
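One way to accept dataframes without depending on pandas is duck typing: check for dataframe-like attributes rather than importing the library. The sketch below is only an illustration under that assumption; `is_dataframe_like` and `to_array_and_names` are hypothetical helper names, not part of the library.

```python
import numpy as np


def is_dataframe_like(X):
    """Heuristic check for a dataframe that never imports pandas."""
    return hasattr(X, "columns") and hasattr(X, "iloc")


def to_array_and_names(X):
    """Return the data as an ndarray plus column names (None if unavailable)."""
    names = [str(c) for c in X.columns] if is_dataframe_like(X) else None
    return np.asarray(X), names
```

With this approach pandas is only ever needed on the user's side, so the requirements files stay unchanged.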

ulupo commented 4 years ago

Fixed by (#135).

ulupo commented 4 years ago

At the level of documentation, this is still an open issue: all docstrings need to be tweaked to say array-like instead of ndarray wherever the input can be a pandas dataframe. Note, however, that outputs are still always ndarrays.

Note that this applies to the whole library.
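As a rough illustration of the docstring change, here is a numpydoc-style sketch in which the input is documented as array-like and the output as ndarray. The function `center` is a made-up example, not an existing function in the library.

```python
import numpy as np


def center(X):
    """Subtract the column-wise mean from the input.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Input data. Can be a list of lists, an ndarray or a pandas dataframe.

    Returns
    -------
    Xt : ndarray of shape (n_samples, n_features)
        Centered data, always returned as a plain numpy array.

    """
    # np.asarray converts lists and dataframes alike to an ndarray
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0)
```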

ulupo commented 4 years ago

Furthermore, we might wish to extend the functionality of Projection to allow for passing column names instead of positional indices.
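A minimal sketch of what name-based selection could look like, assuming names are resolved to positions before the data is converted to an ndarray. The helpers `resolve_columns` and `project` are hypothetical and only illustrate the idea; they do not reflect how Projection is implemented.

```python
import numbers

import numpy as np
import pandas as pd


def resolve_columns(X, columns):
    """Map column names or positions to a list of positional indices."""
    scalar = isinstance(columns, (numbers.Integral, str))
    cols = [columns] if scalar else list(columns)
    # Plain ndarrays have no column names, so only positions are valid there
    names = list(getattr(X, "columns", []))
    resolved = []
    for c in cols:
        if isinstance(c, str):
            if c not in names:
                raise ValueError(f"Unknown column name: {c!r}")
            resolved.append(names.index(c))
        else:
            resolved.append(int(c))
    return resolved


def project(X, columns):
    """Select columns by name or position; the output is always a 2D ndarray."""
    idx = resolve_columns(X, columns)
    return np.asarray(X)[:, idx]


df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0], "z": [5.0, 6.0]})
by_name = project(df, ["z", "x"])
by_position = project(df, [2, 0])
```

Here `by_name` and `by_position` select the same data, so positional indices keep working and names are an optional convenience for dataframe inputs.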

ulupo commented 4 years ago

The original issue was fixed by (#137) for the Mapper module, but the rest of the library still needs to be reviewed systematically.