TheoBourdais / ComputationalHypergraphDiscovery

This is the source code for the paper "Computational Hypergraph Discovery: A Gaussian process framework for connecting the dots".
Apache License 2.0

Document requirements on the type of values in the dataframe. #1

Open NicolasRouquette opened 6 months ago

NicolasRouquette commented 6 months ago

The normalization logic assumes the data is numeric:

https://github.com/TheoBourdais/ComputationalHypergraphDiscovery/blob/5cfe4349119ea8ee58dcce336f75959e3996b294/src/ComputationalHypergraphDiscovery/_GraphDiscoveryMain.py#L73

I suggest documenting this explicitly in the README. For example, if CSV data contains boolean or enum columns, should we delete them or convert them to floating-point numbers?
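To make the "convert" option concrete, here is a minimal sketch using pandas. The helper name `to_numeric_frame` and the choice of one-hot encoding for enum-like columns are assumptions made for illustration; they are not part of ComputationalHypergraphDiscovery, and other encodings may suit a given dataset better.

```python
import pandas as pd


def to_numeric_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with only float columns.

    Booleans become 0.0/1.0; non-numeric (enum-like) columns are one-hot encoded.
    """
    # pandas treats bool as numeric, so only string/category-like columns land here.
    enum_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if enum_cols:
        out = pd.get_dummies(df, columns=enum_cols, dtype=float)
    else:
        out = df.copy()
    # Cast the remaining bool/int columns to float.
    return out.astype(float)


df = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0],
        "flag": [True, False, True],
        "color": ["red", "blue", "red"],
    }
)
numeric = to_numeric_frame(df)
# Transpose so that rows are features and columns are samples.
X = numeric.to_numpy().T
```

The final transpose matters if the target expects features along the first axis; dropping the non-numeric columns instead would be the "delete" alternative.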

TheoBourdais commented 6 months ago

The subsequent kernels also assume data is numeric, so I suggest making it explicit in the documentation.

I added the following to the README:

Disclaimer: Note that the data is assumed to be real numbers. The algorithm only accepts data in the form of a 2D array of shape (n_features,n_samples). Other shapes will be rejected, and other types of data will be treated as real numbers.

This disclaimer is supported by the following logic in the definition of GraphDiscovery objects:

https://github.com/TheoBourdais/ComputationalHypergraphDiscovery/blob/e0b48e8296af9405bf0292880c3b3f0d74450c48/src/ComputationalHypergraphDiscovery/_GraphDiscoveryMain.py#L68-L71
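For readers who do not follow the link, the behaviour described in the disclaimer can be sketched roughly as below. This is not a copy of the linked lines: the function name `check_data` and the error message are hypothetical, and the sketch only mirrors what the disclaimer states (2D arrays accepted, other shapes rejected, other value types cast to real numbers).

```python
import numpy as np


def check_data(X):
    """Hypothetical validation helper, written from the README disclaimer only."""
    X = np.asarray(X)
    # Reject anything that is not a 2D array of shape (n_features, n_samples).
    if X.ndim != 2:
        raise ValueError(
            f"expected a 2D array of shape (n_features, n_samples), got ndim={X.ndim}"
        )
    # Booleans and integers are cast to floats; non-numeric strings raise ValueError.
    return X.astype(float)


checked = check_data([[True, False, True], [1, 2, 3]])
```

Under this sketch, a boolean row is silently treated as 0.0/1.0 values, which is why documenting the numeric assumption is useful.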

I also added a more precise description to the docstrings of the GraphDiscovery object.