EdwardRaff / JSAT

Java Statistical Analysis Tool, a Java library for Machine Learning
GNU General Public License v3.0
788 stars, 204 forks

DataSet / DataPoint interface #62

Closed erosval closed 7 years ago

erosval commented 7 years ago

To make the library easier to use, would it be possible to provide an interface like the ones other libraries offer? For example, in Apache Commons Math, points implement the Clusterable interface.

Here are some references: Clustering algorithms and distance measures, Clusterable, DBSCANClusterer

EdwardRaff commented 7 years ago

JSAT already has many such interfaces. For example, here is the one for clustering algorithms http://www.edwardraff.com/jsat_docs/JSAT-0.0.8-javadoc/jsat/clustering/Clusterer.html

erosval commented 7 years ago

Thanks for the quick reply. I'll try following these instructions. The problem for me is that DataSet is not an interface. What we need is an "Interface for n-dimensional points that can be clustered together".

EdwardRaff commented 7 years ago

I still don't understand your request. Why would you want that? A DataSet is more general than something that only needs to be clustered (and it is an abstract class). Clustering is a task that takes a data set, and given that interface there are many clustering algorithms to use. I'm not seeing how this isn't covered by the current code.

erosval commented 7 years ago

The problem is that a DataSet is not an interface.

If I need to cluster a set of points held in a Java array, what do I have to do? With Apache Commons Math, the only thing to do is implement an interface: public class Point implements Clusterable {

Sorry, but there is a big difference between an interface and an abstract class.

EdwardRaff commented 7 years ago

You need to convert your data to an instance of a DataSet, which can then be clustered. Please look at the clustering examples in the wiki.

erosval commented 7 years ago

Could you suggest an example in which a vector of geographic points is converted to a data set? I only have a list of (x, y) points, so I cannot use ARFFLoader or anything similar, as in the simplest example I found.

I also need to calculate distance between points only if necessary and not pre-calculate every distance.

Apache Commons Math meets all these requirements, but it is not as complete as your library.

EdwardRaff commented 7 years ago

You can create a DenseVector object from a simple array, wrap each vector in a DataPoint, collect the DataPoint objects in a List, and then create a SimpleDataSet object from that list.
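The first step of that conversion, turning a list of (x, y) points into plain coordinate arrays, can be sketched in plain Java as below. This is only an illustration under stated assumptions: GeoPoint and toRows are hypothetical names, not part of JSAT, and the JSAT constructor calls appear only in comments because their exact signatures are assumed here and should be checked against the Javadoc for your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PointsToRows {
    /** Hypothetical application-side point type; not part of JSAT. */
    static final class GeoPoint {
        final double x;
        final double y;

        GeoPoint(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Copies each (x, y) point into its own two-element array. */
    static List<double[]> toRows(List<GeoPoint> points) {
        List<double[]> rows = new ArrayList<>(points.size());
        for (GeoPoint p : points) {
            rows.add(new double[] { p.x, p.y });
        }
        return rows;
    }

    public static void main(String[] args) {
        List<GeoPoint> points = Arrays.asList(
                new GeoPoint(45.46, 9.19),
                new GeoPoint(45.47, 9.20),
                new GeoPoint(41.90, 12.50));

        for (double[] row : toRows(points)) {
            // With JSAT on the classpath, each row would be wrapped here, e.g.
            //   dataPoints.add(new DataPoint(new DenseVector(row)));
            // and the finished list passed to new SimpleDataSet(dataPoints).
            // Those signatures are assumptions; verify them in the Javadoc.
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Keeping the conversion in one small helper means the application's own point class does not have to implement any library interface; only the helper needs to change if the target library does.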

> I also need to calculate distance between points only if necessary and not pre-calculate every distance.

All of the JSAT clustering algorithms based on distance metrics will do this by default, unless it's not possible to do it that way.
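To make the idea of on-demand distances concrete, here is a small plain-Java sketch. It is not JSAT's code and says nothing about how JSAT is implemented internally; the class LazyDistanceDemo, the neighbors method, and the call counter are all hypothetical names introduced for illustration. A neighborhood query evaluates the metric only for the pairs it actually inspects, so no full pairwise distance matrix is ever built.

```java
import java.util.ArrayList;
import java.util.List;

class LazyDistanceDemo {
    // Counts metric evaluations so the on-demand behaviour is observable.
    static long distanceCalls = 0;

    /** Euclidean distance, computed only when a caller asks for it. */
    static double euclidean(double[] a, double[] b) {
        distanceCalls++;
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Indices of all points within eps of points[query], the query itself included. */
    static List<Integer> neighbors(double[][] points, int query, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < points.length; j++) {
            if (euclidean(points[query], points[j]) <= eps) {
                result.add(j);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] points = { {0, 0}, {0, 1}, {5, 5}, {5, 6} };
        List<Integer> near = neighbors(points, 0, 1.5);
        System.out.println(near + " after " + distanceCalls + " distance calls");
    }
}
```

One query over n points costs n metric evaluations in this linear scan, and nothing is stored between queries. An index structure could reduce that further, but that is beyond this sketch.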

I'm on vacation this week, so I really can't help any more than this at this time.