EdwardRaff / JSAT

Java Statistical Analysis Tool, a Java library for Machine Learning
GNU General Public License v3.0
788 stars 204 forks

Do not limit the elements in VPTree to vectors #73

Closed albertoandreottiATgmail closed 6 years ago

albertoandreottiATgmail commented 6 years ago

One of the reasons to work in metric spaces is to abstract away from what the elements you're measuring distances between actually are. They could be images, text, audio samples, Excel spreadsheets, whatever, as long as they come with a distance that defines a metric space. Why are you limiting this to numeric vectors only? All you would need is an interface,

interface MetricDistance<SomeType> { public double distance(SomeType a, SomeType b); }

and let the user provide an implementation of that.
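
To make the request concrete, here is a minimal sketch of what such an interface and one user-supplied implementation might look like. Everything in it is hypothetical and not part of JSAT: the generic parameter `T`, the `JaccardDistance` class, and the choice of Jaccard distance on sets of strings are assumptions made purely for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical generic form of the interface proposed above; not JSAT API.
interface MetricDistance<T> {
    double distance(T a, T b);
}

// Illustrative implementation: Jaccard distance on sets of strings,
// defined as 1 - |intersection| / |union|. It satisfies the metric axioms.
class JaccardDistance implements MetricDistance<Set<String>> {
    @Override
    public double distance(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // two empty sets are treated as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return 1.0 - (double) intersection.size() / union.size();
    }
}
```

A tree built against such an interface would only ever call `distance`, so it would never need to know what the elements are.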

EdwardRaff commented 6 years ago

1) The design choice is simply a holdover from 8 years ago. I'm currently refactoring much of the distance-based code (as free time permits). 2) Most distance metrics are defined on numeric features. 3) JSAT is focused specifically on structured data.

The last reason is why I will not be implementing any kind of interface as you've requested. A common trick used in most frameworks is that if you need to work with unstructured data, you store it in a separate array and use 1-dimensional vectors. Your custom distance function then grabs the correct unstructured objects based on the index, and computes the distance as desired. You can see this style in use in my LZJD project.
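
The index trick described above can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not JSAT or LZJD code: the class `IndexedStringDistance`, its methods, the use of plain `double[]` arrays in place of a library vector type, and the choice of Levenshtein edit distance are all hypothetical.

```java
import java.util.List;

// Hypothetical sketch of the index trick; names are illustrative, not JSAT API.
class IndexedStringDistance {
    // The unstructured objects live outside the vectors, in a separate list.
    private final List<String> items;

    IndexedStringDistance(List<String> items) {
        this.items = items;
    }

    // Each "vector" is 1-dimensional and stores only the index of its object.
    double[] vectorFor(int index) {
        return new double[] { index };
    }

    // The distance function recovers the objects by index, then compares them.
    double dist(double[] a, double[] b) {
        String s = items.get((int) a[0]);
        String t = items.get((int) b[0]);
        return levenshtein(s, t);
    }

    // Standard two-row dynamic-programming edit distance.
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```

In a real library the same lookup would sit inside whatever custom distance metric type the library expects, with the 1-dimensional vectors handed to the tree in place of the objects themselves.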