algorithmfoundry / Foundry

The Cognitive Foundry is an open-source Java library for building intelligent systems using machine learning
Other
131 stars 41 forks source link

VectorFactory#copyValues(Collection<? extends Number>): The iteration order of the collection is used blindly #44

Closed Zero3 closed 9 years ago

Zero3 commented 9 years ago

It appears that VectorFactory#copyValues(Collection<? extends Number>) expects the provided collection to be ordered, since it uses the iteration order of the collection when creating the vector. As some Java collections have arbitrary iteration order, there is potential for nasty ordering bugs if the developer is not careful to consider the underlying implementation of the collection before using this method.

I suggest that the expectation about iteration order is documented in the javadoc for the method. Alternatively, the method could be replaced by variants only accepting ordered collections (such as List and SortedSet, but this might limit the compatibility with unknown/future collections).

jbasilico commented 9 years ago

I updated the javadoc. Yes, it is unwise to create these from things like a HashSet that have a bad iteration order. But then again you can always replace them with a LinkedHashSet and have it based on insertion order, which is reasonable. Given that this is a convenience method, I don't think restricting it further makes sense.

Maybe adding a method to create it from a map of Int -> Number would be clearer?

Zero3 commented 9 years ago

Thanks. This sentence is broken though:

It does this by iterating through collection and initializing the by iterating through them in the order specified by the collection.

I think documentation is a good solution here, yeah.

I probably wouldn't have any use of the map method you mention. If I had int-indexed values, I would probably keep them in a list and use the other method. I do, however, have a SortedMap<String, Double> of features that I am converting to a Vector, so a method with an interface like this would work for me:

VectorFactory#copyValues(Map<?, ? extends Number>)

This is probably overkill though, as I can just as easily use .values() on my map and pass the returned collection to the method we were originally discussing. So I think things are fine, really :).