eclipse / eclipse-collections

Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.
http://www.eclipse.org/collections
2.42k stars 604 forks source link

Optimize immutable primitive ArrayList creation from primitive iterables. #1649

Open donraab opened 1 month ago

donraab commented 1 month ago

This PR attempts to fix a performance problem I found in immutablePrimitiveListFactoryImpl.stg for the withAll(<primitive>Iterable) method.

This method would call size at least three times on the primitive Iterables with a size > 1. For Lazy primitive Iterables, this will have the affect of iterating and applying procedures, functions, and predicates three times. The solution I have come up with is to add a toSizedArray(int size) method to primitive iterables that will target the contents of the iterables to the presized array. One of the things to open to discussion/debate is how to handle the situations where the size is < or > the size of the iterable. In the < case right now, I fill up to the size of the new array, and the rest of the iterable contents will be left off. In the > case, I fill the array with all the contents of the iterable, and the new array is left with default initialized padded values in the remainder of the array.

motlin commented 1 month ago

Wouldn't this be an asymmetry between the object/primitive APIs?

I think the most consistent thing to do would be to call toArray(). On Iterables of unknown origin, that's one of the only APIs we have available that we can assume is atomic. It may not be quite as efficient but it would reduce the 3 walks through and give us the performance of ArrayList growth. And it wouldn't require adding public API.

donraab commented 1 month ago

Yes, there would be asymmetry. Let me take a fresh look and see if I can come up with a solution that only uses toArray.

Wouldn't this be an asymmetry between the object/primitive APIs?

I think the most consistent thing to do would be to call toArray(). On Iterables of unknown origin, that's one of the only APIs we have available that we can assume is atomic. It may not be quite as efficient but it would reduce the 3 walks through and give us the performance of ArrayList growth. And it wouldn't require adding public API.