dfdx / Spark.jl

Julia binding for Apache Spark

Make map and flat_map faster. #42

Closed aviks closed 7 years ago

aviks commented 7 years ago

Use a generator instead of Iterators.imap; it's much faster. As a side effect, we no longer need a dependency on the Iterators package.
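To illustrate the idea, here is a minimal sketch of a lazy map built on a generator expression. The name `lazy_map` is hypothetical and is not taken from the Spark.jl source; the sketch only shows that a generator gives lazy element-wise application without the Iterators package.

```julia
# Hypothetical helper for illustration; not the actual Spark.jl code.
# A generator expression applies `f` lazily, one element at a time,
# using only Base (no dependency on the Iterators package).
lazy_map(f, it) = (f(x) for x in it)

squares = lazy_map(x -> x^2, 1:5)

# Nothing is computed until the generator is consumed.
result = collect(squares)
```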

Tests pass locally.

aviks commented 7 years ago

The second commit updates the implementation of FlatMapIterator to be faster. Once we support only 0.6 and use Channels, this can be replaced with Base.flatten, which will be faster still. On 0.5, however, that does not work correctly, because tasks and channels are not idempotent with respect to done.
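As a rough sketch of the flatten-based approach, assuming a recent Julia version where the function is available as `Iterators.flatten`: the helper name `lazy_flat_map` is hypothetical, and this is not the FlatMapIterator from the commit.

```julia
# Hypothetical helper for illustration; not the actual FlatMapIterator.
# `f` returns an iterable for each input element; `Iterators.flatten`
# lazily concatenates those iterables into a single stream.
lazy_flat_map(f, it) = Iterators.flatten(f(x) for x in it)

# Each input x yields two outputs: x and 10x.
flat = collect(lazy_flat_map(x -> (x, 10x), 1:3))
```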

aviks commented 7 years ago

Apologies, the flat_map fix was problematic: I hadn't removed all references to Iterators. I have fixed that, rebased, and squashed.

This should now be ready to go.

dfdx commented 7 years ago

Looks good to me! Thanks!