replaced map().sum() with fold

Vilhelm-Ian commented 11 months ago

Most of the changes are cause of the formater sorry

ChristopherBiscardi commented 11 months ago

Most of the changes are cause of the formater sorry

no worries. I'll stick a rustfmt.toml in the root of the project so people have the right formatting stuff.

What's the motivation for moving from .map.sum to fold here?

Vilhelm-Ian commented 11 months ago

Not sure if it's faster. Probably once compiler optimizations are made it will compile to the same assembly. I just think it's better. Since you don't have to create an array and then iterate through it to get the sum. This just does it one go

DanikVitek commented 11 months ago

Since you don't have to create an array and then iterate through it to get the sum. This just does it one go

Iterators are built in a way that this would take one pass in any case. Iterators are lazy, so all filters and maps are done once per the call to .next(). Unlike, for example, in JavaScript, where every .map and .filter call creates a new array.

Vilhelm-Ian commented 11 months ago

@DanikVitek if I did

(0..5).map(|x| { 
println!("hello");
x * 2
}).for_each(|x| println!("world"));

wouldn't the output be

hello
hello
helllo
hello
hello
world
world
world
world
world

EDIT: I re-read your reply not sure if I understand what you mean. Do you want to say that during the same iter call map() and sum() are called. If that is the case wouldn't the above code output be

hello
world
hello
world
hello
world

@DanikVitek yup just checked it out you are right. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=734bda587b57f594335a7fd0b78ffac1

thanks for teaching me something new

Vilhelm-Ian commented 11 months ago

@DanikVitek actually there is a different issue with using .map.max instead of just using fold. fold has a constant memory complexity while the map.max not. Day five I tried to do this with rayon par_iter.map().max() my entire pc froze. But when I did par_iter.fold() it barely used 2gb of memory

ChristopherBiscardi commented 11 months ago

I updated the formatting so that the PR would have less noise

ChristopherBiscardi commented 11 months ago

DHAT profiling for the two approaches shows no difference (perhaps expectedly since we're dealing with integers and not stack allocated types of any kind).

dhat heap profiling

# With Fold ## part2 ``` dhat: Total: 1,096 bytes in 3 blocks dhat: At t-gmax: 1,096 bytes in 3 blocks dhat: At t-end: 1,088 bytes in 2 blocks ``` ## part2_aho_corasick ``` dhat: Total: 57,412,096 bytes in 65,003 blocks dhat: At t-gmax: 19,812 bytes in 28 blocks dhat: At t-end: 1,088 bytes in 2 blocks ``` ## part2_nom ``` dhat: Total: 30,344 bytes in 1,411 blocks dhat: At t-gmax: 1,096 bytes in 3 blocks dhat: At t-end: 1,088 bytes in 2 blocks ``` ## Without Fold ## part2 ``` dhat: Total: 1,096 bytes in 3 blocks dhat: At t-gmax: 1,096 bytes in 3 blocks dhat: At t-end: 1,088 bytes in 2 blocks ``` ## part2_aho_corasick ``` dhat: Total: 57,412,096 bytes in 65,003 blocks dhat: At t-gmax: 19,812 bytes in 28 blocks dhat: At t-end: 1,088 bytes in 2 blocks ``` ## part2_nom ``` dhat: Total: 30,344 bytes in 1,411 blocks dhat: At t-gmax: 1,096 bytes in 3 blocks dhat: At t-end: 1,088 bytes in 2 blocks ```

In addition, Sum is implemented in terms of .fold (std::core::iter::traits::accum.rs)

impl Sum for $a {
    fn sum<I: Iterator<Item=Self>>(iter: I) -> Self {
        iter.fold(
            $zero,
            #[rustc_inherit_overflow_checks]
            |a, b| a + b,
        )
    }
}

Considering that iterators are lazy, it would surprise me to see increased memory usage because of map vs fold.

asm

The asm output is exactly the same.

.sum

used this code

pub fn my_sum() -> u32 {
    std::hint::black_box(0..10).into_iter().map(|n| plus2(n)).sum::<u32>()
}

fn plus2(n: u32) -> u32 {
    n + 2
}

asm for sum

```asm # Compilation provided by Compiler Explorer at https://godbolt.org/ example::my_sum: movabs rax, 42949672960 mov qword ptr [rsp - 8], rax lea rax, [rsp - 8] mov edx, dword ptr [rsp - 8] mov esi, dword ptr [rsp - 4] xor eax, eax mov ecx, esi sub ecx, edx jbe .LBB0_2 mov eax, edx not eax add esi, eax lea eax, [rdx + 3] imul eax, esi add ecx, -2 imul rcx, rsi shr rcx add eax, edx add eax, ecx add eax, 2 .LBB0_2: ret ```

fold

used this code

pub fn my_sum() -> u32 {
    std::hint::black_box(0..10).into_iter().fold(0, |acc, n| acc + plus2(n))
}

fn plus2(n: u32) -> u32 {
    n + 2
}

asm for fold

```asm # Compilation provided by Compiler Explorer at https://godbolt.org/ example::my_sum: movabs rax, 42949672960 mov qword ptr [rsp - 8], rax lea rax, [rsp - 8] mov edx, dword ptr [rsp - 8] mov esi, dword ptr [rsp - 4] xor eax, eax mov ecx, esi sub ecx, edx jbe .LBB0_2 mov eax, edx not eax add esi, eax lea eax, [rdx + 3] imul eax, esi add ecx, -2 imul rcx, rsi shr rcx add eax, edx add eax, ecx add eax, 2 .LBB0_2: ret ```

As a result I believe that if you saw changes in your application on day 5 it was due to something else and not .map().sum().

Here is the PR that made this true for Rust, if you're interested.

Vilhelm-Ian commented 11 months ago

thanks @ChristopherBiscardi can I clean up my day 5 code ask you to take a look when you have time(since you are also busy with bevy game jame and advent of code currently)

ChristopherBiscardi commented 10 months ago

@Vilhelm-Ian sure, happy to take a look when I have time

Vilhelm-Ian commented 10 months ago

@ChristopherBiscardi I just realized something .iter().map().sum() might not allocated more memory than .iter().fold() but it seems that's not the case for .par_iter().map().sum() and .par_iter().fold().reduce(). The first one makes my pc crash

ChristopherBiscardi commented 10 months ago

@Vilhelm-Ian While using rayon does mean that you will be iterating on more items in parallel, and thus could tip the limit of your PC if the individual .map calls are big there are two important notes:

First, rayon's ParallelIterator::map is it's own and is not the same as Iterator::map, as shown here: https://rust-lang-nursery.github.io/rust-cookbook/concurrency/parallel.html#map-reduce-in-parallel

Second, rayon's docs say .map.reduce and .sum are effectively the same so I would be surprised if one caused your computer to crash and the other didn't.

Basically equivalent to self.reduce(|| 0, |a, b| a + b), except that the type of 0 and the + operation may vary depending on the type of value being produced.

The rayon docs also have information relating to fold vs map/reduce which should be reviewed.

ChristopherBiscardi / advent-of-code

replaced map().sum() with fold #7

asm

.sum

fold