fogleman / primitive

Reproducing images with geometric primitives.
https://primitive.lol/
MIT License
12.35k stars 608 forks source link

Parallelize *Model.computeColor #2

Closed hectorj closed 7 years ago

hectorj commented 7 years ago

This PR parallelizes *Model.computeColor. One goroutine per line.

On my computer, in some very crude benchmark (owl.png, n=3) it shows a ~25% gain. It does not alter the results (checked using https://github.com/hectorj/primitive/blob/basic-test/main_test.go).

(PS: I love your work, results look amazing)

fogleman commented 7 years ago

I actually did something similar before but didn't commit it. :)

Any performance difference if you send results over a channel and aggregate them instead of using a mutex on shared variables?

hectorj commented 7 years ago

From a quick check: nope, it's roughly the same results.

On Tue, Sep 20, 2016, 23:51 Michael Fogleman notifications@github.com wrote:

I actually did something similar before but didn't commit it. :)

Any performance difference if you send results over a channel and aggregate them instead of using a mutex on shared variables?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/fogleman/primitive/pull/2#issuecomment-248361886, or mute the thread https://github.com/notifications/unsubscribe-auth/ACUMN0yAMdqb9uicSvp3Y1JTgzUSI1v9ks5qsA8pgaJpZM4KBxm2 .

fogleman commented 7 years ago

Cool, thanks. I'll look into this PR further later tonight!

hectorj commented 7 years ago

Rebased to fix conflict.

fogleman commented 7 years ago

Hmm, I just ran your code and it was actually a little slower... (51 vs 50 seconds)

go run main.go -i examples/pyramids.png -r 256 -o a.png -n 100 -v
fogleman commented 7 years ago

BTW, when I tried to do this, I created N workers and had them each process every Nth scanline, starting at 0, 1, 2, 3 if N=4. N was set to runtime.GOMAXPROCS(0).

hectorj commented 7 years ago

Ah, indeed.

I haven't checked the details of the latest commits you did, but the master branch is twice as fast as it used to be on my computer (even more when downsizing the input), and this PR no longer brings any performance gain (sync and scheduling overhead has become roughly equivalent to what it could bring, I guess).

Profiling still shows computeColor taking ~50% of the CPU time, but it will have to be optimized some other way.

I'll check later if limiting workers reduces the overhead, but I don't have much hope for anything significant.

Here is the CPU graph from pprof, made on master branch, just for model.Run(3): pprof_primitive_master2

Closing for now.

Thanks for checking it.

fogleman commented 7 years ago

@hectorj Here's my attempt...

https://github.com/fogleman/primitive/compare/parallel

It seems that it does speed things up if the input image is relatively large, but it slows things down for 128x128px inputs, for example.