Closed hectorj closed 7 years ago
I actually did something similar before but didn't commit it. :)
Any performance difference if you send results over a channel and aggregate them instead of using a mutex on shared variables?
From a quick check: nope, it's roughly the same results.
On Tue, Sep 20, 2016, 23:51 Michael Fogleman notifications@github.com wrote:
I actually did something similar before but didn't commit it. :)
Any performance difference if you send results over a channel and aggregate them instead of using a mutex on shared variables?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/fogleman/primitive/pull/2#issuecomment-248361886, or mute the thread https://github.com/notifications/unsubscribe-auth/ACUMN0yAMdqb9uicSvp3Y1JTgzUSI1v9ks5qsA8pgaJpZM4KBxm2 .
Cool, thanks. I'll look into this PR further later tonight!
Rebased to fix conflict.
Hmm, I just ran your code and it was actually a little slower... (51 vs 50 seconds)
go run main.go -i examples/pyramids.png -r 256 -o a.png -n 100 -v
BTW, when I tried to do this, I created N workers and had them each process every Nth scanline, starting at 0, 1, 2, 3 if N=4. N was set to runtime.GOMAXPROCS(0).
Ah, indeed.
I haven't checked the details of the latest commits you did, but the master branch is twice as fast as it used to be on my computer (even more when downsizing the input), and this PR no longer brings any performance gain (sync and scheduling overhead has become roughly equivalent to what it could bring, I guess).
Profiling still shows computeColor
taking ~50% of the CPU time, but it will have to be optimized some other way.
I'll check later if limiting workers reduces the overhead, but I don't have much hope for anything significant.
Here is the CPU graph from pprof, made on master branch, just for model.Run(3):
Closing for now.
Thanks for checking it.
@hectorj Here's my attempt...
https://github.com/fogleman/primitive/compare/parallel
It seems that it does speed things up if the input image is relatively large, but it slows things down for 128x128px inputs, for example.
This PR parallelizes *Model.computeColor. One goroutine per line.
On my computer, in some very crude benchmark (owl.png, n=3) it shows a ~25% gain. It does not alter the results (checked using https://github.com/hectorj/primitive/blob/basic-test/main_test.go).
(PS: I love your work, results look amazing)