Open bennm37 opened 1 year ago
getK (and so kernelColorise) seem to be the main culprits.
Ok I think symmetric matrix construction using for loops and then @jit will speed that up. Also just noticed that K2 is generated twice for no reason (hence 9 getK calls not 6) so that should be a 30% speed up for free.
Ahhh just realised K is not square in the big case and so symmetric won't work
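A minimal sketch of why the symmetric trick only applies to the square case. The Gaussian `kernel` and the function names `get_k_symmetric` and `get_k_full` are made up for illustration and are not the project's actual getK:

```python
import math


def kernel(x, y, sigma=1.0):
    # Hypothetical Gaussian kernel; the project's kernel may differ.
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def get_k_symmetric(points):
    # Square case: K[i][j] == K[j][i], so only the upper triangle is computed
    # and each value is mirrored into the lower triangle.
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = kernel(points[i], points[j])
            K[i][j] = value
            K[j][i] = value
    return K


def get_k_full(rows, cols):
    # General case: rows and cols are different point sets, so K has shape
    # len(rows) x len(cols) and there is no symmetry to exploit.
    return [[kernel(r, c) for c in cols] for r in rows]


points = [0.0, 0.5, 2.0]
K_sym = get_k_symmetric(points)
K_full = get_k_full(points, points)
K_rect = get_k_full([0.0, 0.5, 2.0, 3.0], [0.0, 1.0])
```

Here `K_sym` matches `K_full` with roughly half the kernel evaluations, while `K_rect` is 4 x 2 and has to be filled entry by entry.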
Jit now working on branch jit_redo! If you could profile it when you get a moment that'd be fantastic :)
Thanks, managed to shave a few seconds off but still nothing near instantaneous. Might be worth having a quick meeting at some point to discuss where to direct efforts next?
I'll just run an optimisation routine on the apple image in the meantime and see if I can hone the values we have.
No, I think jit is actually worse there, as there are only 6 calls but not a 30% reduction in cumtime for getK. More importantly though (sorry I didn't mention this), the jit compile time here will probably be significant and is included in the cumtime. So if you call getK before profiling, it will first compile getK then use the compiled version when you time it. In reality this is the thing you want to time, as even if you colorise 1000 photos in one script it only compiles once.
Meeting sounds good
So if you call getK before profiling, it will first compile getK then use the compiled version when you time it.
Sorry could you elaborate on this please, I'm not sure I follow
No problem. So the way the @jit decorator works is that when you first call a function with the decorator, it compiles that function into machine code. This compilation takes time, but after that every call to the function runs much faster. So if you were to call Coloriser 100 times, it would only compile the function once, and the other 99 calls would run much faster. As the interesting thing to profile/time is the runtime, not the runtime + compile time, best practice when profiling jit is to call Coloriser (or just getK) once beforehand so it compiles, and then start profiling after. Does that make sense?
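Roughly, the warm-up pattern could look like the sketch below. `get_k` is a made-up stand-in rather than the real getK, and the `njit` fallback is only there so the snippet still runs when numba is not installed (in which case both calls take about the same time):

```python
import math
import time

try:
    from numba import njit  # real JIT compilation if numba is available
except ImportError:
    def njit(func):
        # Assumption: no-op fallback so the sketch runs without numba.
        return func


@njit
def get_k(n):
    # Hypothetical stand-in for getK: sums a Gaussian kernel over an n x n grid.
    total = 0.0
    for i in range(n):
        for j in range(n):
            d = float(i - j)
            total += math.exp(-d * d / 100.0)
    return total


def timed(n):
    # Time a single call to get_k with a wall-clock timer.
    start = time.perf_counter()
    value = get_k(n)
    return value, time.perf_counter() - start


# First call: with numba this includes the one-off compile time.
first_value, first_seconds = timed(200)
# Second call: reuses the compiled version, so this is the runtime to compare.
second_value, second_seconds = timed(200)

print(f"first call:  {first_seconds:.4f} s (includes compilation under numba)")
print(f"second call: {second_seconds:.4f} s (steady-state runtime)")
```

The same idea carries over to cProfile: make one throwaway call first, then enable the profiler.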
Ok, if I've done this right, here's what I've got. I ran Coloriser and kernelColorise and the initial profile is as such:
Then running it afterwards I get: If I've understood correctly, the difference in time is the compilation time of the @jit functions? The profiler log for the initial run is also much longer and largely populated by numba processes.
Ahh that's disappointing. Yeah that sounds correct, but it looks like it just isn't any faster? Was that your conclusion? I think at the moment the fastest thing would be just to run the main branch version but make sure K2 isn't defined twice. Me and Mitja came up with an optimisation for calculating K2 with the compact support kernel but haven't got it to work yet.
Oh sweet - yeah my plan was to come into the office tomorrow and leave one of their computers running the optimisation while I worked on porting the main branch Coloriser to the gui. I'll make a note of fixing K2 as well.
Use cProfile to see what the bottlenecks are in coloriser and suggest what could be sped up. This video is helpful: https://www.youtube.com/watch?v=m_a0fN48Alw&t=235s
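As a starting point, a profiling run could be sketched like this. `get_k` and `kernel_colorise` are simplified stand-ins, not the real functions, and the redundant third call is deliberate so that it shows up in the `ncalls` column:

```python
import cProfile
import io
import math
import pstats


def get_k(size):
    # Hypothetical stand-in for getK: builds a size x size Gaussian kernel matrix.
    return [[math.exp(-((i - j) ** 2) / 50.0) for j in range(size)]
            for i in range(size)]


def kernel_colorise(size):
    # Stand-in pipeline that builds one matrix twice, the kind of redundant
    # call that a profile makes visible.
    K1 = get_k(size)
    K2 = get_k(size)
    K2_again = get_k(size)  # deliberately redundant
    return K1[0][0] + K2[0][0] + K2_again[0][0]


profiler = cProfile.Profile()
profiler.enable()
result = kernel_colorise(100)
profiler.disable()

# Sort by cumulative time and restrict the report to functions matching "get_k".
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.strip_dirs().sort_stats("cumulative").print_stats("get_k")
report = stream.getvalue()
print(report)
```

In the report, `ncalls` shows how often each function ran and `cumtime` shows the time spent in it including everything it calls, which is usually where to look first.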