HomerReid / scuff-em

A comprehensive and full-featured computational physics suite for boundary-element analysis of electromagnetic scattering, fluctuation-induced phenomena (Casimir forces and radiative heat transfer), nanophotonics, RF device engineering, electrostatics, and more. Includes a core library with C++ and python APIs as well as many command-line applications.
http://www.homerreid.com/scuff-em
GNU General Public License v2.0

Improve runtime #131

Open lonAlpha opened 7 years ago

lonAlpha commented 7 years ago

feko_sphere.zip

Hi, I find that the runtime is much longer than that of the commercial MoM software FEKO. Is there an easy way to improve this? Thanks. As a benchmark I use a sphere with 848 panels. On my laptop, FEKO takes less than 30 seconds to finish (I've deselected the Symmetry setting). The runtime of scuff-scatter is 6m5.516s.

Runtime of scuff-scatter:

    real    6m5.516s
    user    10m44.400s
    sys     0m3.836s

Problem size: 426 interior vertices, 1272 interior edges, 848 panels; Euler characteristic 426 - 1272 + 848 = 2.
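For a closed triangulated surface those counts should be self-consistent: every edge is interior and each edge is shared by exactly two triangles, so as a sanity check on the mesh

```math
\chi = V - E + F = 426 - 1272 + 848 = 2,
\qquad
E = \tfrac{3}{2}F = \tfrac{3}{2}\cdot 848 = 1272,
```

confirming a closed, genus-0 (sphere-like) surface.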

HomerReid commented 7 years ago

Hi, sorry for the delay in getting back to you over the holidays.

A calculation of this size should certainly not be taking 6 minutes. Likely culprits are (a) the code was not correctly built with multithreading support, or (b) you are linking against a non-multithreaded BLAS library. It should be possible to figure this out by looking at the .log files produced by any SCUFF-EM code. If you'd like to post your .log file here, or a relevant snippet of it, I'd be happy to take a look.
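One quick standalone check for culprit (a): a trivial OpenMP test program, built with the same compiler and flags used for SCUFF-EM, should report all of your cores. A minimal sketch (not part of SCUFF-EM):

```cpp
// omp_check.cpp -- verify that OpenMP is enabled and sees all cores.
// Build with the same compiler/flags as SCUFF-EM, e.g.:
//   g++ -fopenmp omp_check.cpp -o omp_check && ./omp_check
#include <cstdio>
#include <omp.h>

int main()
{
  // Should print the number of physical/logical cores, not 1.
  printf("max OpenMP threads: %d\n", omp_get_max_threads());

  #pragma omp parallel
  {
    #pragma omp critical
    printf("hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
  }
  return 0;
}
```

If this reports only one thread, the multithreaded build is the problem; otherwise suspect the BLAS library.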

lonAlpha commented 7 years ago

Hi, this is my log: scuff-scatter.log.txt

You may notice that it takes more than 2 minutes to assemble the BEM matrix. My CPU is an Intel i3-2350M. I run FEKO on the same laptop.

HomerReid commented 7 years ago

Hi, sorry for the delay again. From the log file, it's clear the code is using multithreading, so that's not the problem. Your CPU is significantly less powerful, and has less cache, than the machines I usually run on, so it's hard to do a direct comparison. 2.5 minutes to assemble the BEM matrix does seem slower than one would like, but not catastrophic. Note that subsequent matrix assemblies within the same run of the code (for example, calculations at different frequencies) will be faster due to caching of frequency-independent components.
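To make that concrete, here is a minimal sketch of a frequency sweep through the libscuff C++ API (the geometry file name and frequency values are placeholders); the frequency-independent panel integrals computed during the first AssembleBEMMatrix() call are cached internally and reused at later frequencies:

```cpp
// freq_sweep.cpp -- minimal sketch of a frequency sweep with libscuff.
// The .scuffgeo file name and the frequency list are placeholders.
#include <libscuff.h>
using namespace scuff;

int main()
{
  RWGGeometry *G = new RWGGeometry("Sphere_848.scuffgeo"); // placeholder
  HMatrix *M     = G->AllocateBEMMatrix();

  double Omegas[] = { 0.1, 0.2, 0.5, 1.0 }; // angular frequencies (scuff units)
  for (double Omega : Omegas)
  {
    // The first call does the expensive panel-panel quadratures;
    // subsequent calls reuse the cached frequency-independent pieces.
    G->AssembleBEMMatrix(cdouble(Omega, 0.0), M);
    // ... factorize M and solve scattering problems at this frequency ...
  }
  return 0;
}
```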

What compiler are you using? I think you will see significantly better performance with the Intel compilers.

rfmichael commented 7 years ago

I have a similar concern with scuff-rf. I tested "WireAntenna" (SquareCoil_79) from the example directory. It took 30 seconds without multithreading and 10 seconds with it (Intel i7-3770 @ 3.4 GHz). I always thought that solving the BEM matrix would be the most time-consuming task, but instead that cost is completely negligible: assembling the BEM matrix takes all the time. Is this usual, or is something wrong here?

The profiler says:

      %   cumulative    self                 self     total
    time     seconds   seconds       calls  ms/call  ms/call  name
    35.42     366.82    366.82  2207189292     0.00     0.00  scuff::CFDIntegrand3D(...)
    19.13     564.88    198.06                                rule75genzmalik_evalError
    14.05     710.41    145.54    51460075     0.00     0.00  scuff::TaylorDuffySum_FIPPI(...)
     5.59     768.34     57.92                                __logl_internal

I would certainly like to help speed up scuff-rf; is there something I can do? Is the above-mentioned "caching of frequency-independent components" also implemented in scuff-rf?

Thanks a lot for SCUFF-EM; it really is great!

HomerReid commented 7 years ago

It is entirely normal for BEM matrix assembly to dominate the cost of solving the linear system; that is almost always the case, unless you are using a very slow BLAS/LAPACK installation.

I don't think 10 seconds is a particularly long time for this calculation! Note that you only need to form and factorize the BEM matrix once for a given geometry at a given frequency, and can then reuse that result to solve scattering problems involving any number of incident fields.
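Schematically, through the libscuff API (a minimal sketch; the geometry file, frequency, and plane-wave parameters are placeholders):

```cpp
// multi_rhs.cpp -- minimal sketch: assemble and LU-factorize the BEM matrix
// once, then reuse the factorization for several incident plane waves.
#include <libscuff.h>
using namespace scuff;

int main()
{
  RWGGeometry *G = new RWGGeometry("SquareCoil.scuffgeo"); // placeholder
  HMatrix *M     = G->AllocateBEMMatrix();
  HVector *KN    = G->AllocateRHSVector();
  cdouble Omega  = 1.0; // placeholder angular frequency

  G->AssembleBEMMatrix(Omega, M); // expensive: once per geometry + frequency
  M->LUFactorize();               // expensive: once per frequency

  cdouble E0[3]      = { 1.0, 0.0, 0.0 };        // polarization (placeholder)
  double nHats[2][3] = { {0,0,1}, {0,1,0} };     // two propagation directions

  for (int n = 0; n < 2; n++)
  {
    PlaneWave PW(E0, nHats[n]);
    G->AssembleRHSVector(Omega, &PW, KN); // cheap per incident field
    M->LUSolve(KN);                       // cheap back-substitution
    // ... post-process the surface currents in KN ...
  }
  return 0;
}
```

Each extra LUSolve() is an O(N^2) back-substitution, so the per-incident-field cost is negligible compared with the one-time assembly and O(N^3) factorization.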

rfmichael commented 7 years ago

Thanks for the fast answer! Good to know that I've done everything right. :-)