TeamAtomECS / AtomECS

Cold atom simulation code
GNU General Public License v3.0

Const generics num of laser #71

minghuaw closed 2 years ago

minghuaw commented 2 years ago

The fixed-length arrays that previously had length BEAM_LIMIT now take their length from a const generic parameter, which is ultimately exposed to the user when creating the builder.
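For readers unfamiliar with the pattern, here is a minimal sketch of the idea, not the actual AtomECS code. `BeamSamplers`, `SimulationBuilder`, and their methods are hypothetical names chosen for illustration; the only point is that the array length becomes a type parameter `N` that the user picks once at the builder.

```rust
/// Per-atom storage with one slot per laser beam (hypothetical type).
/// The const generic `N` replaces a crate-wide `BEAM_LIMIT` constant.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BeamSamplers<const N: usize> {
    pub intensity: [f64; N],
}

// `Default` is not implemented for `[T; N]` with arbitrary `N`,
// so the zeroed array is written out by hand.
impl<const N: usize> Default for BeamSamplers<N> {
    fn default() -> Self {
        Self { intensity: [0.0; N] }
    }
}

/// Hypothetical builder: the user chooses `N` once, here.
pub struct SimulationBuilder<const N: usize> {
    samplers: Vec<BeamSamplers<N>>,
}

impl<const N: usize> SimulationBuilder<N> {
    pub fn new() -> Self {
        Self { samplers: Vec::new() }
    }

    /// Allocates one zeroed sampler per atom.
    pub fn with_atoms(mut self, count: usize) -> Self {
        self.samplers = vec![BeamSamplers::default(); count];
        self
    }

    /// The maximum number of beams is now part of the type.
    pub fn beam_capacity(&self) -> usize {
        N
    }

    pub fn atom_count(&self) -> usize {
        self.samplers.len()
    }
}
```

With this shape, a user would write something like `SimulationBuilder::<8>::new()` to get room for eight beams, and a different `N` gives a different concrete type at compile time.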

minghuaw commented 2 years ago

What is the purpose of LASER_CACHE_SIZE? It is defined four times in four different mods with the same value as BEAM_LIMIT. The LASER_CACHE_SIZE is still limiting the number of lasers.

ElliotB256 commented 2 years ago

Thanks for the PR @minghuaw !

> What is the purpose of LASER_CACHE_SIZE? It is defined four times in four different mods with the same value as BEAM_LIMIT. The LASER_CACHE_SIZE is still limiting the number of lasers.

When we loop over the lasers, we split them into smaller batches, each a fixed-size array of length LASER_CACHE_SIZE, and loop through those. It might be worth re-profiling, given changes in other areas of the code, to see whether the batching is still worth the extra complexity.
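As a rough illustration of that batching (a simplified sketch, not the AtomECS implementation), the lasers are copied into a small fixed-size array one batch at a time, and the atoms are looped over once per batch. `CACHE_SIZE` stands in for LASER_CACHE_SIZE, and `accumulate_batched` and its plain `f64` data are assumptions made for the example.

```rust
/// Hypothetical batch size standing in for `LASER_CACHE_SIZE`.
const CACHE_SIZE: usize = 4;

/// Adds every laser's intensity to every atom, one batch of lasers at a time.
pub fn accumulate_batched(lasers: &[f64], atoms: &mut [f64]) {
    for chunk in lasers.chunks(CACHE_SIZE) {
        // Copy the batch into a fixed-size stack array; unused slots stay zero.
        let mut cache = [0.0_f64; CACHE_SIZE];
        cache[..chunk.len()].copy_from_slice(chunk);
        let batch = &cache[..chunk.len()];

        // The inner loops only touch the small, fixed-size batch.
        for atom in atoms.iter_mut() {
            for intensity in batch {
                *atom += *intensity;
            }
        }
    }
}
```

In this sketch the batch size only sets how many lasers are processed per pass, so the total number of lasers can exceed `CACHE_SIZE`.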

You might find this interesting: https://github.com/TeamAtomECS/AtomECS/issues/9#issuecomment-776286841

minghuaw commented 2 years ago

Which profiling program did you use? I will probably try it out and see what to do with LASER_CACHE_SIZE.

ElliotB256 commented 2 years ago

My workstation has an Intel processor, so I was using Intel VTune, and I found the microarchitecture profiling the most useful.

I was also thinking about adding some microbenchmarks to AtomECS. I started doing this at one point but stopped because my time was needed elsewhere. In my experience, microbenchmarking can guide general performance work, but one has to be careful not to create highly artificial benchmarks that give misleading results. What are your thoughts on microbenchmarking?
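To make the question concrete, a hand-rolled microbenchmark might look roughly like the sketch below. It uses only the standard library; `sum_intensities` and `time_per_call` are hypothetical names, and the workload is exactly the kind of artificial case that can mislead, so treat it as a starting point only.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Workload under test: a plain sum over laser intensities (illustrative only).
pub fn sum_intensities(values: &[f64]) -> f64 {
    values.iter().sum()
}

/// Runs `f` `iterations` times and returns the mean wall-clock time per call.
/// `iterations` must be non-zero, otherwise the division panics.
pub fn time_per_call<F: FnMut() -> f64>(mut f: F, iterations: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        // `black_box` discourages the optimizer from discarding the unused result.
        black_box(f());
    }
    start.elapsed() / iterations
}
```

A call such as `time_per_call(|| sum_intensities(&data), 1_000)` gives a rough per-call figure. A dedicated harness would add warm-up runs and statistics on top of this.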

minghuaw commented 2 years ago

I am not planning to do any microbenchmarking yet. Right now, the profiling is mainly for two things:

  1. The effect of LASER_CACHE_SIZE.
  2. I have seen some for loops that could probably be merged into one, which may provide some performance improvement.
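
As a generic illustration of the second point (not taken from the AtomECS code), two passes over the same slice can often be merged into a single pass. The function names and the scale/offset arithmetic here are made up for the example.

```rust
/// Two separate passes: the slice is traversed twice.
pub fn separate_loops(values: &mut [f64], scale: f64, offset: f64) {
    for v in values.iter_mut() {
        *v *= scale;
    }
    for v in values.iter_mut() {
        *v += offset;
    }
}

/// One fused pass: each element is loaded and stored once.
pub fn fused_loop(values: &mut [f64], scale: f64, offset: f64) {
    for v in values.iter_mut() {
        *v = *v * scale + offset;
    }
}
```

Both versions compute the same values. Whether the fused form is actually faster depends on data size and on what the compiler already does, so it would need to be confirmed by profiling.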