SmileiPIC / Smilei

Particle-in-cell code for plasma simulation
https://smileipic.github.io/Smilei

ParticleTracking often results in unnecessarily large files #644

Closed · Horymir001 closed this 1 year ago

Horymir001 commented 1 year ago

Context

I run quasi-3D simulations of laser wakefield acceleration (LWFA) on cm scales with PW pulses, and I track only sufficiently relativistic electrons (E > 350 MeV). Despite that threshold and a relatively low output frequency (once per 5 fs), the TrackParticles_*.h5 file can grow as large as 200 GB. Many macroparticles (even hundreds of thousands) pass the filter, which is far more than necessary.
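For reference, this is roughly what such a diagnostic looks like in the namelist (a minimal sketch: the species name and the every value are placeholders, and the threshold 684.9 corresponds to px ≈ 350 MeV / 0.511 MeV in Smilei's normalized momentum units):

def energy_filter(particles):
    # px in units of m_e*c; 684.9 ~ 350 MeV / 0.511 MeV
    return particles.px > 684.9

DiagTrackParticles(
    species = "electron",      # placeholder species name
    every = 2000,              # placeholder output period
    filter = energy_filter,
    attributes = ["x", "y", "z", "px", "py", "pz", "weight"]
)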

Mitigation

We came up with the following tricks to reduce the output size:

  1. Setting up an additional artificial electron species in such a way that the total electron density is kept: something like ne = n1 + n2 with n1 = 0.9 ne, n2 = 0.1 ne, ppc1 = 9, ppc2 = 1, so that only the sparse species is tracked (see the sketch after this list). It can also be adapted for ionisation injection etc.
  2. A trick with the filter, inspired by #74, which I am experimenting with:

     import numpy as np

     def my_filter(particles):
         indices = np.arange(len(particles.px))
         condition = (particles.px > 684.9) * (particles.px < 684.9 * 1.02) \
                     * (particles.id == 0) * (np.mod(indices, 100) == 5) \
                     + (particles.id > 0)
         return condition

     It selects only the electrons whose px has newly risen above the threshold, throws 99% of those away, and keeps every particle that has already been selected (id > 0).
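For illustration, a minimal sketch of the species splitting in point 1 (the density profile, the species names and the numerical values are assumptions for a quasi-3D namelist, not taken from the actual input files):

def ne_profile(x, r):
    return 0.01   # hypothetical plasma density profile (normalized units)

Species(
    name = "electron_bulk",      # 90% of the density, finely sampled, not tracked
    position_initialization = "regular",
    momentum_initialization = "cold",
    particles_per_cell = 9,
    mass = 1.,
    charge = -1.,
    number_density = lambda x, r: 0.9 * ne_profile(x, r),
)

Species(
    name = "electron_probe",     # 10% of the density at 1 ppc: the only species passed to DiagTrackParticles
    position_initialization = "regular",
    momentum_initialization = "cold",
    particles_per_cell = 1,
    mass = 1.,
    charge = -1.,
    number_density = lambda x, r: 0.1 * ne_profile(x, r),
)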

Disadvantages

The solution

In my opinion, the ID should indeed be assigned only after the condition given in the filter is passed; then a dice should be thrown and, with a given probability, that particle is either tracked or excluded from tracking forever.

mccoys commented 1 year ago

I believe that is already possible using an array of random numbers within the filter.

Horymir001 commented 10 months ago

Hi Arnaud and all, I tried to implement the solution suggested at the SMILEI workshop for my problem of LWFA with a moving window running for a long time:

import numpy as np

def my_filter_random_select(particles):
    ntot = particles.id.size     # total number of particles
    prob = 0.10                  # 10% selection probability for new candidates
    mask1 = (particles.id == 0)  # particles never considered before
    mask2 = mask1 * (np.random.rand(ntot) < prob)   # winners of the 10% draw
    # 72057594037927936 = 2**56: flag bit marking particles as already considered
    particles.id += (72057594037927936 * mask1).astype('uint64')
    return (mask2) + (particles.id > 72057594037927936)

DiagTrackParticles(
    species = "electron",    # species to be tracked
    every = 2000,
    filter = my_filter_random_select,
    attributes = ["x", "y", "z", "px", "py", "pz", "weight"]
)

I would like to track only the electrons with px > 30, so I tried to modify the return line as

return (mask2) + (particles.id > 72057594037927936) * (particles.px > 30)

It indeed selects only 10% of the particles, but the results are not as good as expected; in fact, even larger files are generated. I attach two input files, test_track_all.txt and test_track_fraction.txt, in which all the particles with px > 30 and, respectively, a fraction of them should have been observed. Indeed, as the energy spectrum shows, the statistics in the former case are lower. The blue line values are multiplied by a factor of 10. Beware the line around 0 though.

(figure enspec_tr: energy spectra for the two runs)

Here are some numbers:

Could you please have a look at this again? Solving this issue would help me a lot. I intend to run large LWFA simulations over cm distances, for which the track-particle files swell up to terabytes: the particles must be tracked with a relatively short time step to compute betatron radiation, yet I do not need to capture all the electrons in the bunch.

beck-llr commented 10 months ago

For what you want to do I would try something like

import numpy as np

def my_filter_random_select(particles):
    ntot = particles.id.size     # total number of particles
    prob = 0.10                  # 10% selection probability for new candidates
    mask1 = (particles.id == 0) * (particles.px > 30)   # untracked particles that just crossed the threshold
    mask2 = mask1 * (np.random.rand(ntot) < prob)       # winners of the 10% draw
    # 2**56 flag bit: mark all threshold-crossers as considered, win or lose
    particles.id += (72057594037927936 * mask1).astype('uint64')
    return (mask2) + (particles.id > 72057594037927936)

DiagTrackParticles(
    species = "electron",    # species to be tracked
    every = 2000,
    filter = my_filter_random_select,
    attributes = ["x", "y", "z", "px", "py", "pz", "weight"]
)

This should select 10% of the particles that get over the threshold of px = 30 at any point in time. The id of particles below this threshold remains 0, so they still get a chance to be selected later if they ever rise above it. Let us know how that works.
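To spell out the bookkeeping (a plain-Python illustration, not namelist code; the array values are invented): 72057594037927936 is 2**56, a bit apparently chosen to lie above any ID Smilei assigns, so it acts as a permanent "already considered" flag.

import numpy as np

FLAG = 2**56                                      # = 72057594037927936
ids = np.zeros(5, dtype='uint64')                 # five untracked particles (id == 0)
crossed = np.array([1, 1, 0, 0, 1], dtype=bool)   # px > 30 for the first time
lucky = crossed & (np.random.rand(5) < 0.10)      # ~10% of the crossers win the draw

ids += (FLAG * crossed).astype('uint64')          # every crosser is flagged, win or lose
keep = lucky | (ids > FLAG)                       # only this call's winners pass the filter

# Smilei then assigns each winner a nonzero ID in the low bits, so on all later
# calls its id exceeds FLAG and the (ids > FLAG) term keeps it tracked, while
# the losers stay at exactly FLAG and are never selected again.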

beck-llr commented 10 months ago

Another thing worth mentioning is that you are not forced to sort all the tracked particles: you can apply a filter at postprocessing as well. Check out the sorted_as option for track particles in the doc. That might be handy too.
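A minimal sketch of that postprocessing route with happi (the simulation path and the selection string are placeholders; see the TrackParticles postprocessing page in the doc for the exact semantics of select, sort and sorted_as):

import happi

S = happi.Open("/path/to/simulation")   # placeholder path

# Select only energetic electrons at postprocessing time, instead of writing
# every tracked particle; the string follows happi's any()/all() select syntax.
track = S.TrackParticles("electron",
                         select = "any(t>0, px>30)",
                         axes = ["x", "px"])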

Horymir001 commented 10 months ago

Hi, thank you for this great response! I tested the first option and it works as intended. Here are the details:

  Number of particles to be sorted: frac 3.1k, all 31k
  Number of sorted non-NaN particles: frac 3.0k, all 31k
  Disordered file size: frac 6.72 MB, all 52.3 MB
  Ordered file size: frac 23.0 MB, all 234 MB

Here is the updated graph:

(figure: updated energy spectrum)

Thank you very much! You might also want to stress this advice in your documentation.