Warwick-Plasma / epoch

Particle-in-cell code for plasma physics simulations
https://epochpic.github.io
GNU General Public License v3.0
176 stars 56 forks source link

There is a bug in particle ID #361

Open jcyu96 opened 2 years ago

jcyu96 commented 2 years ago

Dear developers, I used subset to select some particles at the beginning, and when I checked the ID of particles at different times, I found that some particles with some certain ID suddenly disappeared at some time and then suddenly appeared again later. I used simple_outflow boundary condition. Could this be a bug?(epoch-4.17.16)

Status-Mirror commented 1 year ago

Hi jcyu96,

Apologies for the delay in getting back to you. Do you still have the input deck which caused this issue?

Cheers, Stuart

Status-Mirror commented 1 year ago

I'm going to mark this as closed, but feel free to add another response later on if you're still seeing potential bugs.

ktangtar commented 1 year ago

Hi,

I have been getting this same problem recently. I'm hoping @jcyu96 figured it out or some help from @Status-Mirror.
For any given particle id, some of the outputs have data for it, and then others do not (see figure below, dashed lines represent parts of the trajectory I am missing data, from the id_freq output files). Checking the size of my output files, sometimes the size is larger than the pervious output file (should only be equal or smaller as electrons are lost to the boundaries).

Thank you, any help will be much appreciated.

In my input deck, I have these lines (epoch-4.17.16):

begin:subset name = E_sub include_species:Electron persist_start_time = 0 random_fraction = 0.0001 end:subset

begin:subset name = E_sub_field include_species:Electron

persist_start_time = 0

random_fraction = 0.01 end:subset

begin:subset name = E_sub_full include_species:Electron persist_start_time = 0 random_fraction = 1.0 end:subset

begin:output name = id file_prefix = id_freq dt_snapshot = T_snap_freq

time_start =

time_stop =

id = E_sub
px = E_sub + single py = E_sub + single pz = E_sub + single work_x_total = E_sub + single work_y_total = E_sub +single work_z_total = E_sub + single particles = E_sub + single particle_weight = E_sub + single end:output

begin:output name = id_full file_prefix = id_fulldump dt_snapshot = T_snap_infreq

time_start =

time_stop =

id = E_sub_full px = E_sub_full + single py = E_sub_full + single pz = E_sub_full + single work_x_total = E_sub_full + single work_y_total = E_sub_full +single work_z_total = E_sub_full + single particles = E_sub_full + single particle_weight = E_sub_full + single end:output

begin:output name = id_field file_prefix = id_field dt_snapshot = T_snap/4

time_start =

time_stop =

id = E_sub_field px = E_sub_field + single py = E_sub_field + single pz = E_sub_field + single work_x_total = E_sub_field + single work_y_total = E_sub_field +single work_z_total = E_sub_field + single particles = E_sub_field + single particle_weight = E_sub_field + single end:output

image

Status-Mirror commented 1 year ago

Hi @ktangtar

We didn't get much information on the original issue, so I'm not sure if you're both talking about the same problem. I may understand your issue though.

Perhaps the random_frac key is taking priority over the persist_start_time key. Could you send me your full input deck so I can fully reproduce your simulation? Or maybe just a smaller version of it which has the same error.

Cheers, Stuart

ktangtar commented 1 year ago

@Status-Mirror

Thank you opening up the issue again!
I tried to make a more reduced input deck, and ran into another bug, maybe related. For the "id_freq" outputs (there are three particle tracking outputs, this is the only one that has both persistance and rand_frac), from the "E_sub" subspecies, the file sizes keeps increasing. Initially, the file is 150 KB, but at 500 fs, the file output is 500 MB. Additionally, the previous problem is still there, particles missing parts of their trajectory. image This was run on epoch-4.17.16 with 500 cores for around 10 minutes: image Can you reproduce both errors (input deck below)?

####-----laser beam parameters
  laser_lambda    = 0.81* micron
  laser_intensity = 3.81e21 # a_0 ~= 42.7
  laser_omega     = 2.0 * pi * c / laser_lambda
  T_wave          = laser_lambda / c
  w0              = 4.24 * micron # fwhmi ~ 5 micron
  xf              = 30.0*micron
  rayl            = pi*w0^2/laser_lambda
  sG              = xf/rayl
  wb              = w0*sqrt(1+sG^2)  
####-----plasma parameters
  n_crit     = critical(laser_omega)
  den_e      = 1.0 * n_crit
  t_xmin     = 0.0 * micron
  t_xmax     = 60.0 * micron
  t_ymin     = -10.0 * micron
  t_ymax     = 10.0 * micron
  t_zmin     = -10.0 * micron
  t_zmax     = 10.0 * micron
##-----density rotation
  theta_rot = -20.0*pi/180.0
  xr = (x*cos(theta_rot) - y*sin(theta_rot))
  yr = (x*sin(theta_rot) + y*cos(theta_rot))
##------density preplasma ramp
  x_ramp    = -20.0*micron # ramp length (5% to 100% nominal density), should be negative
####-----output parameters
  T_ave           = 2.0*T_wave
  T_snap          = 100.0*femto
  T_snap_freq     = 1/2.73*T_wave
  T_snap_infreq   = 100.0*femto
  T_snap_dist     = 40.0*femto
end:constant

begin:control
  nx = 100*5
  ny = 24*5
  nz = 24*5

  x_min = -30.0*micron
  x_max =  70.0*micron
  y_min = -12.0*micron
  y_max =  12.0*micron
  z_min = -12.0*micron
  z_max =  12.0*micron

  t_end = 500.1*femto

  dlb_threshold = 0.4
# restart_snapshot = restart_dumps0004.sdf
  use_random_seed = T
end:control

begin:output
  name=restart_dumps
  file_prefix=restart_dumps
  dt_snapshot=100.0e-15
  restartable=T
  dump_first=T
end:output

begin:boundaries
  bc_x_min = simple_laser
  bc_x_max = open
  bc_y_min = open 
  bc_y_max = open
  bc_z_min = open
  bc_z_max = open
end:boundaries

begin:laser
  boundary = x_min 
  lambda = laser_lambda
  intensity_w_cm2 = laser_intensity / (1+sG^2)

  t_profile = gauss(time,50.0e-15,27.0e-15/2/sqrt(loge(2))) #intensity FWHM 27.0fs, intensity w_0 as input
  profile = exp(-(sqrt(y*y+z*z)/wb)^2)

  phase = sG*(sqrt(y*y+z*z)/w0)^2/(1+sG^2)
end:laser

begin:qed
  use_qed=T
  produce_photons=T
  photon_energy_min=10.0*kev
  produce_pairs=F
  photon_dynamics=F
#  qed_table_location=your table location
end:qed

begin:species
  name = Electron
  charge = -1.0
  mass = 1.0
  npart_per_cell = 5
  dump = T
  temp = 0
  number_density = if((xr gt (t_xmin + x_ramp)) and (xr lt t_xmax), den_e, 0.0)
  number_density = if((xr gt (t_xmin + x_ramp)) and (xr lt t_xmin), (1-0.95/x_ramp*xr) * number_density(Electron), number_density(Electron))
  number_density = if((y gt t_ymin) and (y lt t_ymax), number_density(Electron), 0.0)
  number_density = if((z gt t_zmin) and (z lt t_zmax), number_density(Electron), 0.0)
  identify:electron
end:species

begin:species
  name = Carbon
  charge = 6.0
  mass = 1836.0*12.0
  npart_per_cell = 5
  dump = T
  temp = 0
  immobile = F
  number_density = number_density(Electron) / 6.0
end:species

begin:species
  name=photon
  charge=0.0
  mass=0.0
  npart=0
  dump=T
  identify:photon
end:species

begin:subset
  name = E_sub
  include_species:Electron
  persist_start_time = 0
  random_fraction = 0.0001
end:subset

begin:subset
  name = E_sub_field
  include_species:Electron
# persist_start_time = 0
  random_fraction = 0.01
end:subset

begin:subset
  name = E_sub_full
  include_species:Electron
  persist_start_time = 0
  random_fraction = 1.0
end:subset

begin:output
  name = id
  file_prefix = id_freq
  dt_snapshot = T_snap_freq
 #time_start = 
 #time_stop  = 
  id = E_sub  
  px = E_sub + single
  py = E_sub + single
  pz = E_sub + single
  work_x_total = E_sub + single
  work_y_total = E_sub +single
  work_z_total = E_sub + single
  particles = E_sub + single
  particle_weight = E_sub + single
end:output

begin:output
  name = id_full
  file_prefix = id_fulldump
  dt_snapshot = T_snap_infreq
 #time_start = 
 #time_stop  = 
  id = E_sub_full 
  px = E_sub_full + single
  py = E_sub_full + single
  pz = E_sub_full + single
  work_x_total = E_sub_full + single
  work_y_total = E_sub_full +single
  work_z_total = E_sub_full + single
  particles = E_sub_full + single
  particle_weight = E_sub_full + single
end:output

begin:output
  name = id_field
  file_prefix = id_field
  dt_snapshot = T_snap/4
 #time_start = 
 #time_stop  = 
  id = E_sub_field 
  px = E_sub_field + single
  py = E_sub_field + single
  pz = E_sub_field + single
  work_x_total = E_sub_field + single
  work_y_total = E_sub_field +single
  work_z_total = E_sub_field + single
  particles = E_sub_field + single
  particle_weight = E_sub_field + single
end:output

begin:subset
  name = ph_sub
  include_species:photon
# persist_start_time = 0
  random_fraction = 0.001
end:subset

begin:output
  name = id_photon
  file_prefix = id_freq_photon
  dt_snapshot = T_snap_freq
 #time_start = 
 #time_stop  = 
  particle_grid=ph_sub+single
  gamma=ph_sub+single
  px=ph_sub+single
  py=ph_sub+single
  pz=ph_sub+single
  particle_weight=ph_sub+single
end:output

begin:subset
name=Only_Ph
  include_species:photon
#  random_fraction=1.0
end:subset

begin:output
  name=PhotonDetail
  file_prefix=PhotonDetail
  dt_snapshot=T_snap
  particle_grid=Only_Ph+single
  gamma=Only_Ph+single
  px=Only_Ph+single
  py=Only_Ph+single
  pz=Only_Ph+single
  particle_weight=Only_Ph+single
end:output

Regards, Kavin Tangtartharakul

Status-Mirror commented 1 year ago

Hi Kavin,

I don't currently have access to 500 cores for debugging, so I dropped the $z$ dimension and I ran your input deck in epoch2d. In this simulation, the size of the id_freq input only changed twice - both times when a particle left the simulation window. It also seems that the list of ID values was the same in all output dumps, so I'm not reproducing your bugs in 2D.

I've also found that, while the particle ID values remain the same, the ordering of ID can change. This may be confusing your plotter, depending on how it works.

I have recently merged another bug fix for issue 431, which fixed an error with subsets and particle ID, so maybe this has also fixed your problem? Or maybe this issue is exclusive to epoch3d.

If you can reproduce your issue in 2D using the deck I attach below, then maybe you need to switch to the most recent EPOCH with the subset fix.

Hope this helps, Stuart

begin:constant
####-----laser beam parameters
  laser_lambda    = 0.81* micron
  laser_intensity = 3.81e21 # a_0 ~= 42.7
  laser_omega     = 2.0 * pi * c / laser_lambda
  T_wave          = laser_lambda / c
  w0              = 4.24 * micron # fwhmi ~ 5 micron
  xf              = 30.0*micron
  rayl            = pi*w0^2/laser_lambda
  sG              = xf/rayl
  wb              = w0*sqrt(1+sG^2)  
####-----plasma parameters
  n_crit     = critical(laser_omega)
  den_e      = 1.0 * n_crit
  t_xmin     = 0.0 * micron
  t_xmax     = 60.0 * micron
  t_ymin     = -10.0 * micron
  t_ymax     = 10.0 * micron
  t_zmin     = -10.0 * micron
  t_zmax     = 10.0 * micron
##-----density rotation
  theta_rot = -20.0*pi/180.0
  xr = (x*cos(theta_rot) - y*sin(theta_rot))
  yr = (x*sin(theta_rot) + y*cos(theta_rot))
##------density preplasma ramp
  x_ramp    = -20.0*micron # ramp length (5% to 100% nominal density), should be negative
####-----output parameters
  T_ave           = 2.0*T_wave
  T_snap          = 100.0*femto
  T_snap_freq     = 1/2.73*T_wave
  T_snap_infreq   = 100.0*femto
  T_snap_dist     = 40.0*femto
end:constant

begin:control
  nx = 100*5
  ny = 24*5

  x_min = -30.0*micron
  x_max =  70.0*micron
  y_min = -12.0*micron
  y_max =  12.0*micron

  t_end = 500.1*femto

  dlb_threshold = 0.4
  use_random_seed = T
  stdout_frequency = 10
end:control

begin:output
  name=restart_dumps
  file_prefix=restart_dumps
  dt_snapshot=100.0e-15
  restartable=T
  dump_first=T
end:output

begin:boundaries
  bc_x_min = simple_laser
  bc_x_max = open
  bc_y_min = open 
  bc_y_max = open
end:boundaries

begin:laser
  boundary = x_min 
  lambda = laser_lambda
  intensity_w_cm2 = laser_intensity / (1+sG^2)

  t_profile = gauss(time,50.0e-15,27.0e-15/2/sqrt(loge(2))) #intensity FWHM 27.0fs, intensity w_0 as input
  profile = exp(-(sqrt(y*y)/wb)^2)

  phase = sG*(sqrt(y*y)/w0)^2/(1+sG^2)
end:laser

begin:qed
  use_qed=T
  produce_photons=T
  photon_energy_min=10.0*kev
  produce_pairs=F
  photon_dynamics=F
#  qed_table_location=your table location
end:qed

begin:species
  name = Electron
  charge = -1.0
  mass = 1.0
  npart_per_cell = 5
  dump = T
  temp = 0
  number_density = if((xr gt (t_xmin + x_ramp)) and (xr lt t_xmax), den_e, 0.0)
  number_density = if((xr gt (t_xmin + x_ramp)) and (xr lt t_xmin), (1-0.95/x_ramp*xr) * number_density(Electron), number_density(Electron))
  number_density = if((y gt t_ymin) and (y lt t_ymax), number_density(Electron), 0.0)
  identify:electron
end:species

begin:species
  name = Carbon
  charge = 6.0
  mass = 1836.0*12.0
  npart_per_cell = 5
  dump = T
  temp = 0
  immobile = F
  number_density = number_density(Electron) / 6.0
end:species

begin:species
  name=photon
  charge=0.0
  mass=0.0
  npart=0
  dump=T
  identify:photon
end:species

begin:subset
  name = E_sub
  include_species:Electron
  persist_start_time = 0
  random_fraction = 0.0001
end:subset

begin:subset
  name = E_sub_field
  include_species:Electron
# persist_start_time = 0
  random_fraction = 0.01
end:subset

begin:subset
  name = E_sub_full
  include_species:Electron
  persist_start_time = 0
  random_fraction = 1.0
end:subset

begin:output
  name = id
  file_prefix = id_freq
  dt_snapshot = T_snap_freq
 #time_start = 
 #time_stop  = 
  id = E_sub  
  px = E_sub + single
  py = E_sub + single
  pz = E_sub + single
  work_x_total = E_sub + single
  work_y_total = E_sub +single
  work_z_total = E_sub + single
  particles = E_sub + single
  particle_weight = E_sub + single
end:output

begin:output
  name = id_full
  file_prefix = id_fulldump
  dt_snapshot = T_snap_infreq
 #time_start = 
 #time_stop  = 
  id = E_sub_full 
  px = E_sub_full + single
  py = E_sub_full + single
  pz = E_sub_full + single
  work_x_total = E_sub_full + single
  work_y_total = E_sub_full +single
  work_z_total = E_sub_full + single
  particles = E_sub_full + single
  particle_weight = E_sub_full + single
end:output

begin:output
  name = id_field
  file_prefix = id_field
  dt_snapshot = T_snap/4
 #time_start = 
 #time_stop  = 
  id = E_sub_field 
  px = E_sub_field + single
  py = E_sub_field + single
  pz = E_sub_field + single
  work_x_total = E_sub_field + single
  work_y_total = E_sub_field +single
  work_z_total = E_sub_field + single
  particles = E_sub_field + single
  particle_weight = E_sub_field + single
end:output

begin:subset
  name = ph_sub
  include_species:photon
# persist_start_time = 0
  random_fraction = 0.001
end:subset

begin:output
  name = id_photon
  file_prefix = id_freq_photon
  dt_snapshot = T_snap_freq
 #time_start = 
 #time_stop  = 
  particle_grid=ph_sub+single
  gamma=ph_sub+single
  px=ph_sub+single
  py=ph_sub+single
  pz=ph_sub+single
  particle_weight=ph_sub+single
end:output

begin:subset
name=Only_Ph
  include_species:photon
#  random_fraction=1.0
end:subset

begin:output
  name=PhotonDetail
  file_prefix=PhotonDetail
  dt_snapshot=T_snap
  particle_grid=Only_Ph+single
  gamma=Only_Ph+single
  px=Only_Ph+single
  py=Only_Ph+single
  pz=Only_Ph+single
  particle_weight=Only_Ph+single
end:output
ktangtar commented 1 year ago

@Status-Mirror

Hi,

Thank you for looking into this. I have rerun it in 2d, and reproduced all the same bugs I had before. I don't think it has to do with the unordered ids, my code handles that and has worked well in the past for other input decks. Also, my file sizes are strange as described earlier. iirc, the other input decks didn't have so many subsets, so maybe that is the direction to look.

Is it epoch-4.19.0 has the subset fix? I am interested in seeing if that works.

Regards, Kavin Tangtartharakul

TomGoffrey commented 1 year ago

I'd always recommend using the most recent version of the code.

If using the latest version of EPOCH doesn't fix your issue, given we're currently unable to reproduce your error, could you provide some minimal code which demonstrates the problem you're having?

Status-Mirror commented 1 year ago

Yeah I agree with Tom, better to use the most recent one.

I know that we have a link to the epoch-4.19.0 version release, but the subset fix is in the code which is currently stored on the main branch at this URL: https://github.com/Warwick-Plasma/epoch

Note that we're currently experiencing an issue with the version-number displayed on start-up - the most recent EPOCH may not say it's running at EPOCH 4.19.

ktangtar commented 1 year ago

Hmm, using the most recent main epoch with the subset fix did not fix the problems for me. I was using your input deck in 2d. I did check, with only "rand_frac" and no "persist", the output was normal.
I tried on the lonestar6 and stampede2 supercomputers on tacc; both had the bug.
I also tried only having the id_freq output, the one that has both random frac and persist for the electrons. The other outputs were commented out. The output now is normal, no bug.

TomGoffrey commented 1 year ago

Could you please post a short piece of code that you're using to check the validity of the output? Without this we are unlikely to be able to make much progress on the issue.

ktangtar commented 1 year ago

Could you please post a short piece of code that you're using to check the validity of the output? Without this we are unlikely to be able to make much progress on the issue.

Sadly, my post recent post is using slightly modified versions of the input deck that Stuart posted above (https://github.com/Warwick-Plasma/epoch/issues/361#issuecomment-1440129813). He was unable to recreate my problem.

Status-Mirror commented 1 year ago

I am now able to reproduce your bug. I think my original test deck removed too much physics, and I've found differences between the deck I tested, and the deck I attached. I have produced a minimal input deck which can reproduce the problem, which I include at the end of this report. Here, we only use the -DPARTICLE_ID pre-compiler flag.

In small0000.sdf, there are only 22 particles present. As a persistant subset, we would expect this subset to only use these 22 particles, and this is seen up to small0009.sdf. In small0010.sdf, there are 238 entries in ID. Of these ID entries, 48 are 0, one is -1, and six have numerically high ID values (like 544240896, in a simulation with 210000 electrons).

The problem does not occur when E_sub_full is removed from the simulation. The problem also does not occur when we remove the persist_start_time flag from E_sub_full.

No issues occur in the full SDF dumps. Here, full0000.sdf contains all 210000 electrons, while the final full0002.sdf contains 204356 ID values, all vailid particles.

Current idea: if multiple persistent subsets describe the same species, and a particle from one subset leaves the simulation, then any persistent subset which doesn't use this particle goes numerically unstable.

begin:control
  nx = 100*5
  ny = 24*5
  x_min = -30.0*micron
  x_max =  70.0*micron
  y_min = -12.0*micron
  y_max =  12.0*micron
  t_end = 120.1*femto
  dlb_threshold = 0.4
  use_random_seed = T
  stdout_frequency = 10
end:control

begin:boundaries
  bc_x_min = simple_laser
  bc_x_max = open
  bc_y_min = open 
  bc_y_max = open
end:boundaries

begin:laser
  boundary = x_min 
  lambda = 0.81e-6
  intensity_w_cm2 = 3.8e21
  t_profile = 1
  profile = 1
  phase = 0 
end:laser

begin:species
  name = Electron
  npart_per_cell = 5
  dump = T
  temp = 0
  number_density = if(x gt 0 and x lt 10, 1.0e25, 0.0)
  identify:electron
end:species

begin:species
  name = Carbon
  charge = 6.0
  mass = 1836.0*12.0
  npart_per_cell = 5
  dump = T
  temp = 0
  number_density = number_density(Electron) / 6.0
end:species

begin:subset
  name = E_sub
  include_species:Electron
  persist_start_time = 0
  random_fraction = 0.0001
end:subset

begin:subset
  name = E_sub_full
  include_species:Electron
  persist_start_time = 0
  random_fraction = 1.0
end:subset

begin:output
  name = small
  file_prefix = small
  dt_snapshot = 10.0e-15
  id = E_sub  
end:output

begin:output
  name = full
  file_prefix = full
  dt_snapshot = 100.0e-15
  id = E_sub_full 
end:output
ktangtar commented 1 year ago

@Status-Mirror Wow, nice progress!
I just tested your theory on my input deck, and it seems to be correct. Originally, my subsets were image I commented out the "persist" line in the "full" subset (which I wouldn't have thought mattered because rand_frac=1), image

The result is that my outputs are normal now.

Status-Mirror commented 1 year ago

I have investigated the issue further, and I believe I have a fix. This was a complicated issue to spot, as the EPOCH subset feature has limited documentation. I will explain what the code is trying to do, how it's making the mistake, how it relates to our deck, and a fix which solves the issue with your deck. I have set up a merge request for this fix, and it may be in the main branch soon.

Detailed error behaviour

I made a reduced version of my last input deck, with dlb_threshold and use_random_seed removed.

I found that the bug was not present when running on a single processor, and I was able to reproduce the bug when running on 2 processors. The first error appeared on step 211, when particle 105002 moved from a rank 1 cell to a rank 0 cell. By inspecting the count parameter of the particle_id_hash data-type, we were able to see that for the E_sub subset, rank 1 still had 10 subset particles (no change), but count in rank 0 increased by one particle during the transfer. As particle 105002 was not a member of E_sub, the rank 1 behaviour was correct, but the rank 0 behaviour was wrong - a particle was incorrectly added to the subset.

Subset maps

During particle processor transfer, pack_particle calls id_registry%map(a_particle), which returns a key detailing which subsets the particle belongs to. This is needed, as each rank only knows which of its local particles are in subsets - the particle must be added to local subsets when moving between ranks. The key uses bitwise functions IAND and ISHIFT, which I will explain here:

I believe the bug is present when you run the procedure in reverse. The subroutine pidr_add_with_map reads the integer map, and logs the particles in the relevant subsets. Once again, we loop over all subsets with index i_hash. To test whether a particle is in the species, EPOCH used IAND(hashmap, 1), which only returns 1 if the right-most binary digit of hashmap is also 1. The issue is, the right-most digit refers to the last subset with the highest i_hash, but in this subroutine it has been associated with i_hash=1. EPOCH appears to be loading the subsets in reverse.

Debugging

In our example, particle 105002 was not present in E_sub (i_hash = 1), but it was present in E_sub_full (i_hash = 2), which made hashmap 01 in binary, or 1 as an integer. In pidr_add_with_map for i_hash=1, we have IAND(1,1)=1 and so a particle was added to E_sub in rank 0. When we perform the reverse bitwise shift, 01 goes to 0, and IAND(0,1)=0, so the particle was not added to E_sub_full in rank 0 (also wrong, as this subset is supposed to contain all particles).

In earlier messages, I assumed the E_sub_full depletion meant particles were escaping the simulation, but this was not true. This also explains why the error vanishes when we remove E_sub_full - reversing the loading order does not matter if there is only 1 subset present.

Luckily, this requires only a simple fix. We can use the same test as before, but if successful, we must add a particle to subset sz+1-i_hash, instead of i_hash, where sz is the number of subsets present. When this fix is applied, the file-size of the small SDF files remains fixed, as no E_sub particles leave the simulation window.

ktangtar commented 1 year ago

Thank you for figuring all this out and fixing the bug, as well as the detailed explanation!