jcyu96 opened 2 years ago
Hi jcyu96,
Apologies for the delay in getting back to you. Do you still have the input deck which caused this issue?
Cheers, Stuart
I'm going to mark this as closed, but feel free to add another response later on if you're still seeing potential bugs.
Hi,
I have been getting this same problem recently. I'm hoping @jcyu96 figured it out, or that @Status-Mirror can help.
For any given particle id, some of the outputs have data for it, and then others do not (see figure below; dashed lines represent parts of the trajectory where I am missing data, from the id_freq output files). Checking the size of my output files, sometimes the size is larger than the previous output file (it should only be equal or smaller, as electrons are lost to the boundaries).
Thank you, any help will be much appreciated.
In my input deck, I have these lines (epoch-4.17.16):
begin:subset
name = E_sub
include_species:Electron
persist_start_time = 0
random_fraction = 0.0001
end:subset
begin:subset
name = E_sub_field
include_species:Electron
random_fraction = 0.01
end:subset
begin:subset
name = E_sub_full
include_species:Electron
persist_start_time = 0
random_fraction = 1.0
end:subset
begin:output
name = id
file_prefix = id_freq
dt_snapshot = T_snap_freq
id = E_sub
px = E_sub + single
py = E_sub + single
pz = E_sub + single
work_x_total = E_sub + single
work_y_total = E_sub + single
work_z_total = E_sub + single
particles = E_sub + single
particle_weight = E_sub + single
end:output
begin:output
name = id_full
file_prefix = id_fulldump
dt_snapshot = T_snap_infreq
id = E_sub_full
px = E_sub_full + single
py = E_sub_full + single
pz = E_sub_full + single
work_x_total = E_sub_full + single
work_y_total = E_sub_full + single
work_z_total = E_sub_full + single
particles = E_sub_full + single
particle_weight = E_sub_full + single
end:output
begin:output
name = id_field
file_prefix = id_field
dt_snapshot = T_snap/4
id = E_sub_field
px = E_sub_field + single
py = E_sub_field + single
pz = E_sub_field + single
work_x_total = E_sub_field + single
work_y_total = E_sub_field + single
work_z_total = E_sub_field + single
particles = E_sub_field + single
particle_weight = E_sub_field + single
end:output
Hi @ktangtar
We didn't get much information on the original issue, so I'm not sure if you're both talking about the same problem. I may understand your issue though.
Perhaps the random_frac key is taking priority over the persist_start_time key. Could you send me your full input deck so I can fully reproduce your simulation? Or maybe just a smaller version of it which has the same error.
Cheers, Stuart
@Status-Mirror
Thank you for opening up the issue again!
I tried to make a more reduced input deck, and ran into another bug, maybe related. For the "id_freq" outputs (there are three particle tracking outputs; this is the only one that has both persistence and rand_frac), from the "E_sub" subspecies, the file size keeps increasing. Initially, the file is 150 KB, but at 500 fs, the output file is 500 MB. Additionally, the previous problem is still there: particles missing parts of their trajectory.
This was run on epoch-4.17.16 with 500 cores for around 10 minutes:
Can you reproduce both errors (input deck below)?
begin:constant
####-----laser beam parameters
laser_lambda = 0.81* micron
laser_intensity = 3.81e21 # a_0 ~= 42.7
laser_omega = 2.0 * pi * c / laser_lambda
T_wave = laser_lambda / c
w0 = 4.24 * micron # fwhmi ~ 5 micron
xf = 30.0*micron
rayl = pi*w0^2/laser_lambda
sG = xf/rayl
wb = w0*sqrt(1+sG^2)
####-----plasma parameters
n_crit = critical(laser_omega)
den_e = 1.0 * n_crit
t_xmin = 0.0 * micron
t_xmax = 60.0 * micron
t_ymin = -10.0 * micron
t_ymax = 10.0 * micron
t_zmin = -10.0 * micron
t_zmax = 10.0 * micron
##-----density rotation
theta_rot = -20.0*pi/180.0
xr = (x*cos(theta_rot) - y*sin(theta_rot))
yr = (x*sin(theta_rot) + y*cos(theta_rot))
##------density preplasma ramp
x_ramp = -20.0*micron # ramp length (5% to 100% nominal density), should be negative
####-----output parameters
T_ave = 2.0*T_wave
T_snap = 100.0*femto
T_snap_freq = 1/2.73*T_wave
T_snap_infreq = 100.0*femto
T_snap_dist = 40.0*femto
end:constant
begin:control
nx = 100*5
ny = 24*5
nz = 24*5
x_min = -30.0*micron
x_max = 70.0*micron
y_min = -12.0*micron
y_max = 12.0*micron
z_min = -12.0*micron
z_max = 12.0*micron
t_end = 500.1*femto
dlb_threshold = 0.4
# restart_snapshot = restart_dumps0004.sdf
use_random_seed = T
end:control
begin:output
name=restart_dumps
file_prefix=restart_dumps
dt_snapshot=100.0e-15
restartable=T
dump_first=T
end:output
begin:boundaries
bc_x_min = simple_laser
bc_x_max = open
bc_y_min = open
bc_y_max = open
bc_z_min = open
bc_z_max = open
end:boundaries
begin:laser
boundary = x_min
lambda = laser_lambda
intensity_w_cm2 = laser_intensity / (1+sG^2)
t_profile = gauss(time,50.0e-15,27.0e-15/2/sqrt(loge(2))) #intensity FWHM 27.0fs, intensity w_0 as input
profile = exp(-(sqrt(y*y+z*z)/wb)^2)
phase = sG*(sqrt(y*y+z*z)/w0)^2/(1+sG^2)
end:laser
begin:qed
use_qed=T
produce_photons=T
photon_energy_min=10.0*kev
produce_pairs=F
photon_dynamics=F
# qed_table_location=your table location
end:qed
begin:species
name = Electron
charge = -1.0
mass = 1.0
npart_per_cell = 5
dump = T
temp = 0
number_density = if((xr gt (t_xmin + x_ramp)) and (xr lt t_xmax), den_e, 0.0)
number_density = if((xr gt (t_xmin + x_ramp)) and (xr lt t_xmin), (1-0.95/x_ramp*xr) * number_density(Electron), number_density(Electron))
number_density = if((y gt t_ymin) and (y lt t_ymax), number_density(Electron), 0.0)
number_density = if((z gt t_zmin) and (z lt t_zmax), number_density(Electron), 0.0)
identify:electron
end:species
begin:species
name = Carbon
charge = 6.0
mass = 1836.0*12.0
npart_per_cell = 5
dump = T
temp = 0
immobile = F
number_density = number_density(Electron) / 6.0
end:species
begin:species
name=photon
charge=0.0
mass=0.0
npart=0
dump=T
identify:photon
end:species
begin:subset
name = E_sub
include_species:Electron
persist_start_time = 0
random_fraction = 0.0001
end:subset
begin:subset
name = E_sub_field
include_species:Electron
# persist_start_time = 0
random_fraction = 0.01
end:subset
begin:subset
name = E_sub_full
include_species:Electron
persist_start_time = 0
random_fraction = 1.0
end:subset
begin:output
name = id
file_prefix = id_freq
dt_snapshot = T_snap_freq
#time_start =
#time_stop =
id = E_sub
px = E_sub + single
py = E_sub + single
pz = E_sub + single
work_x_total = E_sub + single
work_y_total = E_sub + single
work_z_total = E_sub + single
particles = E_sub + single
particle_weight = E_sub + single
end:output
begin:output
name = id_full
file_prefix = id_fulldump
dt_snapshot = T_snap_infreq
#time_start =
#time_stop =
id = E_sub_full
px = E_sub_full + single
py = E_sub_full + single
pz = E_sub_full + single
work_x_total = E_sub_full + single
work_y_total = E_sub_full + single
work_z_total = E_sub_full + single
particles = E_sub_full + single
particle_weight = E_sub_full + single
end:output
begin:output
name = id_field
file_prefix = id_field
dt_snapshot = T_snap/4
#time_start =
#time_stop =
id = E_sub_field
px = E_sub_field + single
py = E_sub_field + single
pz = E_sub_field + single
work_x_total = E_sub_field + single
work_y_total = E_sub_field + single
work_z_total = E_sub_field + single
particles = E_sub_field + single
particle_weight = E_sub_field + single
end:output
begin:subset
name = ph_sub
include_species:photon
# persist_start_time = 0
random_fraction = 0.001
end:subset
begin:output
name = id_photon
file_prefix = id_freq_photon
dt_snapshot = T_snap_freq
#time_start =
#time_stop =
particle_grid=ph_sub+single
gamma=ph_sub+single
px=ph_sub+single
py=ph_sub+single
pz=ph_sub+single
particle_weight=ph_sub+single
end:output
begin:subset
name=Only_Ph
include_species:photon
# random_fraction=1.0
end:subset
begin:output
name=PhotonDetail
file_prefix=PhotonDetail
dt_snapshot=T_snap
particle_grid=Only_Ph+single
gamma=Only_Ph+single
px=Only_Ph+single
py=Only_Ph+single
pz=Only_Ph+single
particle_weight=Only_Ph+single
end:output
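As a side note on the constant block above, the commented estimates there (a_0 ~= 42.7, fwhmi ~ 5 micron) can be reproduced with a short Python check; this is an illustrative sketch, and the a_0 formula assumes linear polarization:

```python
# Hedged sanity check (Python, illustrative only) of the comments in the
# constant block: a_0 ~= 42.7 and intensity FWHM ~ 5 micron.
import math

micron = 1.0e-6
laser_lambda = 0.81 * micron
laser_intensity = 3.81e21          # W/cm^2
w0 = 4.24 * micron
xf = 30.0 * micron

# a0 ~ 0.855 * lambda[um] * sqrt(I / 1e18 W/cm^2), linear polarization
a0 = 0.855 * (laser_lambda / micron) * math.sqrt(laser_intensity / 1.0e18)

# Intensity FWHM of a Gaussian focal spot: FWHM = w0 * sqrt(2 ln 2)
fwhm = w0 * math.sqrt(2.0 * math.log(2.0))

rayl = math.pi * w0**2 / laser_lambda   # Rayleigh range
sG = xf / rayl
wb = w0 * math.sqrt(1.0 + sG**2)        # beam radius at the x_min boundary

print(f"a0 ~ {a0:.1f}, FWHM ~ {fwhm/micron:.2f} um, wb ~ {wb/micron:.2f} um")
```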
Regards, Kavin Tangtartharakul
Hi Kavin,
I don't currently have access to 500 cores for debugging, so I dropped the $z$ dimension and ran your input deck in epoch2d. In this simulation, the size of the id_freq output only changed twice, both times when a particle left the simulation window. It also seems that the list of ID values was the same in all output dumps, so I'm not reproducing your bugs in 2D.
I've also found that, while the particle ID values remain the same, the ordering of the IDs can change. This may be confusing your plotter, depending on how it works.
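For reference, unordered IDs can be matched across dumps with something like the following Python sketch; the arrays and names are hypothetical (reading the SDF files themselves is not shown), the point is to align by ID rather than by array index:

```python
# Illustrative sketch: two dumps hold the same particle IDs in different
# orders, so per-particle data must be matched by ID, not array position.
import numpy as np

# Hypothetical data as if read from two dumps:
ids_a = np.array([11, 42, 7]);  px_a = np.array([1.0, 2.0, 3.0])
ids_b = np.array([7, 11, 42]);  px_b = np.array([3.5, 1.5, 2.5])

# Build an ID -> index lookup for dump B, then reorder B to A's ID order.
order = {pid: i for i, pid in enumerate(ids_b)}
idx = np.array([order[pid] for pid in ids_a])
assert np.array_equal(ids_b[idx], ids_a)
px_b_aligned = px_b[idx]        # now index-aligned with px_a
```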
I have recently merged another bug fix for issue 431, which fixed an error with subsets and particle ID, so maybe this has also fixed your problem? Or maybe this issue is exclusive to epoch3d.
If you can reproduce your issue in 2D using the deck I attach below, then maybe you need to switch to the most recent EPOCH with the subset fix.
Hope this helps, Stuart
begin:constant
####-----laser beam parameters
laser_lambda = 0.81* micron
laser_intensity = 3.81e21 # a_0 ~= 42.7
laser_omega = 2.0 * pi * c / laser_lambda
T_wave = laser_lambda / c
w0 = 4.24 * micron # fwhmi ~ 5 micron
xf = 30.0*micron
rayl = pi*w0^2/laser_lambda
sG = xf/rayl
wb = w0*sqrt(1+sG^2)
####-----plasma parameters
n_crit = critical(laser_omega)
den_e = 1.0 * n_crit
t_xmin = 0.0 * micron
t_xmax = 60.0 * micron
t_ymin = -10.0 * micron
t_ymax = 10.0 * micron
t_zmin = -10.0 * micron
t_zmax = 10.0 * micron
##-----density rotation
theta_rot = -20.0*pi/180.0
xr = (x*cos(theta_rot) - y*sin(theta_rot))
yr = (x*sin(theta_rot) + y*cos(theta_rot))
##------density preplasma ramp
x_ramp = -20.0*micron # ramp length (5% to 100% nominal density), should be negative
####-----output parameters
T_ave = 2.0*T_wave
T_snap = 100.0*femto
T_snap_freq = 1/2.73*T_wave
T_snap_infreq = 100.0*femto
T_snap_dist = 40.0*femto
end:constant
begin:control
nx = 100*5
ny = 24*5
x_min = -30.0*micron
x_max = 70.0*micron
y_min = -12.0*micron
y_max = 12.0*micron
t_end = 500.1*femto
dlb_threshold = 0.4
use_random_seed = T
stdout_frequency = 10
end:control
begin:output
name=restart_dumps
file_prefix=restart_dumps
dt_snapshot=100.0e-15
restartable=T
dump_first=T
end:output
begin:boundaries
bc_x_min = simple_laser
bc_x_max = open
bc_y_min = open
bc_y_max = open
end:boundaries
begin:laser
boundary = x_min
lambda = laser_lambda
intensity_w_cm2 = laser_intensity / (1+sG^2)
t_profile = gauss(time,50.0e-15,27.0e-15/2/sqrt(loge(2))) #intensity FWHM 27.0fs, intensity w_0 as input
profile = exp(-(sqrt(y*y)/wb)^2)
phase = sG*(sqrt(y*y)/w0)^2/(1+sG^2)
end:laser
begin:qed
use_qed=T
produce_photons=T
photon_energy_min=10.0*kev
produce_pairs=F
photon_dynamics=F
# qed_table_location=your table location
end:qed
begin:species
name = Electron
charge = -1.0
mass = 1.0
npart_per_cell = 5
dump = T
temp = 0
number_density = if((xr gt (t_xmin + x_ramp)) and (xr lt t_xmax), den_e, 0.0)
number_density = if((xr gt (t_xmin + x_ramp)) and (xr lt t_xmin), (1-0.95/x_ramp*xr) * number_density(Electron), number_density(Electron))
number_density = if((y gt t_ymin) and (y lt t_ymax), number_density(Electron), 0.0)
identify:electron
end:species
begin:species
name = Carbon
charge = 6.0
mass = 1836.0*12.0
npart_per_cell = 5
dump = T
temp = 0
immobile = F
number_density = number_density(Electron) / 6.0
end:species
begin:species
name=photon
charge=0.0
mass=0.0
npart=0
dump=T
identify:photon
end:species
begin:subset
name = E_sub
include_species:Electron
persist_start_time = 0
random_fraction = 0.0001
end:subset
begin:subset
name = E_sub_field
include_species:Electron
# persist_start_time = 0
random_fraction = 0.01
end:subset
begin:subset
name = E_sub_full
include_species:Electron
persist_start_time = 0
random_fraction = 1.0
end:subset
begin:output
name = id
file_prefix = id_freq
dt_snapshot = T_snap_freq
#time_start =
#time_stop =
id = E_sub
px = E_sub + single
py = E_sub + single
pz = E_sub + single
work_x_total = E_sub + single
work_y_total = E_sub + single
work_z_total = E_sub + single
particles = E_sub + single
particle_weight = E_sub + single
end:output
begin:output
name = id_full
file_prefix = id_fulldump
dt_snapshot = T_snap_infreq
#time_start =
#time_stop =
id = E_sub_full
px = E_sub_full + single
py = E_sub_full + single
pz = E_sub_full + single
work_x_total = E_sub_full + single
work_y_total = E_sub_full + single
work_z_total = E_sub_full + single
particles = E_sub_full + single
particle_weight = E_sub_full + single
end:output
begin:output
name = id_field
file_prefix = id_field
dt_snapshot = T_snap/4
#time_start =
#time_stop =
id = E_sub_field
px = E_sub_field + single
py = E_sub_field + single
pz = E_sub_field + single
work_x_total = E_sub_field + single
work_y_total = E_sub_field +single
work_z_total = E_sub_field + single
particles = E_sub_field + single
particle_weight = E_sub_field + single
end:output
begin:subset
name = ph_sub
include_species:photon
# persist_start_time = 0
random_fraction = 0.001
end:subset
begin:output
name = id_photon
file_prefix = id_freq_photon
dt_snapshot = T_snap_freq
#time_start =
#time_stop =
particle_grid=ph_sub+single
gamma=ph_sub+single
px=ph_sub+single
py=ph_sub+single
pz=ph_sub+single
particle_weight=ph_sub+single
end:output
begin:subset
name=Only_Ph
include_species:photon
# random_fraction=1.0
end:subset
begin:output
name=PhotonDetail
file_prefix=PhotonDetail
dt_snapshot=T_snap
particle_grid=Only_Ph+single
gamma=Only_Ph+single
px=Only_Ph+single
py=Only_Ph+single
pz=Only_Ph+single
particle_weight=Only_Ph+single
end:output
@Status-Mirror
Hi,
Thank you for looking into this. I have rerun it in 2D, and reproduced all the same bugs I had before. I don't think it has to do with the unordered IDs; my code handles that and has worked well in the past for other input decks. Also, my file sizes are strange, as described earlier. If I recall correctly, the other input decks didn't have so many subsets, so maybe that is the direction to look.
Is it epoch-4.19.0 that has the subset fix? I am interested in seeing if that works.
Regards, Kavin Tangtartharakul
I'd always recommend using the most recent version of the code.
If using the latest version of EPOCH doesn't fix your issue, given we're currently unable to reproduce your error, could you provide some minimal code which demonstrates the problem you're having?
Yeah I agree with Tom, better to use the most recent one.
I know that we have a link to the epoch-4.19.0 version release, but the subset fix is in the code which is currently stored on the main branch at this URL: https://github.com/Warwick-Plasma/epoch
Note that we're currently experiencing an issue with the version number displayed on start-up - the most recent EPOCH may not say it's running EPOCH 4.19.
Hmm, using the most recent main-branch EPOCH with the subset fix did not fix the problems for me. I was using your input deck in 2D.
I did check that with only "rand_frac" and no "persist", the output was normal.
I tried on the lonestar6 and stampede2 supercomputers on tacc; both had the bug.
I also tried only having the id_freq output, the one that has both random frac and persist for the electrons. The other outputs were commented out. The output now is normal, no bug.
Could you please post a short piece of code that you're using to check the validity of the output? Without this we are unlikely to be able to make much progress on the issue.
Sadly, my most recent post uses slightly modified versions of the input deck that Stuart posted above (https://github.com/Warwick-Plasma/epoch/issues/361#issuecomment-1440129813). He was unable to recreate my problem.
I am now able to reproduce your bug. I think my original test deck removed too much physics, and I've found differences between the deck I tested and the deck I attached. I have produced a minimal input deck which can reproduce the problem, which I include at the end of this report. Here, we only use the -DPARTICLE_ID pre-compiler flag.
In small0000.sdf, there are only 22 particles present. As a persistent subset, we would expect this subset to only use these 22 particles, and this is seen up to small0009.sdf. In small0010.sdf, there are 238 entries in ID. Of these ID entries, 48 are 0, one is -1, and six have numerically high ID values (like 544240896, in a simulation with 210000 electrons).
The problem does not occur when E_sub_full is removed from the simulation. The problem also does not occur when we remove the persist_start_time flag from E_sub_full.
No issues occur in the full SDF dumps. Here, full0000.sdf contains all 210000 electrons, while the final full0002.sdf contains 204356 ID values, all valid particles.
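A check like the one described can be sketched in Python; the helper below is illustrative (not part of EPOCH), and classifies ID values as zero, negative, out-of-range, or valid given the known total particle count:

```python
# Illustrative helper: classify suspicious entries in a dump's ID array.
import numpy as np

def classify_ids(ids, n_total):
    """Count zero, negative, out-of-range, and valid particle IDs."""
    ids = np.asarray(ids)
    return {
        "zero": int(np.sum(ids == 0)),
        "negative": int(np.sum(ids < 0)),
        "out_of_range": int(np.sum(ids > n_total)),
        "valid": int(np.sum((ids >= 1) & (ids <= n_total))),
    }

# Example with made-up values mirroring the symptoms described above:
print(classify_ids([5, 0, -1, 544240896, 12], n_total=210000))
```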
Current idea: if multiple persistent subsets describe the same species, and a particle from one subset leaves the simulation, then any persistent subset which doesn't contain this particle becomes corrupted.
begin:control
nx = 100*5
ny = 24*5
x_min = -30.0*micron
x_max = 70.0*micron
y_min = -12.0*micron
y_max = 12.0*micron
t_end = 120.1*femto
dlb_threshold = 0.4
use_random_seed = T
stdout_frequency = 10
end:control
begin:boundaries
bc_x_min = simple_laser
bc_x_max = open
bc_y_min = open
bc_y_max = open
end:boundaries
begin:laser
boundary = x_min
lambda = 0.81e-6
intensity_w_cm2 = 3.8e21
t_profile = 1
profile = 1
phase = 0
end:laser
begin:species
name = Electron
npart_per_cell = 5
dump = T
temp = 0
number_density = if(x gt 0 and x lt 10, 1.0e25, 0.0)
identify:electron
end:species
begin:species
name = Carbon
charge = 6.0
mass = 1836.0*12.0
npart_per_cell = 5
dump = T
temp = 0
number_density = number_density(Electron) / 6.0
end:species
begin:subset
name = E_sub
include_species:Electron
persist_start_time = 0
random_fraction = 0.0001
end:subset
begin:subset
name = E_sub_full
include_species:Electron
persist_start_time = 0
random_fraction = 1.0
end:subset
begin:output
name = small
file_prefix = small
dt_snapshot = 10.0e-15
id = E_sub
end:output
begin:output
name = full
file_prefix = full
dt_snapshot = 100.0e-15
id = E_sub_full
end:output
@Status-Mirror
Wow, nice progress!
I just tested your theory on my input deck, and it seems to be correct.
Originally, my subsets were as posted above. I commented out the "persist" line in the "full" subset (which I wouldn't have thought mattered because rand_frac = 1), and the result is that my outputs are normal now.
I have investigated the issue further, and I believe I have a fix. This was a complicated issue to spot, as the EPOCH subset feature has limited documentation. I will explain what the code is trying to do, how it's making the mistake, how it relates to our deck, and a fix which solves the issue with your deck. I have set up a merge request for this fix, and it may be in the main branch soon.
I made a reduced version of my last input deck, with dlb_threshold and use_random_seed removed.
I found that the bug was not present when running on a single processor, and I was able to reproduce the bug when running on 2 processors. The first error appeared on step 211, when particle 105002 moved from a rank 1 cell to a rank 0 cell. By inspecting the count parameter of the particle_id_hash data-type, we were able to see that for the E_sub subset, rank 1 still had 10 subset particles (no change), but count in rank 0 increased by one particle during the transfer. As particle 105002 was not a member of E_sub, the rank 1 behaviour was correct, but the rank 0 behaviour was wrong: a particle was incorrectly added to the subset.
During particle transfer between processors, pack_particle calls id_registry%map(a_particle), which returns a key detailing which subsets the particle belongs to. This is needed because each rank only knows which of its local particles are in subsets; the particle must be added to the local subsets when moving between ranks. The key is built with the bitwise intrinsics IAND and ISHFT, which I will explain here:
1. In pidr_map, we declare a hashmap integer which is initially set to 0.
2. We loop over all subsets with index i_hash.
3. For each i_hash, we determine whether the particle is present in the subset. If it is, we add 1 to hashmap; otherwise we move on to the next i_hash.
4. hashmap is now either 1 or 0, which in binary is also 1 or 0. The line hashmap = ISHFT(hashmap, 1_i8) appends a binary 0 to hashmap, making 10 or 00 in binary, or 2 or 0 in integer form.
5. The next i_hash represents a new subset, so if the particle is present in this second subset, we add a 1 to hashmap again.
6. For a simulation with two subsets, this gives hashmap the integer values 3, 2, 1 or 0.
I believe the bug appears when you run the procedure in reverse. The subroutine pidr_add_with_map reads the integer map and logs the particle in the relevant subsets. Once again, we loop over all subsets with index i_hash. To test whether a particle is in a subset, EPOCH uses IAND(hashmap, 1), which only returns 1 if the right-most binary digit of hashmap is also 1. The issue is that the right-most digit refers to the last subset, with the highest i_hash, but in this subroutine it has been associated with i_hash = 1. EPOCH appears to be loading the subsets in reverse.
In our example, particle 105002 was not present in E_sub (i_hash = 1), but it was present in E_sub_full (i_hash = 2), which made hashmap 01 in binary, or 1 as an integer. In pidr_add_with_map with i_hash = 1, we have IAND(1,1) = 1 and so the particle was added to E_sub in rank 0. When we perform the reverse bitwise shift, 01 goes to 0, and IAND(0,1) = 0, so the particle was not added to E_sub_full in rank 0 (also wrong, as this subset is supposed to contain all particles).
In earlier messages, I assumed the E_sub_full depletion meant particles were escaping the simulation, but this was not true. This also explains why the error vanishes when we remove E_sub_full: reversing the loading order does not matter if there is only one subset present.
Luckily, this requires only a simple fix. We can use the same test as before, but if it succeeds, we must add the particle to subset sz + 1 - i_hash instead of i_hash, where sz is the number of subsets present. When this fix is applied, the file size of the small SDF files remains fixed, as no E_sub particles leave the simulation window.
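The packing procedure, the reversed unpacking bug, and the sz + 1 - i_hash fix can be sketched as follows; this is a hand-rolled Python illustration of the logic described above, not EPOCH's actual Fortran, and the function names are made up:

```python
# Illustrative sketch of the subset-membership bitmask. present[i-1] says
# whether the particle is in subset i (1-based), mirroring the i_hash loop.

def pack_map(present):
    """Pack subset membership into an integer; subset 1 ends up in the
    left-most (highest) bit because of the shift between subsets."""
    hashmap = 0
    for i_hash in range(1, len(present) + 1):
        hashmap = hashmap << 1          # like ISHFT(hashmap, 1_i8)
        if present[i_hash - 1]:
            hashmap += 1
    return hashmap

def unpack_buggy(hashmap, sz):
    """Reads the right-most bit as subset 1 -- the reversal bug."""
    members = []
    for i_hash in range(1, sz + 1):
        if hashmap & 1:                 # like IAND(hashmap, 1)
            members.append(i_hash)
        hashmap >>= 1
    return members

def unpack_fixed(hashmap, sz):
    """Same test, but the bit is assigned to subset sz + 1 - i_hash."""
    members = []
    for i_hash in range(1, sz + 1):
        if hashmap & 1:
            members.append(sz + 1 - i_hash)
        hashmap >>= 1
    return members

# Particle 105002: not in E_sub (subset 1), in E_sub_full (subset 2).
m = pack_map([False, True])             # binary 01, integer 1
print(unpack_buggy(m, 2))               # wrongly reports membership of E_sub
print(unpack_fixed(m, 2))               # correctly reports E_sub_full only
```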
Thank you for figuring all this out and fixing the bug, as well as the detailed explanation!
Dear developers, I used a subset to select some particles at the beginning, and when I checked the IDs of the particles at different times, I found that particles with certain IDs suddenly disappeared at some times and then suddenly appeared again later. I used the simple_outflow boundary condition. Could this be a bug? (epoch-4.17.16)