After an offline discussion with @KseniaBastrakova and @PrometheusPi, I am using this approach:
1) copy the original file from PIConGPU as it worked in #22
2) open the copied (!) file read-write, i.e., with python ('r+')
3) filter only the wanted particles, e.g., bunch particles in the FWHM of energy
4) update the shape attribute for mass and charge to the new particle number
5) close the file and make sure that the data was successfully written

Now I submitted a job as mentioned in #22. It failed with:
Series.set_software_version is deprecated. Set the version with the second argument of Series.set_software
I == 0
I == 1
...
I == 46
Traceback (most recent call last):
File "reduction_main.py", line 822, in <module>
base_reduction_function(args.hdf, args.hdf_re, "random", parameters, args.iteration)
File "reduction_main.py", line 220, in base_reduction_function
reduce_one_iteration(series_hdf, series_hdf_reduction, algorithm, iteration)
File "reduction_main.py", line 206, in reduce_one_iteration
process_iteration_group(algorithm, current_iteration, series_hdf, series_hdf_reduction, reduction_iteration)
File "reduction_main.py", line 240, in process_iteration_group
process_patches_in_group_v2(iteration.particles[name_group], series_hdf,
File "reduction_main.py", line 686, in process_patches_in_group_v2
data, dict_data_indexes, last_idx = get_data(series_hdf, particle_species, weights, idx_start, idx_end)
File "reduction_main.py", line 642, in get_data
if len(absolute_values[0]) == 1:
IndexError: index 0 is out of bounds for axis 0 with size 0
I guess I should also update some other attributes/datasets when changing the particles. Could it be some patches?
Yes, I think you missed the "unit_si" or "macroweighted" attribute. Could you send me a link to the file? It would be helpful for reproducing the issue.
This is the path to the file on Taurus:
/scratch/ws/1/s6119265-simdata-exchange
where you will find the file simData_filtered_60000.h5. This is the file I got the error with.
I don't know if this is a good way to exchange files, but I hope it works. Do you have a Taurus account?
Thanks @alex-koe. Unfortunately, neither @KseniaBastrakova nor I have an account there. So could you transfer it to Hemera please?
@sbastrakov the file is now on
/bigdata/hplsim/production/LWFA_betatron_decoherence/
on Hemera. It should be readable. Please copy the file for your usage.
Thank you for providing the file.
I think I understand your problem: in the file simData_filtered_60000.h5, the size of all particle datasets (momentum, charge, position, etc.) is 1705615.
You can check it, for example, by:
f = h5py.File('simData_filtered_60000.h5', 'r')
result = f["data"]["60000"]["particles"]["en_all"]["particleId"]
print(result.shape)
and the output is: (1705615,)
But your numParticlesOffset dataset in the particle patches does not correspond to that: its values go from 0 to 1.30344E7, which is much bigger:
f = h5py.File('simData_filtered_60000.h5', 'r')
result = f["data"]["60000"]["particles"]["en_all"]["particlePatches"]['numParticlesOffset'][0:193]
print(result)
and output is:
[ 0 24357 29493 33502 56927 82990 104215 119799
...
13029684 13029684 13029684 13029684 13029684 13032670 13033621 13034400]
numParticlesOffset has to be in the range [0, particle_species_size]; it indicates the start of each new particle chunk.
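For illustration, a minimal sketch of that consistency check with h5py (file and group paths as in the snippets above; the numParticles record next to numParticlesOffset is assumed from the openPMD standard, not shown in this thread):

```python
import h5py
import numpy as np

f = h5py.File('simData_filtered_60000.h5', 'r')
species = f["data"]["60000"]["particles"]["en_all"]

n = species["particleId"].shape[0]                                   # 1705615 in this file
offsets = np.array(species["particlePatches"]["numParticlesOffset"])
counts = np.array(species["particlePatches"]["numParticles"])

# every patch must start inside the species and end no later than its last particle
print("offsets ok:", bool((offsets < n).all()))
print("patches ok:", bool(((offsets + counts) <= n).all()))
```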
So I can suggest two solutions: 1) I can fix the file by changing 'numParticlesOffset' from the original values to a version that only has values from 0 to 1705615, or 2) I can add an additional parameter "skip particle patch and make new" to the reductor.
So what may have gone wrong is that when removing all particles except the IDs you wanted to keep, the patch information was not updated correspondingly, @alex-koe. Not 100% sure, it just seems plausible. I think the easiest is to just remove the patches datasets altogether then? (They are optional, and the reductor works when they are not present; it just does not work when they are present but wrong.)
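In case it is useful, a minimal sketch of dropping the patches with h5py (a guess at the concrete call, using the file and group path from the snippets above; the group is optional per openPMD, so deleting it should be safe):

```python
import h5py

# open the copied (!) file read-write and remove the optional particlePatches group
with h5py.File('simData_filtered_60000.h5', 'r+') as f:
    del f["/data/60000/particles/en_all/particlePatches"]
```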
@sbastrakov I agree, the patch information was not updated. Just removing the patches is the easiest approach. I deleted the patches datasets from the file and restarted the job. The result was a segmentation fault. Does the script need to be adjusted?
As I thought, no. I think @KseniaBastrakova will take a look.
@alex-koe I think I reproduced your problem and fixed it.
Could you run
git fetch && git pull
and try the reduction once again?
@KseniaBastrakova It is working! Great work. And it is fast: it took only about 2min to run the job. :-)
@sbastrakov @KseniaBastrakova I have only a minor question. When I read out the position dataset, it looks the same for the original and the reduced particles, and the dataset is reduced in size. The weights seem to be OK, i.e., the sum of the weights gives the same charge. The average position and the spread in position are consistent between the original and the reduced set.
However, I do not understand the momentum. This is the original set in (y, p_y):
This is the reduced set: the distribution in y looks fine, but the distribution in p_y seems to have a different factor.
For reading the momentum, I do the following:
""" label is one of {x,y,z}, species is 'en', ... """
u = f['/data/{}/particles/{}/momentum/{}'.format(timestep, species, label)]  # ~ gamma beta m c
w = f['/data/{}/particles/{}/weighting'.format(timestep, species)]
u_unitSI = u.attrs["unitSI"]
uy = u_unitSI * np.array(u) / np.array(w) / (const.m_e * const.c)  # the normalized momentum gamma beta_i
Do I have to take special care here to get the correct momentum?
Thanks for reporting. It could be that somewhere (not sure on whose side) the macroweighted flag is treated incorrectly.
To extend on that thought: @alex-koe your way of computing momentum is not suitable for general openPMD files, as the momentum attribute could theoretically be stored either per single particle or per macroparticle, and there is a special attribute that says which way was used. However, I assume you know how it is for your particular file (momentum is per macroparticle, like in PIConGPU output) and read it accordingly. So I think it is more likely there is something fishy in the reductor @KseniaBastrakova, e.g. maybe the reductor writes it out with a different macroweighted setting?
@sbastrakov so what would then be the correct way? I used pretty much the same way that gave me the correct momentum for all simulations before and never had an issue. Is the 'reduced' file somehow in a different format?
This is another thing I tried out, the x-ux space with the random reductor: these are about 5001 particles illustrated by a scatter plot.
@alex-koe thank you for the details. I think the problem is on my side: I found a bug that can cause this behavior. I will fix it today.
@alex-koe, now it works better. Could you run
git fetch && git pull
and try once again?
@KseniaBastrakova I tried it again. It is now different :-D The momentum and the (x-ux) space look strange.
@alex-koe, how do you build the graph and which simulation do you use? Maybe it will help me, because I cannot reproduce it. I just tried to build two graphs, before and after reduction, with this:
def print_graph():
    f = h5py.File('simData_beta_60000.h5', 'r')
    timestep = 60000
    species = 'en_all'
    m_e = 9.1e-31     # electron mass [kg]
    c = 299792458.0   # speed of light [m/s]
    label = "y"
    u = f['/data/{}/particles/{}/momentum/{}'.format(timestep, species, label)]  # ~ gamma beta m c
    w = f['/data/{}/particles/{}/weighting'.format(timestep, species)]
    u_unitSI = u.attrs["unitSI"]
    uy = u_unitSI * np.array(u) / np.array(w) / (m_e * c)
    plt.scatter(u, uy, marker='x')
    plt.xlabel('y [um]')
    plt.ylabel("p_y [MeV/c]")
    plt.show()
First, simData_beta_60000 before reduction,
and after reduction with a ratio of deleted particles = 0.1.
The graphs look similar, so I cannot reproduce the problem.
I put the submit script, the Jupyter notebook I used for the plots shown, and the sim files for input and output into the tar archive for-Ksenia.tar.bz2 on
/bigdata/hplsim/production/LWFA_betatron_decoherence/
(on Hemera), so you can follow what I did. A major difference to your data is that I used a much larger ratio of deleted particles, close to one. I don't know if this could be causing something.
I don't think a large ratio should break anything. It would certainly make the sampling much worse, but if that is acceptable, it should otherwise not be an issue.
@alex-koe, could you give me access to the for-Ksenia.tar.bz2? I just don't have the rights to copy it.
@KseniaBastrakova you now have access rights to the file.
@alex-koe, thank you for the scripts and the sample. They are really useful.
The problem is that the momenta are "macroweighted", which means we store the total momentum of all electrons in a macroparticle (weight * one_particle_momentum).
So, if you delete 0.9 of the particles: the starting number of macroparticles is 13037395, the remaining number of macroparticles is 1303739, so we deleted 11733655 macroparticles. Now we need to redistribute their weight among the remaining macroparticles, which means we add about 1300 weight to each remaining macroparticle.
And because we store the momentum as momentum * weight_of_macroparticle, we get such big momenta.
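To spell out that bookkeeping with made-up numbers (not taken from the file): if a surviving macroparticle's weight grows by some factor during the redistribution, a macroweighted momentum record has to grow by the same factor, otherwise dividing by the new weight in post-processing no longer gives the per-electron momentum:

```python
# toy values, only to illustrate the scaling
w_old, w_new = 1.0e3, 1.0e4        # macroparticle weight before / after redistribution
p_electron = 5.0e-22               # momentum of one real electron [kg m/s]

p_macro_old = p_electron * w_old   # what a macroweighted record stores before reduction
p_macro_new = p_electron * w_new   # what it must store afterwards (rescaled by w_new / w_old)

# reading it back always divides by the current weight, so both give the same physics
print(p_macro_old / w_old == p_macro_new / w_new)   # True
```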
So I have a suggestion for you: if you expect that all macroparticles are not macroweighted, I can add a "-ignore macroweighted" flag to the reductor.
I believe this should not be done as part of the reduction, as then it would be really easy to use it wrongly.
I can make an additional script to fix it, as a temporary solution.
Sorry, maybe I do not understand correctly: is it not possible to redistribute the weighting after the reduction process? I would like to have the total physical quantities, i.e. energy, momentum, charge, and mass, conserved. That is, if I have 100 pC of charge in the original file and the average momentum is 100 MeV/c, then the reduced file should have the same charge and average momentum, only the number of macroparticles is reduced.
@alex-koe let me reiterate. Of course it is possible to remove some macroparticles and redistribute their weights among the others so that the physical conservation laws hold, and that is the goal of the reduction.
The point @KseniaBastrakova was trying to make was merely that in your file, the momenta are stored per macroparticle and not per physical particle. The openPMD standard has a special property for more or less each piece of particle data that says whether it is per real or per macro-particle; this property is called macroweighted. (E.g. PIConGPU outputs momenta per macroparticle.) The reduction script supports both cases, or at least is supposed to. So the value of macroweighted determines whether the momenta and other affected data should be rescaled when a weighting changes. And of course the post-processing should take care of both options regardless of reduction. So when both the reduction and the post-processing handle macroweighted correctly, there should be no violation of momentum conservation, etc., with any parameters of reduction.
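As a reference for the post-processing side, a minimal sketch of reading a momentum component in a way that respects that flag (a sketch only, assuming the standard openPMD attribute names macroWeighted and unitSI and the PIConGPU-style paths used earlier in this thread; weightingPower is assumed to be 1, as it is for momentum):

```python
import h5py
import scipy.constants as const

timestep, species, label = 60000, 'en_all', 'y'

with h5py.File('simData_beta_60000.h5', 'r') as f:
    u = f['/data/{}/particles/{}/momentum/{}'.format(timestep, species, label)]
    w = f['/data/{}/particles/{}/weighting'.format(timestep, species)][()]
    p = u[()] * u.attrs['unitSI']                     # momentum component in SI units
    # the macroWeighted flag lives on the momentum record (parent group of the component)
    if u.parent.attrs.get('macroWeighted', 0) == 1:   # stored per macroparticle
        p = p / w                                     # convert to momentum per real electron
    uy = p / (const.m_e * const.c)                    # normalized momentum gamma * beta_y
    print(uy.mean(), uy.std())
```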
Sorry, @KseniaBastrakova, is there an issue with the reduction script or with the input data? (All PIConGPU momenta are macro-weighted, so if that is the issue, we could not apply the reduction to our simulations.) Sorry, I am a bit confused by the discussion and just try to understand the current status and the remaining problem.
I am also not sure what the current status is. To reiterate my previous point, this reduction package does in principle support both macro-weighted and non-macro-weighted data. So if it produces incorrect results, this is a bug in the implementation that we should investigate and fix, but it is not a principal limitation.
If I understood @KseniaBastrakova's last point correctly, she did not see an obvious error in the last run, but I guess @alex-koe did. Maybe it is better to talk offline.
Yes, I think it makes sense to talk offline. We definitely have a misunderstanding here.
Okay - then let's meet - today is quite busy for me, how about tomorrow at 11 AM?
Works fine with me
For me too
Just spoke with @alex-koe offline - a meeting tomorrow at 11 AM works for him as well.
Thanks for the really useful examples and scripts!
I think I made a fix.
Here is the distribution for the normalized momentum u_unitSI * np.array(u) / np.array(w) / (const.m_e * const.c),
before reduction:
and after reduction (ratio of deleted particles = 0.9):
Please do
git fetch && git pull
to get the fix.
@KseniaBastrakova Great work! :+1: The momenta look correct. Could you please do a plt.hist2d together with a plt.colorbar() instead of a scatter plot? This would validate the weighting as well.
@PrometheusPi here are the graphs (coordinate, momentum) for each dimension before and after reduction (for the file simData_beta_60000.h5).
@KseniaBastrakova Thanks - could you please use the weights parameter of the hist2d plot? This would ensure that the weightings scale as required. So far, hist2d assumes a weighting of 1 and thus only counts macroparticles (causing a factor 10 difference).
@PrometheusPi, I added weights: H = ax.hist2d(z_position, z_momemtum, weights=weights_reduction, bins=50)
and got the graphs; they look pretty similar.
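For reference, a self-contained sketch of such a weighted phase-space histogram (a sketch only; file name and paths as used earlier in this thread, with the absolute position assembled from position and positionOffset as in the openPMD standard):

```python
import h5py
import matplotlib.pyplot as plt
import scipy.constants as const

timestep, species = 60000, 'en_all'

with h5py.File('simData_beta_60000.h5', 'r') as f:
    base = '/data/{}/particles/{}'.format(timestep, species)
    w = f[base + '/weighting'][()]
    pos = f[base + '/position/y']
    off = f[base + '/positionOffset/y']
    y = pos[()] * pos.attrs['unitSI'] + off[()] * off.attrs['unitSI']   # absolute position [m]
    u = f[base + '/momentum/y']
    uy = u[()] * u.attrs['unitSI'] / w / (const.m_e * const.c)          # macroweighted -> gamma*beta_y

# weighted by the macroparticle weighting, the histogram counts real electrons,
# so the original and the reduced file should give (nearly) the same picture
plt.hist2d(y, uy, bins=50, weights=w)
plt.colorbar()
plt.xlabel('y [m]')
plt.ylabel('u_y = gamma beta_y')
plt.show()
```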
@KseniaBastrakova Thank you, that looks really good! d^2Q/dy dp_y looks perfect (y \in {x,y,z}).
@alex-koe Can you please try this with your data?
@PrometheusPi Here is a comparison between the original (left col.) and the reduced particle sets (right col.). Sorry for not adding any axis description, but it would be too small in the output. The rows are as follows: (y,uy), (x,ux), (z,uz). The data is taken from run 006. The only very minor detail I see is that the (auto) colorbar scaling is slightly different, but I guess that is more my issue of how to sample it correctly for such a small data set of N=5000 (right side). @KseniaBastrakova you did a great work.
For documentation reasons, I copy the script for the particle filtering here:
"""
needed Input:
+ timestep
+ filename_bunch_idendifiers
+ filename_filtered_file
"""
###### init ####
import numpy as np
import scipy as sp
import h5py
import scipy.constants as const
timestep = 60000
filename_bunch_idendifiers = "bunch-identifiers.dat"
################ WARING ################
# The filtered file will be overwritten, copy this file instead from the original data
filename_filtered_file = "simData_run006_filtered_{}.h5".format(timestep)
# read particle ids for filtering
ids = np.loadtxt(filename_bunch_idendifiers, dtype=np.uint64)
##### open h5 files
filtered_file = h5py.File(filename_filtered_file, "r+")
h = filtered_file["/data/{}/particles/en_all".format(timestep)]
current_ids = h["particleId"][()]
m = np.in1d(current_ids, ids)
if m.sum() != len(ids):
print("ERR: requested IDs are not fully contained in H5 file. Abort.")
exit
paths = ["particleId", "weighting",
"momentum/x", "momentum/y", "momentum/z",
"momentumPrev1/x", "momentumPrev1/y", "momentumPrev1/z",
"position/x", "position/y", "position/z",
"positionOffset/x", "positionOffset/y", "positionOffset/z"]
for p in paths:
temp = h[p][m]
temp_items = h[p].attrs.items()
h.__delitem__(p)
h[p] = temp
for i in temp_items:
h[p].attrs.create(name=i[0], data=i[1])
for p in ["mass", 'charge']:
temp = h[p].attrs['shape']
temp[0] = np.sum(m)
h[p].attrs['shape'] = temp
#### delete particle patches because reduction script does not process them correctly
filtered_file["/data/{}/particles/en_all/".format(timestep)].__delitem__('particlePatches')
filtered_file.close()
@KseniaBastrakova I think this issue can be closed. I just do not know how to do it ...
Currently, the library only supports selecting a particle species for reduction. There was a request to add functionality that reduces the number of particles only within a selected subset.
As a small feature, it could be an additional script that takes as an argument a file with a set of particle indexes (in .dat format) to which the reduction should be applied. The algorithm would then only be applied to the particles of the openPMD file with the selected indices.