ComputationalRadiationPhysics / particle_reduction


Reducing particles from a set of IDs #21

Closed KseniaBastrakova closed 3 years ago

KseniaBastrakova commented 3 years ago

Currently, the library only supports selecting a particle species for reduction. There was a request to add functionality that reduces the number of particles only within a selected set.

As a small feature, it could be an additional script that takes as an argument a file (in .dat format) containing the indices of the particles to which the reduction should be applied. The algorithm would then only be applied to the particles in the openPMD file with the selected indices.
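A minimal sketch of what such a selection step could look like (the .dat file name and contents are hypothetical; the HDF5 paths follow the PIConGPU openPMD layout):

    import h5py
    import numpy as np

    # hypothetical inputs: one particle ID per line in the .dat file, plus the openPMD HDF5 file
    selected_ids = np.loadtxt("selected_particle_ids.dat", dtype=np.uint64)

    with h5py.File("simData_60000.h5", "r") as f:
        all_ids = f["data"]["60000"]["particles"]["en_all"]["particleId"][()]
        keep = np.in1d(all_ids, selected_ids)  # mask of the particles the reduction should act on
    print("selected", keep.sum(), "of", keep.size, "particles")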

alex-koe commented 3 years ago

After an offline discussion with @KseniaBastrakova @prometeusPi, I am using this approach:

1. copy the original file from PIConGPU, as it worked in #22
2. open the copied (!) file read-write, i.e., with Python ('r+')
3. filter out everything but the wanted particles, e.g., the bunch particles within the FWHM of the energy spectrum

Now I submitted a job as mentioned in #22. It failed with:

Series.set_software_version is deprecated. Set the version with the second argument of Series.set_software
I == 0
I == 1
I == 2
I == 3
I == 4
I == 5
I == 6
I == 7
I == 8
I == 9
I == 10
I == 11
I == 12
I == 13
I == 14
I == 15
I == 16
I == 17
I == 18
I == 19
I == 20
I == 21
I == 22
I == 23
I == 24
I == 25
I == 26
I == 27
I == 28
I == 29
I == 30
I == 31
I == 32
I == 33
I == 34
I == 35
I == 36
I == 37
I == 38
I == 39
I == 40
I == 41
I == 42
I == 43
I == 44
I == 45
I == 46
Traceback (most recent call last):
  File "reduction_main.py", line 822, in <module>
    base_reduction_function(args.hdf, args.hdf_re, "random", parameters, args.iteration)
  File "reduction_main.py", line 220, in base_reduction_function
    reduce_one_iteration(series_hdf, series_hdf_reduction, algorithm, iteration)
  File "reduction_main.py", line 206, in reduce_one_iteration
    process_iteration_group(algorithm, current_iteration, series_hdf, series_hdf_reduction, reduction_iteration)
  File "reduction_main.py", line 240, in process_iteration_group
    process_patches_in_group_v2(iteration.particles[name_group], series_hdf,
  File "reduction_main.py", line 686, in process_patches_in_group_v2
    data, dict_data_indexes, last_idx = get_data(series_hdf, particle_species, weights, idx_start, idx_end)
  File "reduction_main.py", line 642, in get_data
    if len(absolute_values[0]) == 1:
IndexError: index 0 is out of bounds for axis 0 with size 0

alex-koe commented 3 years ago

I guess I should take into account updating some other attributes/datasets as well when changing the particles. Could it be the patches?

KseniaBastrakova commented 3 years ago

> I guess I should take into account updating some other attributes/datasets as well when changing the particles. Could it be the patches?

Yes, I think you missed the "unit_si" or "macroweighted" attribute. Could you send me a link to the file? It would be helpful for reproducing the issue.

alex-koe commented 3 years ago

This is the path to the file on Taurus: /scratch/ws/1/s6119265-simdata-exchange, where you will find the file simData_filtered_60000.h5. This is the file I got the error with. I don't know if this is a good way to exchange files, but I hope it works. Do you have a Taurus account?

sbastrakov commented 3 years ago

Thanks @alex-koe. Unfortunately, neither @KseniaBastrakova nor I have an account there. Could you transfer it to Hemera, please?

alex-koe commented 3 years ago

@sbastrakov the file is now on /bigdata/hplsim/production/LWFA_betatron_decoherence/ on Hemera. It should be readable. Please copy the file for your usage.

KseniaBastrakova commented 3 years ago

> @sbastrakov the file is now on /bigdata/hplsim/production/LWFA_betatron_decoherence/ on Hemera. It should be readable. Please copy the file for your usage.

Thank you for providing the file

I think I understand your problem: in the file simData_filtered_60000.h5, the size of all datasets (momentum, charge, position, etc.) is 1705615. You can check it, for example, with:

    f = h5py.File('simData_filtered_60000.h5', 'r')
    result = f["data"]["60000"]["particles"]["en_all"]["particleId"]
    print(result.shape)

The output is (1705615,). But your numParticlesOffset dataset in the particle patches does not correspond to this: it contains values from 0 up to 1.30344E7, which is much bigger.

    f = h5py.File('simData_filtered_60000.h5', 'r')
    result = f["data"]["60000"]["particles"]["en_all"]["particlePatches"]['numParticlesOffset'][0:193]
    print(result)

and the output is:

[       0    24357    29493    33502    56927    82990   104215   119799
   ...
 13029684 13029684 13029684 13029684 13029684 13032670 13033621 13034400]

numParticlesOffset has to be in the range [0, particle_species_size]; it indicates the start of each patch's chunk of particles.

So, I can suggest two solutions: 1) I can fix the file by changing 'numParticlesOffset' from the original values to a version that only has values from 0 to 1705615, or 2) I can add an additional option "skip particle patches and create new ones" to the reductor.
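For reference, a minimal consistency check (assuming the HDF5 layout from the snippets above) that would have caught the mismatch:

    import h5py
    import numpy as np

    with h5py.File('simData_filtered_60000.h5', 'r') as f:
        species = f["data"]["60000"]["particles"]["en_all"]
        num = species["particlePatches"]["numParticles"][()]
        off = species["particlePatches"]["numParticlesOffset"][()]
        n_total = species["particleId"].shape[0]
        # every patch must lie completely inside the per-particle datasets
        assert np.all(off + num <= n_total), "particle patches are inconsistent with the species size"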

sbastrakov commented 3 years ago

So what may have gone wrong is that when removing all particles except the IDs you wanted to keep, the patch information was not updated correspondingly, @alex-koe. Not 100% sure, it just seems plausible. I think the easiest then is to just remove the patches datasets altogether? (They are optional, and the reductor works when they are not present; it just does not work when they are present but wrong.)
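For example, removing the optional particlePatches group with h5py could look like this (path taken from the snippets above):

    import h5py

    with h5py.File('simData_filtered_60000.h5', 'r+') as f:
        del f["data"]["60000"]["particles"]["en_all"]["particlePatches"]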

alex-koe commented 3 years ago

@sbastrakov I agree, the patch information was not updated. Just removing the patches is the easiest approach. I deleted the patches datasets from the file and restarted the job. The result was a segmentation fault. Does the script need to be adjusted?

sbastrakov commented 3 years ago

As I thought, no. I think @KseniaBastrakova will take a look.

KseniaBastrakova commented 3 years ago

@alex-koe I think I reproduced your problem and fixed it. Could you do git fetch && git pull and try the reduction once again?

alex-koe commented 3 years ago

@KseniaBastrakova It is working! Great work. And it is fast: it took only about 2min to run the job. :-)

alex-koe commented 3 years ago

@sbastrakov @KseniaBastrakova I have only a minor question. When I read out the position dataset, it looks the same for the original and the reduced particles, and the dataset is reduced in size. The weights seem to be OK, i.e., the sum of the weights gives the same charge. The average position and the spread in position are consistent between the original and the reduced set.

However, I do not understand the momentum. This is the original set in (y, p_y): image

This is the reduced set: image The distribution in y looks fine. The distribution in p_y seems to have a different factor.

For reading the momentum, I do the following:

""" label is one of {x,y,z}, species is *en*, ..."""
    u = f['/data/{}/particles/{}/momentum/{}'.format(timestep, species, label)]  # ~ gamma beta m c
    w = f['/data/{}/particles/{}/weighting'.format(timestep, species)]  # 
    u_unitSI = u.attrs["unitSI"]
    uy =  u_unitSI*np.array(u) /np.array(w)  /(const.m_e*const.c)   ## the normalized momenum gamma beta_i

Do I have to take special care here to get the correct momentum?

sbastrakov commented 3 years ago

Thanks for reporting. It could be that somewhere (not sure on whose side) the macroweighted flag is treated incorrectly.

sbastrakov commented 3 years ago

To extend on that thought: @alex-koe your way of computing the momentum is not suitable for general openPMD files, as the momentum record could theoretically be stored per single particle or per macroparticle, and there is a special attribute that says which way was used. However, I assume you know how it is for your particular file (the momentum is per macroparticle, as in PIConGPU output) and read it accordingly. So I think it is more likely that there is something fishy in the reductor, @KseniaBastrakova, e.g. maybe the reductor writes the data out with a different macroweighted setting?
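For illustration, a sketch of a reader that respects that flag (assuming the openPMD HDF5 layout, with macroWeighted as an attribute on the momentum record group and unitSI on each component dataset):

    import scipy.constants as const

    def normalized_momentum(f, timestep, species, label):
        # returns gamma*beta along `label` per physical particle; f is an open h5py.File
        record = f['/data/{}/particles/{}/momentum'.format(timestep, species)]
        component = record[label]
        w = f['/data/{}/particles/{}/weighting'.format(timestep, species)][()]
        p = component.attrs['unitSI'] * component[()]
        if record.attrs.get('macroWeighted', 0):  # stored per macroparticle, e.g. PIConGPU output
            p = p / w
        return p / (const.m_e * const.c)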

alex-koe commented 3 years ago

@sbastrakov so what would then be the correct way? I used pretty much the same approach that gave me the correct momentum for all simulations before and never had an issue. Is the 'reduced' file somehow in a different format?

alex-koe commented 3 years ago

This is another thing I tried out, the x-ux phase space with the random reductor: grafik These are about 5001 particles shown in a scatter plot.

KseniaBastrakova commented 3 years ago

@alex-koe thank you for the details. I think the problem is on my side: I found a bug that can cause this behavior. I will fix it today.

KseniaBastrakova commented 3 years ago

@alex-koe, now it works better. Could you do git fetch && git pull and try once again?

alex-koe commented 3 years ago

@KseniaBastrakova I tried it again. It is now different :-D grafik grafik The momentum and the (x-ux) phase space look strange.

KseniaBastrakova commented 3 years ago

@alex-koe So, how do you build the graph and which simulation do you use? Maybe it will help me, because I can't reproduce it. I just tried to build two graphs, before and after reduction, with this:


import h5py
import numpy as np
import matplotlib.pyplot as plt


def print_graph():
    f = h5py.File('simData_beta_60000.h5', 'r')
    timestep = 60000
    species = 'en_all'
    m_e = 9.1e-31      # electron mass [kg]
    c = 299792458      # speed of light [m/s]
    label = "y"
    u = f['/data/{}/particles/{}/momentum/{}'.format(timestep, species, label)]  # ~ gamma beta m c
    w = f['/data/{}/particles/{}/weighting'.format(timestep, species)]           # macroparticle weighting
    u_unitSI = u.attrs["unitSI"]
    uy = u_unitSI * np.array(u) / np.array(w) / (m_e * c)  # normalized momentum gamma*beta_y
    plt.scatter(u, uy, marker='x')  # note: the x axis here is the raw momentum dataset u, not the y position
    plt.xlabel('y [um]')
    plt.ylabel("p_y[MeV/e]")
    plt.show()

First, simData_beta_60000 before reduction: after_simData_beta_60000

and after reduction with a ratio of deleted particles of 0.1: after_simData_beta_60000

The graphs look similar, so I can't reproduce the problem.

alex-koe commented 3 years ago

> So, how do you build the graph and which simulation do you use? Maybe it will help me, because I can't reproduce it. I just tried to build two graphs, before and after reduction, with this:

I put the submit script, the Jupyter notebook I used for the plots shown, and the input and output simulation files into the tar archive for-Ksenia.tar.bz2 on /bigdata/hplsim/production/LWFA_betatron_decoherence/ (on Hemera), so you can follow what I did. A major difference to your data is that I used a much larger ratio of deleted particles, close to one. I don't know if this could be causing something.

sbastrakov commented 3 years ago

I don't think a large ratio should break it. It would certainly make the sampling much worse, but if that's acceptable, it should otherwise not be an issue.

KseniaBastrakova commented 3 years ago

@alex-koe, could you give me access to the for-Ksenia.tar.bz2? I just don't have the rights to copy it.

alex-koe commented 3 years ago

> @alex-koe, could you give me access to the for-Ksenia.tar.bz2? I just don't have the rights to copy it.

@KseniaBastrakova you now have access rights to the file.

KseniaBastrakova commented 3 years ago

@alex-koe, thank you for the scripts and the sample. They are really useful.

The problem is that the momenta are "macroweighted", which means that we store the total momentum of all electrons in a macroparticle (weight * one_particle_momentum).

So, if you delete a ratio of 0.9: the starting number of macroparticles is 13037395 and the remaining number of macroparticles is 1303739, so we deleted 11733655 macroparticles. Now we need to redistribute their weight among the remaining macroparticles, which means we add about 1300 weight to each remaining macroparticle. And because we store the momentum as momentum * weight_of_macroparticle, we get such big momenta.

So, I have a suggestion for you: if you expect that all macroparticles are not macroweighted, I can add an "-ignore macroweighted" flag to the reductor.
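To illustrate the bookkeeping (not reductor code, just the relation): if a record is macroweighted, i.e. stored as weight * per_particle_value, then any change of a macroparticle's weight has to rescale the stored value, otherwise dividing by the new weight no longer gives the per-particle momentum:

    import numpy as np

    def rescale_macroweighted(stored, old_weight, new_weight):
        # stored = old_weight * per_particle_value, so keeping the per-particle value
        # constant means multiplying the stored record by new_weight / old_weight
        return stored * (np.asarray(new_weight) / np.asarray(old_weight))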

sbastrakov commented 3 years ago

> So, I have a suggestion for you: if you expect that all macroparticles are not macroweighted, I can add an "-ignore macroweighted" flag to the reductor.

I believe this should not be done as part of the reduction, as then it would be really easy to use it wrongly.

KseniaBastrakova commented 3 years ago

> So, I have a suggestion for you: if you expect that all macroparticles are not macroweighted, I can add an "-ignore macroweighted" flag to the reductor.

> I believe this should not be done as part of the reduction, as then it would be really easy to use it wrongly.

I can make an additional script to fix it, as a temporary solution.

alex-koe commented 3 years ago

> So, I have a suggestion for you: if you expect that all macroparticles are not macroweighted, I can add an "-ignore macroweighted" flag to the reductor.

> I believe this should not be done as part of the reduction, as then it would be really easy to use it wrongly.

> I can make an additional script to fix it, as a temporary solution.

Sorry, maybe I do not understand correctly: is it not possible to redistribute the weighting after the reduction process? I would like to have the total physical quantities, i.e. energy, momentum, charge, mass, conserved. For example, if I have 100 pC of charge and an average momentum of 100 MeV/c in the original file, then the reduced file should have the same charge and average momentum, just with a reduced number of macroparticles.

sbastrakov commented 3 years ago

@alex-koe let me reiterate. Of course, it is possible to remove some macroparticles and redistribute their weights among the others so that the physical conservation laws hold, and that is the goal of the reduction.

The point @KseniaBastrakova was trying to make was merely that in your file, the momenta were stored per macroparticle and not per physical particle. The openPMD standard has a special property for more or less each piece of particle data that says whether it is per real particle or per macroparticle; this property is called macroWeighted. (E.g. PIConGPU outputs momenta per macroparticle.) The reduction script supports both cases, or at least is supposed to. So the value of macroWeighted determines whether the momenta and other affected data should be rescaled when a weighting changes. And of course the post-processing should take care of both options regardless of reduction. So when both the reduction and the post-processing handle macroWeighted correctly, there should be no violation of momentum conservation etc., with any parameters of the reduction.

PrometheusPi commented 3 years ago

Sorry, @KseniaBastrakova, is there an issue with the reduction script or with the input data? (All PIConGPU momenta are macro-weighted, thus if this is the issue, we could not apply the reduction to our simulations.) Sorry, I am a bit confused by the discussion and am just trying to understand the current status and the remaining problem.

sbastrakov commented 3 years ago

I am also not sure what the current status is. To reiterate my previous point, this reduction package does in principle support both macro-weighted and non-macro-weighted data. So if it produces incorrect results, this is a bug in the implementation and we should investigate and fix it, but it is not a limitation in principle.

If I understood @KseniaBastrakova's last point correctly, she did not see an obvious error in the last run, but I guess @alex-koe did. Maybe it is better to talk offline.

KseniaBastrakova commented 3 years ago

Yes, I think it makes sense to talk offline. We definitely have a misunderstanding here.

PrometheusPi commented 3 years ago

Okay - then let's meet - today is quite busy for me, how about tomorrow at 11 AM?

sbastrakov commented 3 years ago

Works fine for me.

KseniaBastrakova commented 3 years ago

For me too

PrometheusPi commented 3 years ago

Just spoke with @alex-koe offline - a meeting tomorrow at 11 AM works for him as well.

KseniaBastrakova commented 3 years ago

Thanks for the really useful examples and scripts! I think I made a fix. Here is the distribution of the normalized momentum u_unitSI * np.array(u) / np.array(w) / (const.m_e * const.c) before reduction: XY_before_reduction YZ_before_reduction

and after reduction (ratio of deleted particles = 0.9) XY_after_reduction YZ_after_reduction

Please do git fetch && git pull to get the fix.

PrometheusPi commented 3 years ago

@KseniaBastrakova Great work! :+1: The momenta look correct. Could you please do a plt.hist2d together with a plt.colorbar() instead of a scatter plot? This would validate the weighting as well.

KseniaBastrakova commented 3 years ago

@PrometheusPi here are the (coordinate, momentum) graphs for each dimension before and after reduction (for the file simData_beta_60000.h5).

Xp_x_before_reduction

Xp_x_after_reduction

Yp_y_before_reduction

Yp_y_after_reduction

Zp_z_before_reduction

Zp_z_after_reduction

PrometheusPi commented 3 years ago

@KseniaBastrakova Thanks - could you please use the weights parameter of the hist2d plot? This would ensure that the weightings scale as required. So far, hist2d assumes a weighting of 1 and thus only counts macroparticles (causing a factor 10 difference).

KseniaBastrakova commented 3 years ago

@PrometheusPi, I added the weights: H = ax.hist2d(z_position, z_momentum, weights=weights_reduction, bins=50) and got these graphs. They look pretty similar.
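For reference, a minimal version of such a weighted phase-space histogram (z_position, z_momentum, weights_reduction are assumed to be the 1D arrays read as in the snippets above):

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # weighting as hist2d weights, so the histogram counts physical particles, not macroparticles
    h = ax.hist2d(z_position, z_momentum, weights=weights_reduction, bins=50)
    fig.colorbar(h[3], ax=ax, label="physical particles per bin")
    ax.set_xlabel("z")
    ax.set_ylabel("p_z")
    plt.show()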

Xp_x_before_reduction

Xp_x_after_reduction

Yp_y_before_reduction

Yp_y_after_reduction

Zp_z_before_reduction

Zp_z_after_reduction

PrometheusPi commented 3 years ago

@KseniaBastrakova Thank you, that looks really good! d^2Q/dydp_y looks perfect (y \in {x,y,z}).

PrometheusPi commented 3 years ago

@alex-koe Can you please try this with your data?

alex-koe commented 3 years ago

@PrometheusPi Here is a comparison between the original (left col.) and the reduced particle sets (right col.): grafik Sorry for not adding any axis descriptions, but they would be too small in the output. The rows are as follows: (y,uy), (x,ux), (z,uz). The data is taken from run 006. The only very minor detail I see is that the (auto) colorbar scaling is slightly different, but I guess that is more my issue of how to sample it correctly for such a small data set of N=5000 (right side). @KseniaBastrakova you did great work.

alex-koe commented 3 years ago

For documentation purposes, I copy the script for the particle filtering here:


""" 
needed Input:
+ timestep
+ filename_bunch_idendifiers
+ filename_filtered_file
""" 
###### init ####
import numpy as np
import scipy as sp
import h5py
import scipy.constants as const

timestep                              = 60000
filename_bunch_idendifiers = "bunch-identifiers.dat"
################ WARING ################
# The filtered file will be overwritten, copy this file instead from the original data
filename_filtered_file           = "simData_run006_filtered_{}.h5".format(timestep)

# read particle ids for filtering
ids = np.loadtxt(filename_bunch_idendifiers, dtype=np.uint64)

##### open h5 files 
filtered_file = h5py.File(filename_filtered_file, "r+")

h = filtered_file["/data/{}/particles/en_all".format(timestep)]
current_ids = h["particleId"][()]
m   = np.in1d(current_ids, ids)
if m.sum() != len(ids):
    print("ERR: requested IDs are not fully contained in H5 file. Abort.")
    exit

paths = ["particleId", "weighting",
             "momentum/x", "momentum/y", "momentum/z",
             "momentumPrev1/x", "momentumPrev1/y", "momentumPrev1/z",
             "position/x", "position/y", "position/z",
             "positionOffset/x", "positionOffset/y", "positionOffset/z"]
for p in paths:
    temp = h[p][m]
    temp_items = h[p].attrs.items()
    h.__delitem__(p)
    h[p]  = temp
    for i in temp_items:
        h[p].attrs.create(name=i[0], data=i[1])
for p in ["mass", 'charge']:
    temp = h[p].attrs['shape']
    temp[0] = np.sum(m)
    h[p].attrs['shape'] = temp

#### delete particle patches because reduction script does not process them correctly
filtered_file["/data/{}/particles/en_all/".format(timestep)].__delitem__('particlePatches')
filtered_file.close()
alex-koe commented 3 years ago

@KseniaBastrakova I think this issue can be closed. I just do not know how to do it ...