ax3l opened this issue 1 year ago
Hey @ax3l - I have to admit that I was embarrassingly not actually aware of openPMD! It looks great.
It is a fairly minimal amount of work to add new Loaders/Exporters (depending, of course, on how complex the data source is). I would be happy to take a look at loading openPMD data. I don't suppose you already have some files handy I could test on? Also, I notice that openPMD supports multiple data formats. It might be quite some work to write a DataLoader that handles several formats, but as a proof of principle, would it be acceptable to just demonstrate it on one format?
Hi @bwheelz36, sorry for editing my original message.
Via that edit, I added a few example files and a roughly four-line snippet to load the data :)
```python
import openpmd_api as io
s = io.Series("../samples/git-sample/data%T.h5", io.Access.read_only)
electrons = s.iterations[400].particles["electrons"]  # 400 or another "step" in the data series
df = electrons.to_df()  # careful: all SI at this point
```
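Because `to_df()` returns SI values (metres and kg·m/s), a loader that targets mm and MeV/c (the units assumed here for ParticlePhaseSpace) only needs two scale factors, using the exact SI values $c = 299\,792\,458\ \mathrm{m/s}$ and $e = 1.602176634\times10^{-19}\ \mathrm{C}$:

$$
x\,[\mathrm{mm}] = 10^{3}\,x\,[\mathrm{m}], \qquad
p\,[\mathrm{MeV}/c] = \frac{c}{10^{6}\,e}\,p\,[\mathrm{kg\,m\,s^{-1}}] \approx \frac{p\,[\mathrm{kg\,m\,s^{-1}}]}{5.344286\times10^{-22}}
$$

The second factor follows from $1\ \mathrm{MeV}/c = 10^{6}\,e/c\ \mathrm{kg\,m\,s^{-1}}$.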
After finishing the docs, I would also be excited to attempt an exporter :star_struck:
(Please do not treat my implementation questions as required for the JOSS review to pass. I am just truly curious, and the other comments on the manuscript are more important to address, please :) )
Hi @ax3l
That's all good - given there is a defined open dataset format, it absolutely makes sense that this package should support it.
Having said that - I'm a bit confused tbh. I'm trying to run the first read example from the openpmd-api site with the following code:
```python
import openpmd_api as io
series = io.Series("data%T.h5", io.Access.read_only)
```
I pointed this code at each of the three examples, `example-2d`, `example-3d` and `example-thetaMode` (it is actually not that clear from the example that this is what you are supposed to do?). In each case the data loads, but there is no information in the `iterations` attribute?
Hi @bwheelz36,
Thanks for trying the example datasets! The iterations concept is explained here: https://openpmd-api.readthedocs.io/en/latest/usage/concepts.html
> there is no information in the 'iterations' attribute?

Once you open a data `Series`, you can loop over the available iterations in it, read the data in each iteration, etc.

Please let me know if you have more questions on this, in case I missed the point of the question :)
Hi @ax3l
Ok, here's an end to end example of what I tried. Maybe I'm doing something extremely stupid...
In a terminal:
```shell
# inside a fresh virtual environment
git clone https://github.com/openPMD/openPMD-example-datasets.git
cd openPMD-example-datasets
tar -zxvf example-2d.tar.gz
tar -zxvf example-3d.tar.gz
tar -zxvf example-thetaMode.tar.gz
pip install openpmd-api
python  # enter Python session
```
Inside Python:
```python
import openpmd_api as io
data_loc = "example-2d/hdf5/data%T.h5"
s = io.Series(data_loc, io.Access.read_only)
```
Here's the explorer view of s; it appears to simply have nothing in it?
Oh that is wild, thanks for reporting! We check against most of those files in CI, but maybe something slipped in that we did not cover :-o
I will double check this after my conferences and summer break.
For this, see my comment here:
> The string representations of many classes are counterintuitive and have led to confusion, e.g. `series.iterations` printed will look as if it is empty.

I guess this issue has been proven again... The data is there, it just does not look like it:
```python
>>> import openpmd_api as io
>>> s = io.Series("data%T.h5", io.Access.read_only)
>>> s.iterations
<openPMD.Attributable with '0' attributes>
>>> [index for index in s.iterations]
[255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400]
```
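The indices above come from the file names themselves: with file-based iteration encoding, the `%T` in `data%T.h5` is a placeholder for the iteration index, so each matching file holds one iteration. The sketch below mimics that matching with the standard library only. `iteration_indices` is a hypothetical helper, not part of openPMD-api, and it assumes a single `%T` in a base-name pattern and plain (possibly zero-padded) decimal digits in its place.

```python
import os
import re


def iteration_indices(pattern, filenames):
    """Return the sorted iteration indices encoded in matching file names.

    Simplified sketch of file-based iteration encoding: assumes exactly one
    '%T' placeholder in the base-name `pattern` and plain (possibly
    zero-padded) decimal digits in its place.
    """
    prefix, suffix = pattern.split("%T")
    regex = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix))
    indices = []
    for name in filenames:
        match = regex.fullmatch(os.path.basename(name))
        if match:
            indices.append(int(match.group(1)))
    return sorted(indices)


print(iteration_indices("data%T.h5", ["data255.h5", "data260.h5", "data00400.h5", "notes.txt"]))
# [255, 260, 400]
```

For a directory such as `example-2d/hdf5`, `iteration_indices("data%T.h5", os.listdir("example-2d/hdf5"))` would be expected to list the same indices as `s.iterations`, although openPMD-api itself remains the authority on which files it accepts.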
Thank you for updating the representation strings, @franzpoeschel! This will be shipped with the next patch release, 0.15.2.
@bwheelz36, for your example above, all looks good, and you can keep exploring what is inside the data series `s` like this:
```python
for k_i, i in s.iterations.items():
    print("Iteration: {0}".format(k_i))
    for k_p, p in i.particles.items():
        print("  Particle species '{0}':".format(k_p))
```
Inside the particle species `p` are records (e.g. `"momentum"`), each a key-value container mapping a component name (a string such as `"x"`) to a record component, which can be accessed like a NumPy array, e.g. `u_x = p["momentum"]["x"][()]`. Note that the array `u_x` is only filled with actual data after `s.flush()`.
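The need for `s.flush()` comes from deferred I/O: accessing `[()]` only registers a read, and the data arrives when the series is flushed. The toy class below is a hypothetical stand-in, not the openPMD-api classes, that models just this register-then-flush behaviour with plain lists. In the real library the returned NumPy array typically already has its final shape, but its contents should not be used before the flush.

```python
class ToyDeferredReader:
    """Toy model of deferred reads (hypothetical, NOT the openPMD-api classes)."""

    def __init__(self, storage):
        self._storage = storage  # dataset name -> list of values "on disk"
        self._pending = []       # (name, buffer) pairs waiting for flush()

    def load(self, name):
        # Only register the request; the buffer stays empty for now.
        buffer = []
        self._pending.append((name, buffer))
        return buffer

    def flush(self):
        # Perform all registered reads at once, filling the buffers in place.
        for name, buffer in self._pending:
            buffer.extend(self._storage[name])
        self._pending.clear()


reader = ToyDeferredReader({"momentum/x": [1.0, 2.0, 3.0]})
u_x = reader.load("momentum/x")
print(u_x)  # [] (nothing has been read yet)
reader.flush()
print(u_x)  # [1.0, 2.0, 3.0]
```

Batching reads until an explicit flush lets the backend combine many small requests into fewer I/O operations, which is why forgetting the flush leaves the buffers unfilled rather than raising an error.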
Even easier is accessing the data as a data frame, as in the `11_particle_dataframe.py` example:
```python
for i in s.iterations:
    for p in i.particles:
        df = p.to_df()
        print(df)
```
@bwheelz36 did this help? :)
Hi @ax3l - yes, the first loop you posted above helps; it is clear there is some data there! In that example, calling `p.to_df()` gives a dataframe that would allow a close to one-to-one read-in to ParticlePhaseSpace.
The second loop crashes with `AttributeError: 'int' object has no attribute 'particles'`. I added a line `if hasattr(i, 'particles'):`, however this branch was never entered...
Can I make sure I understand the intent behind iterations - each iteration would represent for instance a time interval?
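In the openPMD standard, each iteration is typically one snapshot of the data at a given simulation time rather than an interval; the per-iteration attributes `time` and `timeUnitSI` (plus `dt`, the time step) locate it in time. A minimal sketch, assuming those two values have already been read for each iteration (with openpmd_api presumably via `it.time` and `it.time_unit_SI`, which are not called here), converts them to picoseconds. `snapshot_times_ps` is a hypothetical helper.

```python
def snapshot_times_ps(iterations):
    """Map each iteration index to its physical time in picoseconds.

    `iterations` maps index -> (time, time_unit_si), mirroring the openPMD
    per-iteration attributes `time` (in code units) and `timeUnitSI`
    (the factor converting those units to seconds).
    """
    return {
        index: time * time_unit_si * 1e12  # seconds -> picoseconds
        for index, (time, time_unit_si) in sorted(iterations.items())
    }


# Hypothetical values for two snapshots, 1 fs and 2 fs after the start.
times = snapshot_times_ps({260: (2.0, 1e-15), 255: (1.0, 1e-15)})
print(times)
```

The iteration index itself (e.g. 255) is just a label such as a step counter; the physical time should be taken from the attributes rather than inferred from the index.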
> the second loop crashes with `AttributeError: 'int' object has no attribute 'particles'`. I added a line `if hasattr(i, 'particles'):` however this was never entered...
I think that there is a slight bug in the second loop, try this one:
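```python
for it_index, it in s.iterations.items():
    # iterate over (name, species) pairs; a plain loop would only give names
    for sp_name, p in it.particles.items():
        df = p.to_df()
        print(df)
```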
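The underlying cause is that openPMD-api containers such as `s.iterations` behave like Python dicts when iterated: a plain `for` loop yields the keys (iteration indices), as the `[index for index in s.iterations]` output earlier in the thread shows, while `.items()` yields key/value pairs. A toy sketch with plain nested dicts standing in for the real containers (the structure is a hypothetical stand-in):

```python
# Toy stand-in for s.iterations: index -> iteration, each with a "particles"
# mapping of species name -> species data (plain strings here).
iterations = {
    255: {"particles": {"electrons": "electron data @ 255"}},
    260: {"particles": {"electrons": "electron data @ 260"}},
}

# A plain loop yields only the keys: ints here, which is why accessing
# `.particles` on them fails with "'int' object has no attribute 'particles'".
print([index for index in iterations])  # [255, 260]

# .items() yields (key, value) pairs at both levels.
for index, it in iterations.items():
    for name, species in it["particles"].items():
        print(index, name, species)
```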
Thank you for the JOSS submission in https://github.com/openjournals/joss-reviews/issues/5375 .
I really like the support of the IAEA data loaders.
Based on the extended abstract and the linked motivating discussion in it, I was wondering: I am personally curious whether, for phase space data, the openPMD standard [1] [2] (disclaimer: I lead this effort) could be helpful as an additional input loader source? We now have a relatively large selection of accelerator codes supporting openPMD as their output format, and we are also trying to use it more in experimental laser-plasma accelerator work.
The paper summarizes so far:
If one were to implement another loader, how much work would be needed? I am looking at https://bwheelz36.github.io/ParticlePhaseSpace/new_data_loader.html and am further curious about data sizes: #158
Update: I found https://bwheelz36.github.io/ParticlePhaseSpace/code_docs.html#ParticlePhaseSpace.DataLoaders.Load_PandasData which might be pretty easy to couple to openPMD with https://github.com/openPMD/openPMD-api/blob/0.15.1/examples/11_particle_dataframe.py (example data sets here). (Our Pandas reader supports chunked processing - let's continue the discussion on lazy loading/streaming/out-of-core processing in #158.)
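As a rough idea of how little glue that coupling might need, here is a standard-library sketch that turns the SI columns from `to_df()` into mm and MeV/c columns. Both the input names (`position_x`, `positionOffset_x`, `momentum_x`, `weighting`) and the output names (`x [mm]`, `px [MeV/c]`, `weight`, `particle type [pdg_code]`) are assumptions for illustration, to be checked against the actual `to_df()` output and the ParticlePhaseSpace docs; `openpmd_columns_to_pps` is hypothetical. It follows the openPMD convention that a particle's absolute position is `position + positionOffset`.

```python
# Exact SI values of the speed of light and the elementary charge.
SPEED_OF_LIGHT = 299_792_458.0       # m/s
ELEMENTARY_CHARGE = 1.602176634e-19  # C
# 1 MeV/c = 1e6 * e / c kg*m/s, so multiply SI momenta by c / (1e6 * e).
KG_M_PER_S_TO_MEV_C = SPEED_OF_LIGHT / (1e6 * ELEMENTARY_CHARGE)


def openpmd_columns_to_pps(columns, pdg_code):
    """Convert SI openPMD-style particle columns to mm and MeV/c columns.

    `columns` may be any mapping of column name -> sequence of floats that
    supports `[]` and `.get()`, e.g. a dict or a pandas DataFrame.
    All column names (input and output) are assumptions for illustration.
    """
    n = len(columns["position_x"])
    out = {}
    for axis in ("x", "y", "z"):
        position = columns[f"position_{axis}"]
        # openPMD: absolute position = position + positionOffset (metres here)
        offset = columns.get(f"positionOffset_{axis}", [0.0] * n)
        out[f"{axis} [mm]"] = [(p + o) * 1e3 for p, o in zip(position, offset)]
        out[f"p{axis} [MeV/c]"] = [
            p * KG_M_PER_S_TO_MEV_C for p in columns[f"momentum_{axis}"]
        ]
    # Macro-particle weights; default to 1 if the record is absent.
    out["weight"] = list(columns.get("weighting", [1.0] * n))
    out["particle type [pdg_code]"] = [pdg_code] * n
    return out


example = {
    "position_x": [1e-3], "position_y": [0.0], "position_z": [2e-3],
    "momentum_x": [0.0], "momentum_y": [0.0], "momentum_z": [5.344286e-22],
    "weighting": [2.0],
}
print(openpmd_columns_to_pps(example, pdg_code=11))
```

With pandas, `pd.DataFrame(openpmd_columns_to_pps(df, pdg_code=11))` (11 being the PDG code of the electron) would then be a candidate input for `Load_PandasData`, which may require further columns such as particle IDs or times.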
Note that the linked reference

> Kuschel, S. (2022). Postpic. https://github.com/skuschel/postpic

implemented openPMD early on. Minor correction: I think it should read (2014), as of the first release, for this reference.

[1] https://github.com/openPMD
[2] https://www.openPMD.org