namurphy opened this issue 1 year ago
`Particle`. We'd probably need to check on this.

Wait a minute. IIRC, I thought the entire JSON is loaded into memory at (sub)package import time. Then you'd gain precisely nothing by changing the storage mechanism.
Moreover, `h5py` is... unwieldy at best, and a heavy dependency.
This is not something we should do unless and until we actually profile the runtime of a particle initialization and prove that loading the data is an important factor in its speed that actually needs optimizing.
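One way to do that profiling is with the standard library's `cProfile` and `pstats`. The sketch below is hedged: `load_data` and `make_particle` are hypothetical stand-ins (not PlasmaPy code) built on synthetic data, and in a real measurement the loop body would be replaced by an actual particle initialization. Sorting by cumulative time shows whether the data-loading call accounts for most of the constructor's runtime.

```python
import cProfile
import io
import json
import pstats

# Synthetic data payload; its size is arbitrary, not PlasmaPy's real data.
_PAYLOAD = json.dumps({f"X{i}": {"mass": float(i)} for i in range(10_000)})


def load_data() -> dict:
    """Hypothetical loader: parse the synthetic data."""
    return json.loads(_PAYLOAD)


def make_particle(symbol: str) -> dict:
    """Hypothetical constructor that reloads the data on every call (worst case)."""
    data = load_data()
    return {"symbol": symbol, **data[symbol]}


profiler = cProfile.Profile()
profiler.enable()
for _ in range(20):
    make_particle("X42")
profiler.disable()

# Sort by cumulative time to see whether load_data dominates make_particle.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

If the loader turned out to be a small fraction of the cumulative time, that would argue against changing the storage format.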
If we really wanted to speed up `Particle`s... I don't know. I briefly thought about sticking them into a `dataclass` (#1110 is the most related, I suppose), but then again `astropy.units` will slow it down as well.
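For illustration only, a frozen `dataclass` holding a particle's basic data might look like the sketch below. `ParticleData` and its fields are hypothetical and not taken from #1110 or PlasmaPy; the mass is stored as a bare float in kilograms to show what skipping `astropy.units` would mean. The value used for the proton is the CODATA 2018 proton mass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParticleData:
    """Hypothetical lightweight, immutable record of a particle's basic data.

    The mass is a plain float in kg rather than an astropy Quantity,
    which avoids unit-handling overhead at the cost of unit safety.
    """

    symbol: str
    atomic_number: int
    mass_number: int
    mass_kg: float


proton = ParticleData(symbol="p+", atomic_number=1, mass_number=1, mass_kg=1.67262192369e-27)
```

`frozen=True` blocks reassignment of fields and, together with the default `eq=True`, makes instances hashable. The trade-off is the one noted above: without `Quantity` objects, callers would have to keep track of units themselves.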
I do remember trying out imports of different subpackages a while back, and importing `plasmapy.particles` was the slowest (I think maybe around the time of #1630). My main hypotheses about the causes were the time it takes to read in the JSON files and/or the time it takes to automatically instantiate a few particles (like `plasmapy.particles.proton` and `plasmapy.particles.electron`).
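The two hypotheses can be separated by timing each phase on its own. The sketch below is hedged: it uses synthetic JSON of arbitrary size rather than the real `elements.json`/`isotopes.json`, and `read_data` and `instantiate` are hypothetical stand-ins for reading the files and for creating particles such as `proton` and `electron`.

```python
import json
import time

# Synthetic stand-in for an isotope data file; the size is arbitrary.
PAYLOAD = json.dumps(
    {f"iso{i}": {"mass": float(i), "stable": i % 2 == 0} for i in range(50_000)}
)


def read_data() -> dict:
    """Hypothesis 1: the cost of parsing the JSON text."""
    return json.loads(PAYLOAD)


def instantiate(data: dict, names) -> list:
    """Hypothesis 2: the cost of building a few records from already-parsed data."""
    return [dict(data[name], name=name) for name in names]


t0 = time.perf_counter()
data = read_data()
t1 = time.perf_counter()
records = instantiate(data, ["iso1", "iso2"])
t2 = time.perf_counter()

print(f"parse: {t1 - t0:.4f} s, instantiate: {t2 - t1:.6f} s")
```

A single `perf_counter` run is noisy; repeating each phase with `timeit` would give steadier numbers.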
What I'm wondering about is that... the vast majority of elemental and isotopic data will not get used in a typical application, so in principle, lazy loading of the data could be really helpful. However, I agree with you that profiling would be necessary for us to make good decisions.
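A minimal sketch of lazy loading, assuming the data can be parsed on first use instead of at import time. `_ISOTOPES_JSON`, `isotope_data`, and `mass_number` are hypothetical names; `functools.lru_cache` makes the parse happen at most once.

```python
import functools
import json

# Hypothetical stand-in for the contents of isotopes.json.
_ISOTOPES_JSON = '{"H-1": {"mass number": 1}, "D": {"mass number": 2}}'


@functools.lru_cache(maxsize=None)
def isotope_data() -> dict:
    """Parse the isotope data on first use; later calls return the cached dict."""
    return json.loads(_ISOTOPES_JSON)


def mass_number(isotope: str) -> int:
    """Look up an isotope's mass number, triggering the parse only if needed."""
    return isotope_data()[isotope]["mass number"]
```

Importing such a module would parse nothing; the first lookup would pay for the whole file. Loading only the entries that are actually used would need finer-grained storage (for example, split files or an indexed format), which is where the choice of format could start to matter.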
I just started playing with...

```
$ python -X importtime -c "import plasmapy" 2> import_plasmapy.log
$ pip install tuna
$ tuna import_plasmapy.log
```

...using `tuna` to visualize it.
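For a quick text summary without `tuna`, the log that `-X importtime` writes to stderr can also be parsed directly. Each line has the form `import time: <self us> | <cumulative us> | <module>`, preceded by one header line. The sketch below ranks imports by cumulative time; `parse_importtime`, `slowest`, and `SAMPLE_LOG` are hypothetical names, and the sample numbers are made up for illustration.

```python
def parse_importtime(text: str) -> list:
    """Parse `python -X importtime` output into (module, self_us, cumulative_us) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.startswith("import time:"):
            continue  # skip any other stderr output
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        self_us, cumulative_us, module = fields
        try:
            # strip() drops the indentation that encodes import nesting.
            rows.append((module.strip(), int(self_us), int(cumulative_us)))
        except ValueError:
            continue  # the header line: "self [us] | cumulative | imported package"
    return rows


def slowest(text: str, n: int = 5) -> list:
    """Return the n imports with the largest cumulative time."""
    return sorted(parse_importtime(text), key=lambda row: row[2], reverse=True)[:n]


# Made-up sample in the importtime format; the numbers are illustrative only.
SAMPLE_LOG = """\
import time: self [us] | cumulative | imported package
import time:       120 |        120 |     example_pkg.data
import time:       300 |        420 |   example_pkg.core
import time:        80 |        500 | example_pkg
"""

# For a real run: slowest(open("import_plasmapy.log").read())
```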
Also... maybe HDF5 would not be the best binary format to consider; there are alternatives with better performance.
Feature description
Currently, our atomic data files are stored in `plasmapy/particles/data` as `elements.json` and `isotopes.json`. This issue proposes to change them to HDF5 format.

Motivation
JSON is human readable but slow to access, while HDF5 is less human readable but fast to access. Elemental and isotopic data are accessed very frequently (in particular by `@particle_input`), so performance is more important than readability in this case. Additionally, when we update the data sets (#1914), we'll probably end up rebuilding the files entirely, so readability of the data itself is not as important.

JSON is also better for tracking changes via version control, but we haven't updated the files in 2.5 years, so that's less of an issue too.
Implementation strategy
We could probably do this at the same time that we address #1914.
Additional context
See also #591.