Hi @vsnever these look like general emitters that would be more suited upstream in Raysect, rather than Cherab. The suggested functionality does seem a little too broad to be baked into a single object though.
Raysect and Cherab emitters (and other spectral functions) automatically resample their spectra according to the ray object's spectral configuration. That resampling is generally cached after the first call and purged if a ray with a different configuration is encountered. I'd rather not add an object that breaks this automatic behaviour. It should be easy to identify automatically whether whole planes of a 4D grid are zero or near zero when sampling from a SpectralFunction (for example) or other data source. These planes can then be dropped from the cache and a zero returned when they are requested.
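The zero-plane dropping described above could be sketched roughly as follows; `build_nonzero_cache` and `sample` are hypothetical names, and a real implementation would live in Cython behind the material sampling call:

```python
import numpy as np

def build_nonzero_cache(emission, atol=0.0):
    """Cache only the spatial cells whose spectra are not (near) zero.

    emission: 4D array (nx, ny, nz, n_spectral). Returns a dict mapping
    (ix, iy, iz) -> 1D spectrum, so all-zero cells cost no cache memory.
    """
    cache = {}
    nonzero = np.abs(emission).max(axis=3) > atol   # (nx, ny, nz) mask
    for ix, iy, iz in zip(*np.nonzero(nonzero)):
        cache[(ix, iy, iz)] = emission[ix, iy, iz]
    return cache

def sample(cache, ix, iy, iz, n_spectral):
    """Cells absent from the cache are implicitly zero."""
    spectrum = cache.get((ix, iy, iz))
    return spectrum if spectrum is not None else np.zeros(n_spectral)
```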
Cherab already has (albeit poorly implemented) spatial caching objects for functions, which I'm in the (slow) process of replacing. If these were extended to SpectralFunctions in Raysect, and a custom Integrator object that understood the cache were added to the InhomogeneousVolumeEmitter, it would achieve your speedup but would be significantly cleaner and easier to expand later.
Reflecting on this some more, this is definitely a Raysect extension, not a Cherab one. I can't see anything specific to Cherab here, e.g. needing the Plasma object, etc.
Hi @CnlPepper, thanks for your reply. I agree, the emitters are pretty general indeed. However, they were developed to process the typical output of plasma codes like SOLPS, ERO-2 or EMC3-Eirene as fast as possible.
I saw the caching objects from Cherab. They can be used, e.g., to interpolate the original data defined on an arbitrary mesh onto a regular mesh, but integrators for regular grids are still needed. Also, these caching objects perform cubic interpolation only. I tried using cubic splines to interpolate noisy Monte Carlo data from SOLPS on a regular grid (ERO-2 produces data on regular grids by default) and got a lot of negative values here and there, so I switched to linear interpolation.
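As an aside, the undershoot described here is easy to reproduce with any cubic scheme. A minimal sketch using a Catmull-Rom cubic (not Cherab's actual caching interpolator) shows non-negative samples producing a negative interpolated value near a sharp step:

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Cubic (Catmull-Rom) interpolation between p1 and p2, 0 <= t <= 1."""
    m1 = 0.5 * (p2 - p0)              # tangent at p1
    m2 = 0.5 * (p3 - p1)              # tangent at p2
    h00 = 2 * t**3 - 3 * t**2 + 1     # cubic Hermite basis functions
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * p1 + h10 * m1 + h01 * p2 + h11 * m2

# Non-negative samples next to a sharp step: 0, 0, 0, 1.
# Interpolating between the two middle (zero) samples undershoots:
value = catmull_rom(0.0, 0.0, 0.0, 1.0, 0.5)
print(value)  # -0.0625: a negative "emissivity" from non-negative data
```

Linear interpolation of the same samples stays at exactly zero over that interval, which is why it is the safer choice for noisy Monte Carlo data.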
I'm totally OK if the proposed emitters and integrators are only a temporary solution and are replaced in the future with something more general and robust. But currently, implementing a SpectralFunction that creates caches on regular grids on the fly seems too complex to me. You can see the current version of the code in these two files: regular_grid_volumes.py, regular_grid_emitters.pyx. Also, I made a version of the volume.py demo from Raysect for the RegularGridBox object: 1_regular_grid_box.py. If this code is more suitable for Raysect, I'll make it a Raysect extension.
Also, I agree that resampling the spectra when the ray's `delta_wavelength` differs from that of the emission array is the correct behaviour. However, simple interpolation will not work. If the spectrum is delta-functional, e.g. the material emits only at certain wavelengths and spectral line broadening is neglected, it should remain delta-functional after resampling. Also, resampling must not change the integral over wavelength.
Looking through your code, this should definitely be in Raysect rather than Cherab. The code looks good, though I think I would make a few modifications to generalise it further. If there are any routines specific to handling the data from modelling codes, those should remain in Cherab; they would not be general enough for Raysect.
Regarding resampling, if you have a closer look at the spectral function resampling code, you'll see it integrates and averages correctly. So provided you define real emission lines, with real widths, the resampling will correctly conserve the integral. All the cherab core models also work this way.
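A minimal sketch of the integrate-and-average idea (not Raysect's actual resampling code; `rebin` is a hypothetical name): treating the spectrum as piecewise-constant over its bins and averaging the overlaps onto the new bins conserves the integral over wavelength:

```python
def rebin(src_edges, src_values, dst_edges):
    """Rebin a piecewise-constant spectrum onto new bins, conserving the
    integral over wavelength (values are bin averages, edges ascending)."""
    dst = []
    for j in range(len(dst_edges) - 1):
        lo, hi = dst_edges[j], dst_edges[j + 1]
        total = 0.0
        for i in range(len(src_edges) - 1):
            # length of the overlap between source bin i and target bin j
            overlap = min(hi, src_edges[i + 1]) - max(lo, src_edges[i])
            if overlap > 0.0:
                total += src_values[i] * overlap   # integrate...
        dst.append(total / (hi - lo))              # ...then average
    return dst

# a line occupying wavelengths [1, 3) with average radiance 10
coarse = rebin([0, 1, 2, 3, 4], [0, 10, 10, 0], [0, 2, 4])
print(coarse)  # [5.0, 5.0] -- the integral is still 20
```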
Now I see the code, ignore the cherab caching discussion, it wasn't applicable.
> Regarding resampling, if you have a closer look at the spectral function resampling code, you'll see it integrates and averages correctly. So provided you define real emission lines, with real widths, the resampling will correctly conserve the integral. All the cherab core models also work this way.
Yes, I was looking at the `SpectralFunction` code after you raised the issue with resampling. Basically, the emission must be provided along with a wavelength array, just like samples in the `InterpolatedSF` spectral function. The only difference is that the emission is a 2D array (assuming the conversion from 4D to 2D is already done). So, I'll need to create something like `SpectralFunction2D` and `InterpolatedSF2D`. Also, even if the cache is stored in a smart way (only non-zero slices), memory consumption will be twice as large, because both the original emission and the cache need to be stored.

Anyway, once I've implemented correct resampling, I'll resubmit this code to Raysect.
There shouldn't be any need to store the data in a 2D spectral function object. You can use a 1D spectral function object at each location in space i.e. hold a 2D or 3D array of those objects. Then resample them into the required dimensions only when called. It'll use memory, but memory is cheap. For now it would be better to get the functionality working cleanly before optimising.
Raw data array -> [`__init__()`] -> Array of InterpolatedSF objects -> [first ray evaluation] -> Sampled data cache.
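That pipeline could be sketched as follows; `ToySF` and `GriddedEmitterSketch` are hypothetical stand-ins for Raysect's `InterpolatedSF` and an emitter, resampling each cell lazily and purging the cache when the ray configuration changes:

```python
import numpy as np

class ToySF:
    """Stand-in for a 1D spectral function such as Raysect's InterpolatedSF."""
    def __init__(self, wavelengths, samples):
        self.wavelengths = np.asarray(wavelengths, dtype=float)
        self.samples = np.asarray(samples, dtype=float)

    def sample(self, min_wl, max_wl, bins):
        # linear resample onto the ray's spectral settings (simplified)
        centres = np.linspace(min_wl, max_wl, bins)
        return np.interp(centres, self.wavelengths, self.samples)

class GriddedEmitterSketch:
    """Holds a grid of spectral functions and caches sampled spectra on
    first evaluation; the cache is purged if the ray settings change."""
    def __init__(self, sf_grid):
        self.sf_grid = sf_grid            # nested lists of ToySF objects
        self._cache = {}
        self._settings = None

    def emission(self, ix, iy, iz, min_wl, max_wl, bins):
        settings = (min_wl, max_wl, bins)
        if settings != self._settings:    # new ray configuration
            self._cache = {}
            self._settings = settings
        key = (ix, iy, iz)
        if key not in self._cache:
            self._cache[key] = self.sf_grid[ix][iy][iz].sample(*settings)
        return self._cache[key]
```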
A brief aside: looking through your code I noticed the use of libc malloc; generally this should be avoided. If you must dynamically allocate memory, use the Python memory C API allocation methods so the interpreter can better optimise its memory use. That said, it would be better to use numpy or lists for dynamic memory allocation; there should be no need to go that low level. I think I've only ever needed dynamic memory allocation for arrays of C structs, a few Cython versions back. The performance hit of a numpy array accessed via a memoryview is zero as it compiles to raw pointers.
How large are these data arrays? If we are talking multi-gigabyte then there may be some issues. Anything smaller, I wouldn't be concerned. We can later re-add a way to manually populate the cache with a fixed wavelength response if it is a major memory issue. Something like:

```python
# normal case (wavelengths shape is (K,), data shape is (N, M, K))
CartesianGriddedEmitter(wavelengths, data)

# special case (data shape is (N, M, S) where S is the number of spectral samples)
CartesianGriddedEmitter(
    presampled_data=[...],
    presampled_min_wavelength=x,
    presampled_max_wavelength=y,
)
```
Note that K != S. In normal operation S would be specified by the ray and the K values would be resampled. In the special case any ray that didn't precisely match the presampled data specification would cause an error to be thrown. So, by default the object is transparent and easy to use, the hassles caused by optimisation are hidden unless explicitly needed.
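The strict special-case behaviour could look like the following sketch (names hypothetical): the emitter stores the presampled spectral settings and rejects any ray whose configuration doesn't match them exactly:

```python
class PresampledCheckSketch:
    """Stores the spectral settings the data was presampled with and
    rejects rays that don't match them exactly."""
    def __init__(self, min_wavelength, max_wavelength, bins):
        self.min_wavelength = min_wavelength
        self.max_wavelength = max_wavelength
        self.bins = bins

    def validate_ray(self, ray_min, ray_max, ray_bins):
        if (ray_min, ray_max, ray_bins) != \
                (self.min_wavelength, self.max_wavelength, self.bins):
            raise ValueError(
                "ray spectral settings do not match the presampled data; "
                "either re-sample the emission or reconfigure the ray")
```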
> How large are these data arrays? If we are talking multi-gigabyte then there may be some issues.
The resolution of the spatial grid in the case of an ERO-2 simulation for a 20-deg ITER sector is 500x200x1000 points (R, phi, Z). About 80% of the cells usually contain nothing but zeros, but the data arrays are still large. With high enough spectral resolution, we can easily get a multi-gigabyte data array.
> There shouldn't be any need to store the data in a 2D spectral function object. You can use a 1D spectral function object at each location in space i.e. hold a 2D or 3D array of those objects.
I tried to create a list of 20,000,000 (20% of the 500x200x1000 grid) InterpolatedSF objects with just 10 spectral points each (with an empty cache). I terminated the process after it consumed ~24 GB of RAM. So, not an option.
I see.... large! The raw data is ~4GB for 500x200x1000x10 floats.
Ok I think I'd be happy to accept an emitter that only works for a strictly defined spectral representation and raises an exception if the data is not in that form. We can extend it later to add live (uncached) resampling at a CPU cost or cached resampling at a memory cost for applications with less data. This will result in an API change at some point though.
It sounds like your data should be in a sparse matrix object rather than raw arrays. It might be wise to place an interface between the data buffer and the emitter class to allow different data representations.
```python
class Indexable3D:
    def index(self, ix, iy, iz):
        raise NotImplementedError

# implements direct indexing to an internal array
class Array3D(Indexable3D):
    ...

# implements indirect indexing into a sparse array
class SparseArray3D(Indexable3D):
    ...
```
Of course this is looking like reimplementing memoryviews/numpy....! Do the scipy sparse matrices implement the buffer interface? If so that would present an easier solution: https://docs.scipy.org/doc/scipy/reference/sparse.html#
Perhaps just adding support for sparse matrices would be a better idea than a custom caching/buffer solution:
https://stackoverflow.com/questions/25430866/scipy-sparse-matrices-and-cython
> Perhaps just adding support for sparse matrices would be a better idea than a custom caching/buffer solution: https://stackoverflow.com/questions/25430866/scipy-sparse-matrices-and-cython
Thanks! Potentially, using sparse matrices can be even faster and consume less memory. The code will also be simpler. I think that `lil_matrix` or `csr_matrix` will work (rows are grid cells, columns are spectral bins). My only concern is the speed of indexing and memory access compared to a memoryview. I'll check this.
I too think using sparse matrices is the way to go here. You'll solve your memory issues, and I suspect that a good sparse matrix implementation will also help with cache coherency (you have less data, so more of it will fit in the cache). A properly typed implementation of sparse matrix indexing shouldn't involve significantly more work than an N-D memoryview either, providing the branch prediction for zero vs non-zero element works well.
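For illustration, a minimal pure-Python CSR layout (rows = grid cells, columns = spectral bins) shows the indexing involved; a real implementation would use `scipy.sparse.csr_matrix` and loop over its `data`/`indices`/`indptr` arrays:

```python
class CSRRowsSketch:
    """Minimal CSR layout: rows are grid cells, columns are spectral bins;
    only non-zero entries are stored."""
    def __init__(self, dense_rows):
        self.data, self.indices, self.indptr = [], [], [0]
        for row in dense_rows:
            for j, value in enumerate(row):
                if value != 0.0:
                    self.data.append(value)     # non-zero values, row-major
                    self.indices.append(j)      # their column indices
            self.indptr.append(len(self.data))  # row start/end offsets

    def get(self, i, j):
        # scan only the stored (non-zero) columns of row i
        for k in range(self.indptr[i], self.indptr[i + 1]):
            if self.indices[k] == j:
                return self.data[k]
        return 0.0
```

In Cython, the same loop over typed `data`/`indices`/`indptr` memoryviews compiles to raw pointer arithmetic, so per-lookup cost should be comparable to dense indexing when rows are short.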
I hope I addressed all the issues raised here. @CnlPepper suggested that the code should be in Raysect, so I created a draft pull request in Raysect. The discussion can be continued there.
I'll close this issue: further discussion should occur in the linked raysect issue.
I would like to add to Cherab emitters and integrators that work with regular grids (Cartesian or cylindrical) in a similar way to the recently added Ray Transfer Objects, but which integrate actual spectra, unlike the Ray Transfer Objects.

These special emitters and integrators could be used with observers that have high spatial but low spectral resolution, like filtered or unfiltered 2D cameras. Of course, the original data generated on a non-regular grid (with SOLPS or EMC3-Eirene) must be interpolated onto a regular grid first. However, if memory allows, the resolution of the regular grid can be very high, so the accuracy of the original mesh can be preserved. These emitters and integrators achieve tenfold speedups compared to the general `NumericalIntegrator` and the emission models used in the `Plasma` object.

Here is the image simulated using these emitters and integrators for one of the ITER 55.G1 cameras. The image parameters are 642x803 pixels, 5000 samples per pixel. The data include the emission of 14 spectral lines of D, He, Be and Ne in the visible range calculated with SOLPS (by A.S. Kukushkin) and then interpolated onto a 1 cm x 1 cm regular grid for this simulation. The volume integration step is 2 mm. The calculation took only 8 hours on a Core i9-9920X (note that just tracing this number of rays through the scene without plasma would take almost 3 hours). Please note that I still do not have the actual 3D model of the ITER first wall, so I used the model of a single 20-deg sector, which contains many approximations.
I already wrote the Cython code for these emitters and integrators and now only need to add documentation, unit tests and demos.
Here is the interface of the `RegularGridCylinder` object:

```python
RegularGridCylinder(emission, spectral_index, min_wavelength, radius_outer, height,
                    radius_inner=0, period=360., step=None, parent=None, transform=None)
```

`emission` is the 4D array containing the spectral emission defined on a regular 3D grid (in the axisymmetric case `emission.shape[1] == 1`). The last dimension of this array is the spectral one. While the `delta_wavelength` used to calculate the spectral emission must be equal to `(camera.max_wavelength - camera.min_wavelength) / camera.spectral_bins`, some of the spectral bins can be skipped (if they are empty), so `emission.shape[3]` can be lower than `camera.spectral_bins`. This saves memory, which is especially useful in the non-axisymmetric case. In the example above, `camera.spectral_bins` was equal to 98 while `emission.shape[3]` was equal to 14.

`spectral_index` is the 1D array with `spectral_index.size == emission.shape[3]`, which maps the spectral emission array to the respective spectral bins.

`min_wavelength` is the minimal wavelength, equal to `camera.min_wavelength`. This parameter is required to correctly process dispersive rendering.

All other parameters are the same as for `RayTransferCylinder`.

I hope the code will be ready for review in a few days. Your comments would be greatly appreciated.
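To make the `emission`/`spectral_index` relationship concrete, here is a sketch that builds arrays satisfying the constraints above (the numbers are illustrative, and `RegularGridCylinder` itself is not constructed):

```python
import numpy as np

# Illustrative numbers: a camera with 98 spectral bins of which only 14
# contain line emission (the bin positions below are made up).
camera_bins = 98
occupied = [3, 7, 12, 18, 25, 31, 40, 47, 55, 62, 70, 78, 85, 93]

nr, nphi, nz = 150, 1, 200                       # axisymmetric: nphi == 1
emission = np.random.rand(nr, nphi, nz, len(occupied))
spectral_index = np.array(occupied, dtype=np.int32)

# the invariants stated in the interface description
assert spectral_index.size == emission.shape[3]
assert emission.shape[3] <= camera_bins

# conceptually, the integrator scatters the compact spectral axis back
# into the ray's full spectrum via spectral_index
full = np.zeros((nr, nphi, nz, camera_bins))
full[..., spectral_index] = emission
```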