aphalo / photobiology

Package ‘photobiology’ defines a system of classes for storing spectral data and accompanying methods and operators. This is the core of a suite of R packages for photobiological calculations.

Performance of `irrad()` and other summary functions #22

Open aphalo opened 7 months ago

aphalo commented 7 months ago

When I added support for objects of class `generic_spct` and derived classes containing multiple spectra in long form, I had in mind at most a handful of spectra per object. Now that time series of several thousand spectra can be easily acquired using the updated function `acq_irrad_interactive()` from package 'ooacquire', the current implementation of these methods has become a bottleneck in the data analysis. The computations themselves are simple and fast, but they carry a lot of overhead from object copying, memory allocation and garbage collection. I managed to improve the performance of `irrad()`, `e_irrad()` and `q_irrad()` by reducing this overhead, but in principle there is room for a much larger reduction of their runtime with long time series of spectra.
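The per-spectrum arithmetic is indeed cheap. A minimal sketch in base R, assuming spectral irradiance sampled at increasing wavelengths and integrated with the trapezoidal rule; `trapz_irrad()` is a hypothetical helper, not a function from 'photobiology', whose own integration may differ in detail:

```r
# Hypothetical helper, not part of 'photobiology': integrate spectral
# irradiance over wavelength with the trapezoidal rule.
# Assumes w.length is sorted in increasing order.
trapz_irrad <- function(w.length, s.irrad) {
  n <- length(w.length)
  stopifnot(n == length(s.irrad), n >= 2)
  # width of each interval times the mean of its two end values
  sum(diff(w.length) * (s.irrad[-1] + s.irrad[-n]) / 2)
}

# Flat spectrum of 1 W m-2 nm-1 between 400 nm and 700 nm
w <- seq(400, 700, by = 1)
trapz_irrad(w, rep(1, length(w)))
#> [1] 300
```

This is a few hundred arithmetic operations per spectrum, so with thousands of spectra the runtime is dominated by what surrounds the call rather than by the integration itself.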

The source of the problem seems to be in the extraction operators, which are defined using a kludge because the tibble specialization would, at least in earlier versions, drop attributes and modify the objects. Calling `[.data.frame` and `[<-.data.frame` directly and then copying the attributes seems to result in a copy of the whole object, which for objects occupying 1 GB of RAM is a time-consuming task. As this happens at each iteration inside a `for` loop, the larger the number of spectra, the larger the overhead per spectrum.
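The scaling pattern can be reproduced with plain data frames. The sketch below is hypothetical and does not use the 'photobiology' classes or their extraction operators: `make_long()`, `irrad_loop()` and `irrad_vectorized()` are illustrative names, and the column names only mimic spectra stored in long form. `irrad_loop()` extracts one spectrum per iteration from the whole object, so every pass touches all rows; in the package each extraction can be costlier still, as the whole object may be copied. `irrad_vectorized()` computes all trapezoids in a single pass, discards those spanning two spectra, and sums by spectrum with `rowsum()`, assuming rows are sorted by wavelength within each spectrum and each spectrum's rows are contiguous.

```r
# Hypothetical long-form data: k spectra stacked in one plain data.frame,
# identified by an index column (names only mimic long-form spectra).
make_long <- function(k, n = 401) {
  data.frame(
    spct.idx = rep(seq_len(k), each = n),
    w.length = rep(seq(300, 700, length.out = n), times = k),
    s.e.irrad = runif(n * k)
  )
}

# Trapezoidal integral for a single spectrum (hypothetical helper).
trapz_irrad <- function(w.length, s.irrad) {
  n <- length(w.length)
  sum(diff(w.length) * (s.irrad[-1] + s.irrad[-n]) / 2)
}

# Loop pattern: each iteration compares the index of every row and copies
# the matching subset, so total work grows with (number of spectra) x (rows).
irrad_loop <- function(df) {
  ids <- unique(df$spct.idx)
  out <- numeric(length(ids))
  for (i in seq_along(ids)) {
    one <- df[df$spct.idx == ids[i], ]
    out[i] <- trapz_irrad(one$w.length, one$s.e.irrad)
  }
  out
}

# Single pass: one trapezoid per pair of consecutive rows, keep only pairs
# within the same spectrum, then sum the areas by spectrum index.
irrad_vectorized <- function(df) {
  n <- nrow(df)
  w <- df$w.length
  y <- df$s.e.irrad
  id <- df$spct.idx
  area <- diff(w) * (y[-1] + y[-n]) / 2
  same <- id[-1] == id[-n]
  rowsum(area[same], id[-1][same])[, 1]  # named by spectrum index
}

set.seed(1)
df <- make_long(200)
all.equal(unname(irrad_vectorized(df)), irrad_loop(df))
#> [1] TRUE
```

Comparing `system.time(irrad_loop(df))` with `system.time(irrad_vectorized(df))` for increasing `k` should show the loop's cost per spectrum growing with the size of the object, while the single pass stays roughly linear.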

Anyway, the changes I have implemented in function `irrad_spct()` and methods `irrad()`, `e_irrad()` and `q_irrad()` decrease the overhead quite a lot and make its non-linearity less pronounced. The different ratio methods (`ratio()`, `q_ratio()`, `e_ratio()`, etc.) still have to be edited to make use of the improvements in `irrad_spct()`.

With less urgency, the summary methods for `filter_spct`, `reflector_spct` and `object_spct` will also need an improvement in performance. The urgency is lower because acquisition of time series of these quantities is not yet implemented in 'ooacquire'. I will try to solve the case of `source_spct` objects first and then adapt the solution found to these other classes of objects.

aphalo commented 7 months ago

With long time series it does not make sense to compute ratios separately from irradiances: it is computationally much more efficient to compute the irradiances only once and then compute the ratios from them. So, effort should go into the extraction operators, to avoid unnecessary copying of large objects.
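Once per-spectrum irradiances are available, any number of ratios can be derived without revisiting the spectral data. A minimal sketch with a hypothetical table of band irradiances; the band names and numbers are invented for illustration, and the layout is not necessarily what `irrad()` returns:

```r
# Hypothetical per-spectrum band irradiances, as a single summary pass over
# the spectra might produce them; values are invented for illustration.
irr <- data.frame(
  spct.idx = 1:3,
  blue = c(20, 15, 10),
  red = c(10, 12, 8),
  far_red = c(8, 12, 10)
)

# Each ratio is one vectorized division over columns already in memory:
# its cost grows with the number of spectra, not with the number of
# wavelengths per spectrum.
irr$red_far_red <- irr$red / irr$far_red
irr$blue_red <- irr$blue / irr$red
irr$red_far_red
#> [1] 1.25 1.00 0.80
```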