martin-rdz opened this issue 4 years ago
Hi Martin, thanks for your efforts and suggestions.
I knew quite some time ago that data visualization speed would become a hidden issue, as the plots are massive compared with many other projects. The motivation for choosing matplotlib was also its much higher efficiency for data visualization. Therefore, I'm quite eager to try your suggestions to improve the efficiency, taking into account the expansion of PollyNET in the upcoming years.
With regard to your test results, each single plot seems to take much longer than I expected. I would suggest checking whether there were multiple tasks running in the background consuming the CPU resources.
When I checked the recent log files on rsd1, it still takes less than 1 s per plot (144 plots within 2 min). If so, I wonder whether there is still room for improvement...
(screenshots of the log showing the start and stop times of data visualization)
I guess the hardware of rsd2 should be better than that of rsd1 (hope so...). Therefore, under normal conditions, it should be capable of processing 10 pollys in parallel on the new server. So... maybe we can still relax for another two years...
But it's very interesting to discuss the data visualization, and I will leave this issue open. Any comments are welcome!
This is from yesterday's test run on rsd2, with 4 out of 8 CPUs being idle. The processor frequency is the same as for rsd_old, but the number of cores has increased.
[2020-10-20 17:54:13] Start to visualize results.
...
[2020-10-20 17:58:12] Finish.
It's 4 instead of 5 min, because I tried the subsampling for the saturation plots.
In my opinion the problem is not the operational processing, but reprocessing old datasets (over and over again, whenever the algorithm improves). I guess there is still room for improvement, though it might be tricky to implement. A frame rate of 1.2 fps should not be the technical limit ;)
Let's keep the discussion open.
I just read a German forum entry which stated that the time format used is very important. I have no clue yet in which format the times are currently handed over to matplotlib for the time-height plots, but maybe it is worth investigating.
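For what it's worth, one way to test this idea is to convert the timestamps once with `matplotlib.dates.date2num` and hand plain floats to the plotting calls, instead of letting matplotlib convert every datetime object internally. A minimal sketch with synthetic timestamps (the array size is an arbitrary stand-in, not Picasso's actual time grid):

```python
import datetime

from matplotlib import dates as mdates

# One day of synthetic timestamps at 30 s resolution (2880 points);
# the size is a made-up stand-in for a time-height plot's time axis.
t0 = datetime.datetime(2020, 10, 20)
times = [t0 + datetime.timedelta(seconds=30 * i) for i in range(2880)]

# date2num converts once to plain floats (days since matplotlib's
# epoch), so plot()/pcolormesh() receive numbers instead of datetime
# objects that matplotlib would otherwise have to convert itself.
tnum = mdates.date2num(times)
print(tnum.shape, tnum.dtype)
```

Whether this actually saves measurable time depends on how the times reach matplotlib in the current code, but the conversion itself is cheap enough to try.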
Matlab Parallel Processing could be a feasible way to speed up data visualization. It can reduce the time usage severalfold, depending on how many figures need to be plotted (see the test script below).
```matlab
a = 1:100;
b = sin(a/100 * pi);

% serial processing
t1 = tic;
for i = 1:100
    fig = figure('visible', 'off');
    hold on;
    plot(a, b);
    plot(a, b);
    plot(a, b);
    plot(a, b);
    plot(a, b);
    close(fig);   % release figure memory
end
elapsed1 = toc(t1);

% parallel processing (up to 10 workers)
t2 = tic;
parfor (i = 1:100, 10)
    fig = figure('visible', 'off');
    hold on;
    plot(a, b);
    plot(a, b);
    plot(a, b);
    plot(a, b);
    plot(a, b);
    close(fig);
end
elapsed2 = toc(t2);

% tic/toc reports seconds directly ('now' would give fractional days)
fprintf('Time usage: %.2f s (serial) vs %.2f s (parallel)\n', elapsed1, elapsed2);
```
But it threw an 'out of memory' error when I implemented parallel processing for Picasso. This was caused by multiple copies of the data being kept in the parallel workers' workspaces. Anyway, this error can be resolved with some coding tricks.
So let's keep this in mind.
Hey @ulysses78, could you take care of this issue whenever you have time for it? Currently the plotting is the most time-consuming process in the chain.
@martin-rdz already suggested some solutions:
1: https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.pcolorfast.html
2: https://docs.rs/matfile/latest/matfile/ (matfile, a Rust library for reading, and in the future writing, Matlab ".mat" files)
and
https://docs.rs/plotters/latest/plotters/ (plotters, a Rust drawing library focused on data plotting for both WASM and native applications)
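As a rough sketch of how suggestion 1 could be benchmarked before touching the real code: compare `pcolormesh` with `pcolorfast` on a synthetic time-height field (the array size here is a made-up stand-in for a quicklook):

```python
import time

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, timing only
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((500, 2500))  # synthetic height x time field


def draw_with(method):
    # Render the field with the given Axes method and time the full
    # draw, since most of the cost shows up at draw/savefig time.
    fig, ax = plt.subplots()
    t0 = time.perf_counter()
    getattr(ax, method)(data)  # both accept a plain 2-D array
    fig.canvas.draw()
    elapsed = time.perf_counter() - t0
    plt.close(fig)
    return elapsed


t_mesh = draw_with("pcolormesh")
t_fast = draw_with("pcolorfast")
print(f"pcolormesh: {t_mesh:.3f} s, pcolorfast: {t_fast:.3f} s")
```

On a uniform grid `pcolorfast` falls back to a fast image artist, which is where the speedup comes from; with non-uniform coordinate edges the gain shrinks.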
Currently, the most time-consuming part is plotting the vertical profiles:
[2022-01-27 08:11:43] --> start displaying overlap function.
[2022-01-27 08:11:45] --> finish.
[2022-01-27 08:11:45] --> start displaying vertical profiles.
[2022-01-27 08:19:12] --> finish.
[2022-01-27 08:19:12] --> start displaying attenuated backscatter.
[2022-01-27 08:19:38] --> finish.
Maybe this can be handled first @ulysses78 ?
Hi, I just did some code analysis with regard to data visualization speed. The bottleneck is in the python script, specifically matplotlib's savefig.
The python script consumes more than 80% of the total running time, and savefig, which is used for figure output, takes half of the python running time (0.5 s per frame).
I did some research and can't find a solution to improve this as long as we rely on matplotlib, because matplotlib is optimized for high-quality figures, not for execution speed (correct me if I'm wrong :) ).
So it would be good to really try a different data visualization approach, rust or whatever.
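A rough way to verify the savefig share on any machine (just a sketch with a trivial figure, not Picasso's plots):

```python
import io
import time

import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 1.0, 1000)

t0 = time.perf_counter()
fig, ax = plt.subplots()
ax.plot(x, np.sin(2 * np.pi * x))
t_plot = time.perf_counter() - t0  # time to build the figure

t0 = time.perf_counter()
fig.savefig(io.BytesIO(), format="png", dpi=80)  # render to memory
t_save = time.perf_counter() - t0  # time spent in savefig alone
plt.close(fig)

print(f"build: {t_plot:.3f} s, savefig: {t_save:.3f} s")
```

Writing to a `BytesIO` buffer separates the rasterization cost from disk I/O; if savefig still dominates, the time really goes into rendering, not into writing the file.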
What about PyQtGraph as an alternative to matplotlib? Quoting https://www.pyqtgraph.org/: "Despite being written entirely in python, the library is very fast due to its heavy leverage of NumPy for number crunching and Qt's GraphicsView framework for fast display."
I started with this issue: https://github.com/PollyNET/Pollynet_Processing_Chain/issues/163 (separate processing and plotting). In the future all the visualizations will be created with python only. At the same time I replaced pcolormesh with imshow. Imshow is up to 8-10 times faster at plotting. As we all know, imshow has problems when there are gaps within the matrix. That's why I fill all the time gaps in the matrix beforehand with nan values (same for the mask matrix). This works very nicely! The plots look very much the same, but are created much faster.
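A minimal sketch of the gap-filling trick described above, with made-up numbers: the profiles are placed onto a regular time grid, the missing slots stay NaN, and imshow then renders the gap as a blank band instead of stretching neighboring data across it.

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt
import numpy as np

dt = 30.0                                  # nominal profile spacing [s]
t = np.array([0, 30, 60, 300, 330, 360])   # a 4 min gap after 60 s
profiles = np.arange(24, dtype=float).reshape(6, 4)  # 6 profiles x 4 bins

# Place the profiles on a regular time grid; untouched rows stay NaN,
# so imshow shows the gap as a blank band (masked by default colormap).
n = int(round((t[-1] - t[0]) / dt)) + 1
grid = np.full((n, profiles.shape[1]), np.nan)
grid[np.round((t - t[0]) / dt).astype(int)] = profiles

fig, ax = plt.subplots()
ax.imshow(grid.T, aspect="auto", origin="lower",
          extent=[t[0], t[-1] + dt, 0, profiles.shape[1]])
ax.set_xlabel("time [s]")
ax.set_ylabel("range bin")
plt.close(fig)
print(grid.shape, int(np.isnan(grid).sum()))
```

The same regridding would be applied to the mask matrix, as described above, so mask and data stay aligned.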
Currently, generating the plots from the results takes roughly 5 min for 6 h of observations (at least for the 13-channel LACROS system on rsd2). Profiling reveals that a single color plot takes approximately 4-6 s with the current setup.
Switching to an alternative matplotlib backend (a quick test with gr) did not provide any improvement.
Further ideas:
- switching to pcolorfast: it requires nx+1 and ny+1 coordinate edges and might cause issues in the depol calibration
- subsampling the data, e.g. gaps[::3,:]: with 80 dpi and a figure height of 5 inch not all datapoints make it to pixels anyways.

Opinions? Suggestions?
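To illustrate the subsampling idea: at 80 dpi and a 5 inch figure height there are only about 400 pixel rows, so a stride-3 subsample of a tall matrix still has more rows than the image can display (the array sizes here are made up):

```python
import numpy as np

# A tall synthetic field: 1500 range bins x 3000 time steps.
data = np.random.default_rng(1).random((1500, 3000))

# 80 dpi * 5 inch = 400 pixel rows; a stride-3 subsample keeps 500
# rows, still more than the pixels available, so nothing visible is
# lost while the renderer handles 9x fewer cells.
sub = data[::3, ::3]
print(data.shape, "->", sub.shape)
```

The caveat is aliasing: a plain stride drops values rather than averaging them, so isolated features one bin wide can disappear; block-averaging before plotting would avoid that at slightly higher cost.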