BaRatin-tools / BaRatinAGE

BaRatin Advanced Graphical Environment
GNU General Public License v3.0

Handle long limnigraphs: speed (loading/saving), memory usage and file sizes #52

Open IvanHeriver opened 4 months ago

IvanHeriver commented 4 months ago

Currently, large limnigraphs with uncertainties result in:

I had a project with a long limnigraph (more than 44,000 time steps) that was used to compute two different hydrographs:

I think these issues call for the following measures:

Let me know if you see other options for better managing long limnigraphs within BaRatinAGE.

benRenard commented 4 months ago

Related to issue #40.

I agree with your last point that Q(t) spaghettis are the main problem. In v2 there was an option to enable/disable the saving of spaghettis within the bar.zip project file, so it's probably an approach we could re-implement. Maybe it could be made more flexible, e.g. by asking the user for a maximum file size above which the saving of spaghettis is disabled. Or, on the contrary, it could be made less flexible by never saving the spaghettis (only the envelopes). In any case, a useful feature would be to ask the user whether she/he wishes to export the spaghettis: this way they remain reusable downstream of BaRatinAGE, but they don't bloat the bar.zip project file.
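To make the threshold idea concrete, here is a minimal Python sketch. It assumes the spaghettis are written as a text matrix with one row per time step and one column per sample, and that each value takes about 12 bytes; the function names, the byte estimate and the 500-sample figure in the note below are illustrative assumptions, not BaRatinAGE code.

```python
# Hypothetical sketch: decide whether Q(t) spaghettis go into bar.zip
# based on a user-defined maximum file size. Not BaRatinAGE's actual code.

BYTES_PER_VALUE = 12  # assumed average width of one number in a text file, incl. separator


def estimated_spaghetti_bytes(n_timesteps: int, n_samples: int) -> int:
    """Rough size of a text matrix with one row per time step and one column per sample."""
    return n_timesteps * n_samples * BYTES_PER_VALUE


def should_save_spaghettis(n_timesteps: int, n_samples: int, max_bytes: int) -> bool:
    """Save spaghettis only if the estimated file stays under the user's limit."""
    return estimated_spaghetti_bytes(n_timesteps, n_samples) <= max_bytes


if __name__ == "__main__":
    # 44,000 time steps with an assumed 500 samples, against a 50 MB limit.
    print(estimated_spaghetti_bytes(44_000, 500))
    print(should_save_spaghettis(44_000, 500, 50_000_000))
```

With 44,000 time steps and an assumed 500 samples, the estimate is 264,000,000 bytes (about 264 MB), so a 50 MB limit would keep only the envelopes in bar.zip.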

There are a few tricks in issue #40 to improve memory usage or CPU time, but I'm not sure it should be BaRatinAGE's job to implement them. In any case, there will always be instances with massive spaghetti files, so we should find a way to handle them properly.

IvanHeriver commented 4 months ago

Exporting the spaghettis of a prediction is not possible in BaRatinAGE v2, right? The user had to go look for them in the bar.zip file. This is currently not possible in v3 either.

The approaches you suggest are interesting, but they seem a bit overly complicated for a feature that not many people use (I might be wrong).

Here is another simpler idea:

  1. spaghettis of predictions are simply never saved in the project file (except if they have only one column, which is the case of the maxpost prediction);
  2. in the RC and Qt panels, add a results tab with a button to download the spaghettis of each prediction experiment;
  3. if the project is reopened, the spaghettis are lost, the buttons are greyed out, and a message says: "to retrieve the prediction samples, BaM needs to be rerun" or something similar.
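Point 1 could look roughly like the following sketch, assuming each prediction's spaghettis are held in memory as a list of rows with one column per sample. The `.spag` file names and the `missing_spaghettis.txt` manifest (which a reopened project could read to grey out the download buttons of point 3) are hypothetical, not the actual bar.zip layout.

```python
import io
import zipfile


def write_project(target, predictions):
    """Write prediction results to a project zip, skipping multi-column spaghettis.

    predictions: dict mapping a prediction name to a matrix given as a list of rows
    (one row per time step, one value per sample column). Envelopes are assumed
    to be saved elsewhere and are not handled here.
    Returns the names of the predictions whose spaghettis were not saved.
    """
    skipped = []
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, matrix in predictions.items():
            n_columns = len(matrix[0]) if matrix else 0
            if n_columns > 1:
                # Multi-column spaghettis (e.g. total uncertainty) are never saved.
                skipped.append(name)
                continue
            # Single-column results such as the maxpost prediction are kept.
            text = "\n".join(" ".join(repr(v) for v in row) for row in matrix)
            zf.writestr(f"{name}.spag", text)
        # Hypothetical manifest telling the UI which spaghettis must be recomputed.
        zf.writestr("missing_spaghettis.txt", "\n".join(skipped))
    return skipped


if __name__ == "__main__":
    buffer = io.BytesIO()
    skipped = write_project(buffer, {
        "maxpost": [[1.0], [2.0]],
        "total_uncertainty": [[1.0, 1.1], [2.0, 2.1]],
    })
    print(skipped)
```

Here only `maxpost.spag` ends up in the archive, and `total_uncertainty` is listed as needing a BaM rerun.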

Point 1 could be implemented quite simply for version 3.0.0, and points 2 and 3 in a future version.

I tested not including the spaghetti files in the zip: the resulting files are much smaller and more manageable (e.g. from 26 MB to 3.5 MB).

However, this doesn't fix the large project file issue for long time series with stage errors, because the stage error matrix is still saved. A possible fix would be to seed the random number generator and save the seed instead of the matrix. The problem would then be the project loading time (building such an error matrix is very computationally intensive).
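A seed-based approach could look like this minimal Python sketch. It assumes independent Gaussian stage errors with a known standard deviation per time step, which may not match BaRatinAGE's actual stage error model; the function name is hypothetical. Using a dedicated `random.Random(seed)` instance means the same seed always rebuilds the same matrix, regardless of other random draws elsewhere in the program.

```python
import random


def stage_error_matrix(seed, stage_sd, n_samples):
    """Rebuild an (n_timesteps x n_samples) matrix of stage errors from a seed.

    stage_sd: one standard deviation per time step (assumed error model:
    independent Gaussian errors with mean 0).
    """
    rng = random.Random(seed)  # dedicated generator, unaffected by other random calls
    return [[rng.gauss(0.0, sd) for _ in range(n_samples)] for sd in stage_sd]


if __name__ == "__main__":
    first = stage_error_matrix(42, [0.01, 0.02, 0.05], n_samples=4)
    again = stage_error_matrix(42, [0.01, 0.02, 0.05], n_samples=4)
    print(first == again)  # same seed, same matrix
```

Only the seed would then need to be stored in the project file. Two caveats: regeneration still costs one draw per time step and per sample (the loading-time concern above), and the samples are only reproducible as long as the generator algorithm itself does not change between software versions.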

benRenard commented 4 months ago

OK with your approach for RC and Qt spaghettis.

Why is it so important to save the stage spaghettis? Couldn't we use the same approach and not save them, while offering a way to download them if the user wishes?

In particular, I don't understand why you need to generate the stage spaghettis when loading the project: in my eyes they only need to be generated before performing a prediction experiment that requires them (e.g. total uncertainty on Qt). And even if it is a bit intensive, I don't think it's as intensive as passing them through the RC equation to compute Qt spaghettis.

IvanHeriver commented 4 months ago

Currently, stage spaghettis are computed when a stage time series is loaded, in order to (1) compute the uncertainty envelope, (2) make them visible (and exportable) in a table within the limnigraph panel, and (3) use them whenever a discharge time series is computed.

I chose this approach because I wanted the sampled stage errors to remain the same after saving and reloading the project, and across all the child discharge time series. However, as I mentioned in my previous comment, using a seed (probably possible, but I haven't checked) might solve this particular issue.

Computing the stage errors only when required (e.g. to compute Qt spaghettis, or when the user requests the spaghettis to export them), as you suggest, seems to be the way to go.
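Combining the seed and on-demand ideas, a limnigraph could store only its seed and build the stage spaghettis the first time a prediction or an export asks for them. This is a hypothetical sketch (the `StageSeries` class and its attributes are not BaRatinAGE's API), again assuming independent Gaussian errors around the observed stage:

```python
import random


class StageSeries:
    """Hypothetical limnigraph that saves only a seed and builds spaghettis lazily."""

    def __init__(self, stage, stage_sd, n_samples, seed):
        self.stage = stage          # observed stage, one value per time step
        self.stage_sd = stage_sd    # assumed Gaussian error sd, one per time step
        self.n_samples = n_samples
        self.seed = seed            # the only sampling information written to the project file
        self._spaghettis = None     # not saved; rebuilt on first request

    def stage_spaghettis(self):
        """Return the (n_timesteps x n_samples) stage samples, computing them once."""
        if self._spaghettis is None:
            rng = random.Random(self.seed)
            self._spaghettis = [
                [h + rng.gauss(0.0, sd) for _ in range(self.n_samples)]
                for h, sd in zip(self.stage, self.stage_sd)
            ]
        return self._spaghettis


if __name__ == "__main__":
    series = StageSeries([1.0, 1.2], [0.01, 0.02], n_samples=3, seed=7)
    # Nothing is sampled until a Qt prediction or an export calls stage_spaghettis().
    print(series.stage_spaghettis())
```

Since every child discharge series would request the matrix from the same object, or rebuild it from the same seed after the project is reloaded, they would all see the same sampled stage errors, and project loading itself would not pay the sampling cost.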

It is indeed less intensive than computing Qt spaghettis, but it can still take a few seconds. Still, I find it less problematic than slowing down project loading (which is already pretty slow) or storing the entire matrix in the project file, which takes a lot of space AND is slow to unzip and read.

Changing how stage errors are managed within BaRatinAGE is not that straightforward, I think. But it might still be worth the effort for version 3.0.0, since large project files are a significant issue in my opinion. It might also affect the project file structure (and I know how painful it can be to handle several file versions).
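If the storage format does change, one common way to keep older projects loadable is to write a format version number into the project file and branch on it when reading. The sketch below assumes the project file has already been parsed into a dictionary and uses hypothetical keys (`format_version`, `stage_error_matrix`, `stage_error_seed`); it is not the actual BaRatinAGE file layout.

```python
def load_stage_errors(project):
    """Read stage error information from a parsed project file (hypothetical layout).

    Version 1 (assumed legacy): the full stage error matrix is stored.
    Version 2 (assumed new): only the random seed is stored.
    """
    version = project.get("format_version", 1)  # files without a version are treated as legacy
    if version == 1:
        return {"matrix": project["stage_error_matrix"], "seed": None}
    if version == 2:
        # The matrix will be regenerated from the seed when a prediction needs it.
        return {"matrix": None, "seed": project["stage_error_seed"]}
    raise ValueError(f"unsupported project format version: {version}")


if __name__ == "__main__":
    print(load_stage_errors({"stage_error_matrix": [[0.01, -0.02]]}))
    print(load_stage_errors({"format_version": 2, "stage_error_seed": 12345}))
```

With this kind of branching, legacy projects keep their stored matrix, so their results do not change, while newer projects only carry the seed.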