ecmwf / ecpoint-calibrate

Interactive GUI (developed in Python) for calibration and conditional verification of numerical weather prediction model outputs.
GNU General Public License v3.0

The software is extremely slow when you already have many weather types (~150-200) #168

Closed EstiGascon closed 3 years ago

EstiGascon commented 3 years ago

I found the process of splitting into different weather types extremely slow: the more weather types I have, the slower the splitting becomes. I am not using the K-S test at all, just clicking on each branch and dividing them one by one.

To reproduce it, I have saved an example of the ASCII table with all the input data from the predictors and an example of the file BreakpointsWT.csv to upload for testing. You will see that many breakpoints have already been created, and it should be slow for you as well if the problem is in the software and not my computer (I have checked, and all the processes on my computer are using only 10% of the CPU, so it should be OK). The files are in this Dropbox shared folder (I had problems uploading them here):

https://www.dropbox.com/sh/lgmdunzbf6m676o/AABoLwMnkgwkaJUI2mZkY78Xa?dl=0

Try to do this test:

Fatima is testing it as well, to see whether it is a common problem. I also tried closing the software and restarting it from the last step, but it is still slow. Thanks.

FatimaPillosu commented 3 years ago

Can we check whether the problem persists if the DT (decision tree) is created from a Parquet file instead of a CSV file?

EstiGascon commented 3 years ago

Yes, I just tested the same issue with the same database in Parquet format, and it is really slow as well (tested in version 0.24.0).

onyb commented 3 years ago

Update: Reproduced it on my side as well. I confirm that the slowness is not because of the size of the ASCII table, but because of the size of the decision tree.

I think the root cause is a technical limitation in React (the framework we use for rendering UI elements on the screen). I have several ideas on how we can improve it - one of them is to not display the threshold breakpoints table when it becomes huge, thereby having some performance gain for not re-rendering the contents on every UI interaction.
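To make the "do not display the table when it becomes huge" idea concrete, here is a minimal sketch of how such a gate might look. It is not code from ecpoint-calibrate: the names `BreakpointRow`, `TableView`, `selectTableView` and the `MAX_TABLE_ROWS` value are all hypothetical, and the issue does not state an actual row limit. The function only decides what to show; a component would render the full table or a short placeholder depending on the result.

```typescript
// Hypothetical shape of one row in the threshold breakpoints table.
interface BreakpointRow {
  weatherType: string;
  thresholds: number[];
}

// Either the full table is shown, or only a summary placeholder.
type TableView =
  | { kind: "full"; rows: BreakpointRow[] }
  | { kind: "collapsed"; rowCount: number };

// Assumed limit for illustration only; the issue gives no concrete number.
const MAX_TABLE_ROWS = 100;

// Decide whether the table is small enough to render in full.
// Above the limit, only the row count is kept, so the UI has
// nothing large to re-render on each interaction.
function selectTableView(
  rows: BreakpointRow[],
  maxRows: number = MAX_TABLE_ROWS
): TableView {
  if (rows.length > maxRows) {
    return { kind: "collapsed", rowCount: rows.length };
  }
  return { kind: "full", rows };
}
```

With around 200 weather types, `selectTableView` would return the collapsed variant under the assumed limit, and the UI could show something like "200 rows hidden" in place of the table.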

onyb commented 3 years ago

Update: I can confirm that the issue is actually with the tree, and NOT with the breakpoints table, contrary to what I previously thought.

I'm currently trying to cut back on fancy things like animations, dynamic layouts, responsiveness, etc., in order to get a performance gain. The tree won't look as good, but it might be usable.
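One way to picture this trade-off is a function that switches off the costly visual features once the tree passes a certain size. The sketch below is an assumption-laden illustration, not the change that shipped: `TreeRenderOptions`, `renderOptionsFor`, the 300 ms transition, and the `LARGE_TREE_NODES` cutoff (picked from the ~150-200 weather types in the issue title) are all hypothetical.

```typescript
// Hypothetical set of visual features a tree renderer might toggle.
interface TreeRenderOptions {
  animate: boolean;          // expand/collapse animations
  transitionMs: number;      // duration of each transition
  responsiveLayout: boolean; // recompute layout on resize
}

// Assumed cutoff for illustration, based on the ~150-200 weather
// types mentioned in the issue title.
const LARGE_TREE_NODES = 150;

// Return full visuals for small trees and a stripped-down,
// static configuration for large ones.
function renderOptionsFor(
  nodeCount: number,
  largeTreeNodes: number = LARGE_TREE_NODES
): TreeRenderOptions {
  if (nodeCount >= largeTreeNodes) {
    return { animate: false, transitionMs: 0, responsiveLayout: false };
  }
  return { animate: true, transitionMs: 300, responsiveLayout: true };
}
```

Keeping the decision in one pure function means the rest of the UI never has to check the tree size itself; it just reads the returned options.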

onyb commented 3 years ago

Fixed in v0.26.0. I tested the software using the uploaded ASCII table and breakpoints, and the performance was much better. Closing this issue, but feel free to reopen if there are any unaddressed issues.