NREL / flasc

A rich floris-driven suite for SCADA analysis
https://nrel.github.io/flasc/
BSD 3-Clause "New" or "Revised" License
31 stars 18 forks source link

Feature/port to polars 4 #107

Closed paulf81 closed 1 year ago

paulf81 commented 1 year ago

Port to polars (4th try)

Feature or improvement description This is a final, final, reboot of the port_to_polars efforts, initially begin in Pull Request #81 and updated in Pull Request #97, (and then pull request #102) here following the decision made in discussion #98 that we keep the assumption that pandas dataframe defined in the current methods (with columns pow_000, pow_001, and time) will be kept the same, and conversion to polars can happen within certain functions. Here I also adjust to discussion Issue #104

Unlike the previous versions I also split some things apart I think in a way that adds more clarity.

energy_ratio_utilities.py contains helper functions other modules could need energy_ratio.py includes the main working functions of energy ratio calculation energy_ratio_result.py defines a class meant to hold the energy ratio results in a convenient way and provide clear plotting functions.

Unlike before I've included methods for overlapping bins and now you can the tests energy_ratio_test.py that I've defined for polars passes giving identical results. In general I've confirmed that the energy ratio result values match all previous results except a small but intentional change in how rebalancing is done. This change only manifests when using noisy data and more than one set of data (for example baseline and wakesteering) and is still generally small. But for this reason the tests pass with precision of 4 decimal places.

I'm temporary leaving the polars implementations split out during development but hope to move quickly to deleting the old energy_ratio codes once all surrounding codes are edited to use the new functions. Should hopefully go quick.

Finally you'll also see I'm playing with default visualizations, happy for any and all suggestions here too!

In connection with this effort is a small repo https://github.com/paulf81/flasc_metrics (based on https://github.com/rafmudaf/floris_metrics) which can be used to confirm that even with a pandas->polars->pandas round trip the functions are still sped up. The new folder timing_tests provides functions to that repo.

A first result is promising, converting the energy ratio with bootstrapping to the polars-based method reduces the average time to compute an N=20 bootstrap energy ratio over the artificial data set from 11.3344 seconds to 1.0762 seconds.

Related issue, if one exists Issue #80

Impacted areas of the software energy_ratio (this list may grow)

paulf81 commented 1 year ago

Updated the test code in https://github.com/paulf81/flasc_metrics and still seeing a 10x reduction in time taken in computation

Bartdoekemeijer commented 1 year ago

Great stuff! Let me know when it makes sense for me to review.

paulf81 commented 1 year ago

With luck you should be able to review tomorrow, thank you @Bartdoekemeijer !

misi9170 commented 1 year ago

Great stuff! Let me know when it makes sense for me to review.

@Bartdoekemeijer At this point you can go ahead and review! The artificial data examples are not yet updated (I'll be working on those tomorrow), but the Smarteole examples are ready to roll and we don't expect any further fundamental changes to the energy_ratio/ code.

paulf81 commented 1 year ago

@Bartdoekemeijer ok, I think we're ready to merge on our end. If you want to have a look around, all examples updated to new syntax. We've also added this notebook as quicker tour: https://github.com/paulf81/flasc/blob/feature/port_to_polars_4/examples_artificial_data/energy_ratio/00_demo_energy_ratio_syntax.ipynb