Closed jvlmdr closed 4 years ago
I noticed that iteratively selecting rows from the dataframe was a serious bottleneck.
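For illustration, here is a minimal sketch of the kind of change involved. The actual column names and extraction code from this PR are not shown here, so the names below (`label`, `extract_counts_*`) are hypothetical:

```python
import pandas as pd

# Hypothetical slow version: iterating row-by-row pays pandas' per-access
# overhead (.iloc) on every element, which dominates for large frames.
def extract_counts_slow(df):
    counts = {}
    for i in range(len(df)):
        key = df.iloc[i]["label"]
        counts[key] = counts.get(key, 0) + 1
    return counts

# Vectorized equivalent: push the loop down into pandas/NumPy.
def extract_counts_fast(df):
    return df["label"].value_counts().to_dict()

df = pd.DataFrame({"label": ["a", "b", "a", "c", "a"]})
assert extract_counts_slow(df) == extract_counts_fast(df)
```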
It looks like someone was already investigating this. I removed the use of the cached analysis and the lines that computed timings.
I isolated the code for extracting counts and added a benchmark (and a dependency on `pytest-benchmark`).
Before:
```
----------------------------------------------------------- benchmark: 1 tests -----------------------------------------------------------
Name (time in s)                                 Min      Max     Mean  StdDev   Median     IQR  Outliers     OPS  Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_extract_counts_from_df_map    15.4156  16.1166  15.6762  0.3507  15.4331  0.6114       1;0  0.0638       5           1
------------------------------------------------------------------------------------------------------------------------------------------
```
After (time in ms, not s):
```
----------------------------------------------------------- benchmark: 1 tests -----------------------------------------------------------
Name (time in ms)                                Min       Max      Mean   StdDev    Median      IQR  Outliers     OPS  Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_extract_counts_from_df_map   146.5993  209.5080  175.8946  22.3510  174.9131  17.7610       2;0  5.6852       5           1
------------------------------------------------------------------------------------------------------------------------------------------
```
Merged the non-rebased one; I guess this is obsolete then?
Yep! Thanks