Open ssolson opened 8 months ago
Not sure off the top of my head... At first glance, yes, this is surprising. The rex resource classes should not be doing anything too fancy here. Some ideas (none of which I am fully convinced by):
- windx.__exit__() is clearing the cache
- The with statement takes extra time
- windx.__init__ or windx.time_index are taking long

Ideas for a more direct comparison:
1. Profile with cProfile or something similar
2. Use the WindResource class instead of WindX
3. Use a with statement in both cases
4. With the WindResource class, try something simple outside of the property like pd.to_datetime(WindResource['time_index'].astype(str)) (I think this will work, maybe not)

Thanks for the suggestions Grant.
Looking at
The use of WindResource is not mentioned. To check my understanding: one should still access these resources via WindX in production, but the WindResource class would be slightly more optimized and could help us figure out whether there is overhead in using the WindX class?
The "extraction" classes add some quality-of-life features (e.g., lat/lon lookup and SAM dataframe extraction) but are ultimately just wrappers of the base resource classes (e.g., WindResource). We typically advertise the extraction classes to the public because of the nice features but the base resource classes have less overhead.
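One way to check where wrapper overhead might come from is to time the open, read, and close phases of a context-managed resource separately. The sketch below is only an illustration: `time_phases` and `DummyResource` are hypothetical names that do not exist in rex, and the dummy class stands in for a real resource so the example runs without a network connection.

```python
import time


def time_phases(factory, read):
    """Time opening, reading from, and closing a context-managed resource.

    factory: zero-argument callable that builds the resource (runs __init__).
    read: callable that takes the opened handle and returns the data.
    """
    t0 = time.perf_counter()
    resource = factory()
    handle = resource.__enter__()
    t1 = time.perf_counter()
    try:
        result = read(handle)
    finally:
        # Always close the resource, even if the read fails.
        t2 = time.perf_counter()
        resource.__exit__(None, None, None)
        t3 = time.perf_counter()
    timings = {"open": t1 - t0, "read": t2 - t1, "close": t3 - t2}
    return result, timings


class DummyResource:
    """Hypothetical stand-in for a resource class; not part of rex."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False

    @property
    def time_index(self):
        return list(range(5))


result, timings = time_phases(DummyResource, lambda f: f.time_index)
print(timings)
```

Against the real classes, the same helper could presumably be called with `lambda: WindResource(wtk_file, hsds=True)` as the factory and `lambda f: f.time_index` as the read, which would show whether the constructor, the property, or the exit dominates.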
Grant,
Using WindResource and a with statement in both methods did not improve the results (current script below).
A formal code profile is beyond the time I have available to help with this issue. I was not exactly sure what you meant by suggestion 4 above, so I did not try it, but I would not expect it to make much of a difference.
from rex import WindX, WindResource
import h5pyd
import pandas as pd
import time


def measure_hsds_execution_time():
    start_time = time.time()
    with h5pyd.File("/nrel/wtk/conus/wtk_conus_2014.h5", 'r') as f:
        time_index = pd.to_datetime(f['time_index'][...].astype(str))
        print(time_index)
    return time.time() - start_time


def measure_rex_execution_time():
    start_time = time.time()
    wtk_file = '/nrel/wtk/conus/wtk_conus_2014.h5'
    with WindResource(wtk_file, hsds=True) as f:
        time_index = f.time_index
        print(time_index)
    return time.time() - start_time


# Function to calculate min, max, and average times
def calculate_stats(times):
    min_time = min(times)
    max_time = max(times)
    avg_time = sum(times) / len(times)
    return min_time, max_time, avg_time


# Pause for 5 seconds between calls
def wait():
    time.sleep(5)


# Running the script 5 times and recording execution times
hsds_execution_times = []
rex_execution_times = []
for _ in range(5):
    hsds_execution_times.append(measure_hsds_execution_time())
    wait()
    rex_execution_times.append(measure_rex_execution_time())
    wait()

# Calculating stats for each method
hsds_min, hsds_max, hsds_avg = calculate_stats(hsds_execution_times)
rex_min, rex_max, rex_avg = calculate_stats(rex_execution_times)

# Printing comparison
print("\nComparison of Execution Times (in seconds):\n")
print("HSDS Method:")
print(f" Minimum Time: {hsds_min:.3f}")
print(f" Maximum Time: {hsds_max:.3f}")
print(f" Average Time: {hsds_avg:.3f}")
print("\nRex Method:")
print(f" Minimum Time: {rex_min:.3f}")
print(f" Maximum Time: {rex_max:.3f}")
print(f" Average Time: {rex_avg:.3f}")

# Comparing the average times and calculating the speed difference
if hsds_avg < rex_avg:
    speed_difference = rex_avg / hsds_avg
    print(f"\nOn average, the HSDS method is faster by a factor of {speed_difference:.2f}.")
else:
    speed_difference = hsds_avg / rex_avg
    print(f"\nOn average, the Rex method is faster by a factor of {speed_difference:.2f}.")
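For anyone who wants to follow up on the cProfile suggestion, a minimal sketch might look like the following. It uses only the standard library; `profile_call` and `slow_sum` are hypothetical names introduced here for illustration, and `slow_sum` is a stand-in workload so the example runs without HSDS access.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def slow_sum(n):
    """Hypothetical stand-in workload; replace with the call to profile."""
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum, 1000)
print(report)
```

Passing measure_rex_execution_time and measure_hsds_execution_time to the same helper should give two reports whose cumulative-time columns can be compared to see which calls account for the gap.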
okay well thanks for the heads up about the possible performance issues!
I wrote the script at the bottom of this issue to spot check the performance of using hsds vs rex when I noticed rex taking significantly longer to run than hsds for the same call.
This issue is really just a question: do you have an idea why this is, or is this to be expected for some reason?
Comparison of Execution Times (in seconds):
On average, the HSDS method is faster by a factor of 7.62.
HSDS Method:
Rex Method: