Open coroa opened 4 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 89.65%. Comparing base (f0c8457) to head (d82e97e).
Hey @coroa, thanks for your PR. According to the profiler the lazy operation is taking very long.
import pypsa
import psutil
import time
import threading
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
# Flag to control the monitoring loop
stop_monitoring = False
# List to store memory usage values
memory_values = []
# Function to monitor memory usage
def monitor_memory_usage(interval=0.1):
    global stop_monitoring
    global memory_values
    process = psutil.Process()
    while not stop_monitoring:
        mem_info = process.memory_info()
        memory_values.append(mem_info.rss / 1024 ** 2)  # Store memory in MB
        time.sleep(interval)
# Start monitoring memory usage in a separate thread
monitor_thread = threading.Thread(target=monitor_memory_usage)
monitor_thread.daemon = True # Daemonize thread
monitor_thread.start()
# Your original code
n = pypsa.Network(".../pypsa-eur/results/solver-io/prenetworks/elec_s_128_lv1.5__Co2L0-25H-T-H-B-I-A-solar+p3-dist1_2050.nc")
m = n.optimize.create_model()
m.to_file("test.lp", io_api="lp-polars")
# Stop monitoring
stop_monitoring = True
monitor_thread.join()
# Plotting the memory usage
plt.plot(memory_values)
plt.xlabel('Time (in 0.1s intervals)')
plt.ylabel('Memory Usage (MB)')
plt.title('Memory Usage Over Time')
plt.savefig("mem-polars-non-lazy.png")
print(max(memory_values))
Interesting that there are no memory savings in either case compared to the other two.
Thanks for the profiling. Very disappointing.
It's possible that `.values.reshape(-1)` is not zero-copy.
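One way to test the zero-copy suspicion is `np.shares_memory`. The sketch below uses plain NumPy arrays as stand-ins for what `.values` returns. The array shapes are made up for illustration and are not taken from the linopy code, so it only shows the general behaviour: `reshape(-1)` returns a view for a C-contiguous array and has to copy for a transposed one.

```python
import numpy as np

# C-contiguous array: reshape(-1) can return a view on the same buffer.
contiguous = np.arange(12).reshape(3, 4)
flat_view = contiguous.reshape(-1)

# Transposed (non-contiguous) array: flattening in C order needs a copy.
transposed = contiguous.T
flat_copy = transposed.reshape(-1)

print("contiguous shares memory:", np.shares_memory(contiguous, flat_view))
print("transposed shares memory:", np.shares_memory(transposed, flat_copy))
```

If the arrays being flattened have been transposed or otherwise reordered upstream, the second case would apply and each flatten would allocate a new buffer.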
The lazy version has to do everything at least twice, since `check_nulls` already needs to evaluate everything (that could be improved). I don't know why it is a factor of 4, though.
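To illustrate why a null check followed by a write doubles the work in a lazy pipeline, here is a toy sketch in pure Python. It is not polars or linopy code: `LazyColumn`, `write`, and this `check_nulls` are hypothetical stand-ins, assuming only that each consumer has to trigger a full evaluation.

```python
from typing import Callable, List, Optional


class LazyColumn:
    """Toy stand-in for a lazy frame: stores a recipe and counts evaluations."""

    def __init__(self, compute: Callable[[], List[Optional[int]]]) -> None:
        self._compute = compute
        self.evaluations = 0

    def collect(self) -> List[Optional[int]]:
        # Every collect re-runs the whole recipe; nothing is cached.
        self.evaluations += 1
        return self._compute()


def check_nulls(column: LazyColumn) -> bool:
    # Hypothetical check: it must materialise the data to inspect it.
    return any(value is None for value in column.collect())


def write(column: LazyColumn, sink: List[str]) -> None:
    # Hypothetical writer: materialises the data a second time.
    sink.extend(str(value) for value in column.collect())


lazy = LazyColumn(lambda: [x * 2 for x in range(5)])
out: List[str] = []
if not check_nulls(lazy):
    write(lazy, out)

print("evaluations:", lazy.evaluations)
```

In this toy the recipe runs once for the check and once for the write. Folding the check into the same pass as the write, or caching the collected result, would bring that back to one evaluation.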
I'll try to debug a bit around to find out where we are scooping up this memory use. Any particular xarray version to focus on? @FabianHofmann
Cool, but no rush, seems to be stable for the moment. I think it should be independent of the xarray version.
Tests run fine. The extra pyarrow dependency should not hurt, since Arrow is already a requirement for polars (and soon also pandas), while pyarrow is only the Python frontend in addition.
We should check for each of the invocations of `write_lazyframe` that `explain(streamable=True)` shows it can actually run the streaming pipeline.
If you decide to merge, please squash (the history is ugly :))