acturtle / cashflower

An open-source Python framework for actuarial cash flow models
https://cashflower.acturtle.com
MIT License

Adaptive batch_size #369

Open zchmielewska opened 4 months ago

zchmielewska commented 4 months ago

Currently, we calculate batch_size (to avoid a memory error) only once.

If the available memory varies during a run, maybe we should recalculate it after each batch.

while batch_start < range_end:
    # Calculate the batch size based on available memory
    batch_size = self.calculate_batch_size(num_output_columns)

    # Process the batch
    batch_end = min(batch_start + batch_size, range_end)
    results = calculate_model_point_partial(batch_start, batch_end)

    # Move the start index to the beginning of the next batch
    batch_start = batch_end

    # ... process the results ...
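
A minimal, self-contained sketch of the same idea, written so it can be tried outside the model class. The names `adaptive_batches`, `get_available_bytes`, `bytes_per_item` and `safety_factor` are hypothetical and not part of cashflower; the memory probe is passed in as a function so that, for example, `lambda: psutil.virtual_memory().available` could be plugged in.

```python
from typing import Callable, Iterator, Tuple


def adaptive_batches(
    range_start: int,
    range_end: int,
    get_available_bytes: Callable[[], int],  # hypothetical memory probe
    bytes_per_item: int,                     # assumed cost of one model point
    safety_factor: float = 0.5,              # assumed share of free memory to use
) -> Iterator[Tuple[int, int]]:
    """Yield (batch_start, batch_end) pairs, re-sizing every batch from a fresh reading."""
    batch_start = range_start
    while batch_start < range_end:
        # Re-read available memory before each batch instead of only once
        budget = get_available_bytes() * safety_factor
        # Always advance by at least one item so the loop cannot stall
        batch_size = max(1, int(budget // bytes_per_item))
        batch_end = min(batch_start + batch_size, range_end)
        yield batch_start, batch_end
        batch_start = batch_end


# Usage with a fake probe whose readings shrink after the first batch
readings = iter([800, 400, 400, 400])
batches = list(adaptive_batches(0, 10, lambda: next(readings), bytes_per_item=100))
print(batches)
```

With the fake readings above, the first batch covers four items and the later ones two each, which is the behaviour we would want when memory gets tighter mid-run.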

How to profile memory usage?

import psutil

# Get the current system-wide memory usage in bytes (all processes, not just this one)
mem_usage = psutil.virtual_memory().used

# Log the memory usage in MB
print(f"Memory usage: {mem_usage / (1024 * 1024):.2f} MB")

Or, to measure memory around each batch:

import psutil

# ...

while batch_start < range_end:
    # Calculate the batch size based on available memory
    batch_size = self.calculate_batch_size(num_output_columns)

    # Log memory usage before processing the batch
    mem_usage_before = psutil.virtual_memory().used

    # Process the batch
    batch_end = min(batch_start + batch_size, range_end)
    results = calculate_model_point_partial(batch_start, batch_end)

    # Log memory usage after processing the batch
    mem_usage_after = psutil.virtual_memory().used

    # Log the difference in memory usage (can be negative if memory was freed)
    mem_usage_diff = mem_usage_after - mem_usage_before
    print(f"Memory usage changed by {mem_usage_diff} bytes")

    # ... process the results ...

We can use this code to better understand memory usage.
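
Since `psutil.virtual_memory()` reports system-wide figures, other processes can distort the per-batch difference. One possible complement, sketched below, is the standard library's `tracemalloc`, which records the peak of the allocations it traces inside our own process. The helper `measure_batch_memory` and the stand-in `fake_batch` are hypothetical names used only for illustration; in the model, `calculate_model_point_partial` would take the place of `fake_batch`.

```python
import tracemalloc


def measure_batch_memory(process_batch, batch_start, batch_end):
    """Run one batch and return (results, peak_bytes) as traced by tracemalloc."""
    tracemalloc.start()
    try:
        results = process_batch(batch_start, batch_end)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return results, peak


def fake_batch(batch_start, batch_end):
    # Hypothetical stand-in for calculate_model_point_partial
    return [[float(i)] * 1000 for i in range(batch_start, batch_end)]


results, peak = measure_batch_memory(fake_batch, 0, 100)
print(f"Peak traced memory: {peak / (1024 * 1024):.2f} MB")
```

Dividing the peak by the number of model points in the batch would give a rough per-item cost, which could then feed the batch size calculation. Tracing adds overhead, so this is probably better suited to a profiling run than to production.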