Some operations are memory hogs when operating on large results. For example, diff-reduce starts with a single pandas DataFrame holding all of the results, and gradually builds up another DataFrame with a subset of those results. When the CSV file for the full results is several gigabytes in size, this ends up using a lot of RAM.
It is probably worth breaking the results into chunks where possible and writing them out to disk as they are processed. For example, diff-reduce could append each processed group of results to a CSV file rather than keeping them all in memory.
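A minimal sketch of that chunked approach, using pandas' built-in `chunksize` support in `read_csv`. The function name, file paths, and the `status == "changed"` filter are illustrative stand-ins for the real diff-reduce logic, not its actual implementation:

```python
import pandas as pd

def reduce_in_chunks(in_path, out_path, chunksize=100_000):
    """Stream a large results CSV through a filter without loading it all.

    Stand-in for diff-reduce: reads `in_path` in chunks, keeps a subset
    of rows from each chunk, and appends the subset to `out_path`.
    """
    first = True
    # read_csv with chunksize yields DataFrames of at most `chunksize` rows,
    # so peak memory is bounded by the chunk size, not the file size.
    for chunk in pd.read_csv(in_path, chunksize=chunksize):
        # Hypothetical filter standing in for the real reduction step.
        subset = chunk[chunk["status"] == "changed"]
        # Append each processed chunk; write the header only once.
        subset.to_csv(out_path, mode="w" if first else "a",
                      header=first, index=False)
        first = False
```

Only one chunk plus its filtered subset is ever resident at a time, at the cost of re-reading the output if later steps need it as a DataFrame again.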