After trying to save a 6-column DataFrame to a 5GB CSV file, I had to kill the Julia session after a few minutes of it heavily swapping on my laptop with 16GB of memory.
As pointed out here [1], the memory usage can be significantly reduced by making the iteration type-stable with the Tables.columntable function. I was able to write my file in a few seconds after making that change.
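For reference, this is roughly the workaround I used (the DataFrame here is a small stand-in for my actual data; the real one had 6 columns and millions of rows):

```julia
using CSV, DataFrames, Tables

# Stand-in DataFrame; the real one was narrow but very long.
df = DataFrame(a = rand(1_000), b = rand(Int, 1_000))

# Writing the DataFrame directly was the slow, memory-hungry path:
# CSV.write("out.csv", df)

# Converting to a NamedTuple of column vectors first makes row
# iteration type-stable, which avoided the memory blow-up for me:
CSV.write("out.csv", Tables.columntable(df))
```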
Since it's quite common to work with narrow but long DataFrames, shouldn't CSV.write just check the dimensions and decide when to convert to a type-stable table?
[1] https://stackoverflow.com/questions/65584387/julia-csv-write-very-memory-inefficient