aswinkarthik / csvdiff

A fast diff tool for comparing csv files
https://aswinkarthik.github.io/csvdiff/
MIT License

Runtime Error: cannot allocate memory #65

Open: datatraveller1 opened this issue 1 year ago

datatraveller1 commented 1 year ago

Version: csvdiff version 1.4.0, running on MS Windows 11

Command: csvdiff file1.csv file2.csv -p0,6,1,13,18 > csvdiff_result.csv

After a few minutes the program crashes with:

fatal error: runtime: cannot allocate memory

file1.csv: 8.0 GB, approx. 35,000,000 rows, 22 columns

file2.csv: 9.5 GB, approx. 41,000,000 rows, 22 columns

I assume these files are too big for csvdiff?

gasperno commented 9 months ago

I'm not an expert on the project, but I believe the data is loaded into memory (RAM). Given enough RAM, it should scale to larger files.
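To illustrate why an in-memory diff grows with row count, here is a minimal Python sketch. It is an assumption about the general technique, not csvdiff's actual code: the names `row_digests` and `diff_counts` are made up for this example. Each row is reduced to a small hash of its key columns plus a hash of the full row, and both files' hashes are held in dictionaries, so memory use rises roughly in proportion to the number of rows.

```python
import csv
import hashlib


def _digest(fields):
    # 8-byte hash of the joined fields; the separator avoids accidental merges.
    return hashlib.blake2b("\x1f".join(fields).encode(), digest_size=8).digest()


def row_digests(lines, key_cols):
    """Map hash(key columns) -> hash(whole row) for every row (hypothetical helper)."""
    digests = {}
    for row in csv.reader(lines):
        digests[_digest([row[i] for i in key_cols])] = _digest(row)
    return digests


def diff_counts(base_lines, delta_lines, key_cols):
    """Return (added, modified, deleted) counts; both maps live in RAM at once."""
    base = row_digests(base_lines, key_cols)
    delta = row_digests(delta_lines, key_cols)
    added = sum(1 for k in delta if k not in base)
    modified = sum(1 for k, v in delta.items() if k in base and base[k] != v)
    deleted = sum(1 for k in base if k not in delta)
    return added, modified, deleted


base = ["1,a", "2,b", "3,c"]
delta = ["1,a", "2,x", "4,d"]
print(diff_counts(base, delta, key_cols=[0]))
```

Even with only a few bytes of hash per row, the per-entry overhead of the map adds up over tens of millions of rows, which would be consistent with the crash reported above.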

I tested this for one of my projects with over 200M rows, and the command needed around 43 GB of RAM to succeed. I tested several scenarios, including 10% of the file changed vs. 50% of the file changed.
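As a rough back-of-envelope check, the figures above can be turned into a per-row estimate. This sketch assumes memory scales linearly with row count and that the 200M-row test used similar row widths and key columns, neither of which is confirmed in this thread. The function name `estimate_ram_gb` is hypothetical.

```python
# Figures quoted in this thread (decimal GB assumed).
TEST_RAM_BYTES = 43 * 10**9   # about 43 GB observed
TEST_ROWS = 200 * 10**6       # "over 200M rows"

# Approximate RAM cost per row in that test.
BYTES_PER_ROW = TEST_RAM_BYTES // TEST_ROWS


def estimate_ram_gb(rows, bytes_per_row=BYTES_PER_ROW):
    """Linear extrapolation of RAM need in GB (an assumption, not a measurement)."""
    return rows * bytes_per_row / 1e9


# Rows reported for file1.csv and file2.csv in the original post.
issue_rows = 35_000_000 + 41_000_000
print(BYTES_PER_ROW, "bytes per row")
print(round(estimate_ram_gb(issue_rows), 2), "GB estimated")
```

Under those assumptions the two files from the original report would need somewhere in the region of 16 GB, so the crash may simply reflect the machine having less free memory than that.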