cfsimplicity / spreadsheet-cfml

Standalone library for working with spreadsheets and CSV in CFML
MIT License

Reading CSV: Allow filter UDF to limit the CSV rows which are included in the final result to save on memory by filtering them up front. #342

Closed: bdw429s closed this 10 months ago

bdw429s commented 10 months ago

Inside of the CSV iterator loop, something like this

    if( !isNull( filter ) ) {
        if( !filter( values ) )
            continue;
    }

will let a UDF passed in as a filter argument return a boolean that decides whether each row is kept, so rows we don't want from the CSV are skipped before they are stored. For example, of the 6 million rows in the CSV I'm using right now for a client, it turns out I only needed about 1.6 million. Iterating over the file still takes about the same amount of time, but overall memory use is lower because the unwanted rows are never added to my final array of arrays.
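
To make the idea concrete, here is a minimal sketch of the skip-early pattern in isolation. It is not the library's code: collectRows and the rowFilter argument name are hypothetical, and a plain array stands in for whatever the real CSV iterator yields, so the memory saving only shows up when the rows are streamed rather than already held in memory.

```cfml
// Hypothetical helper, not part of spreadsheet-cfml: shows the skip-early pattern on its own.
// "rows" stands in for the real CSV iterator; "rowFilter" is an assumed argument name.
function collectRows( required array rows, any rowFilter ){
    var result = [];
    for( var values in arguments.rows ){
        // When a filter UDF was supplied and it rejects the row, skip it before it is stored
        if( !isNull( arguments.rowFilter ) && !rowFilter( values ) )
            continue;
        result.append( values );
    }
    return result;
}

// Usage: keep only the rows whose first column is "UK"
rows = [ [ "UK", "London" ], [ "FR", "Paris" ], [ "UK", "Leeds" ] ];
ukOnly = collectRows( rows, function( values ){ return values[ 1 ] == "UK"; } );
// ukOnly holds the two "UK" rows; the "FR" row was never appended
```

Leaving the filter argument out keeps every row, so existing callers would be unaffected.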

cfsimplicity commented 10 months ago

https://github.com/cfsimplicity/spreadsheet-cfml/blob/e0b0f38d72e489de3fb3037f80cd7f14f7723f7b/test/specs/readCsv.cfm#L106

cfsimplicity commented 10 months ago

See a potential issue that results from using .values(): https://github.com/cfsimplicity/spreadsheet-cfml/issues/341#issuecomment-1798790300