cfsimplicity / spreadsheet-cfml

Standalone library for working with spreadsheets and CSV in CFML
MIT License
126 stars 35 forks source link

Reading CSV: Improve performance of internal dataFromParser() #343

Closed cfsimplicity closed 10 months ago

cfsimplicity commented 10 months ago

[Extracted from issue #340 raised by @bdw429s ]

In my test file, I had 6 million rows The ~other~ really inefficient thing is the inner loop in the csv.dataFromParser() method. There's no need to loop over every value to build the array. This turned my 32 column csv from 6 million iterations to 192 million iterations of that loop and the overhead of Java reflection really adds up there.

You can change

    for( var value in values ){
        columnNumber++;
        result.maxColumnCount = Max( result.maxColumnCount, columnNumber );
        row.Append( value );
    } 

to just be

    result.maxColumnCount = Max( result.maxColumnCount, record.size() );
    arrayAppend( result.data, record.values() );

values() returns a native Java array which is going to be lighter weight than a CF array or Java List anyway,

cfsimplicity commented 10 months ago

@bdw429s Great suggestion and easy to apply to the current method.

values() returns a native Java array which is going to be lighter weight than a CF array or Java List anyway,

Unfortunatey that causes tests of related parts of the library to fail in ACF because of the change to a native java string array.

However my experiments seem to show that sticking with record.toList() is no slower. I'm still seeing processing times halved or better.

bdw429s commented 10 months ago

fail in ACF

Boo on Adobe! Can you post what the failures were so I can create tickets for Adobe to fix. Lucee does a pretty good job of handing native Java arrays.

cfsimplicity commented 10 months ago

@bdw429s It seems to be down to QueryNew() on ACF which won't recognize native arrays as data.

querynew-acf

As you say, Lucee has no problem.

querynew-lucee

For now, I've added branching to allow Lucee at least to benefit: https://github.com/cfsimplicity/spreadsheet-cfml/commit/f1151afe5ca7888f5bf150a11b3068fe24084189