Inside the CSV iterator loop, a small check will allow a UDF passed in the arguments as `filter` to return a boolean that decides whether a row is skipped, so rows we don't want from the CSV never take up memory. For example, in the 6-million-row CSV I'm using right now for a client, it turns out I only needed about 1.6 million of those rows. It still takes about the same amount of time to iterate over the file, but overall memory use is lower because the skipped rows are never added to my final array of arrays.
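A minimal sketch of the idea in Python, under a few assumptions: the function name `read_csv_rows` and the parameter name `row_filter` are hypothetical, and the filter is assumed to return `True` for rows to keep and `False` for rows to skip. It is not the original implementation, just an illustration of where the check sits in the loop.

```python
import csv
import io


def read_csv_rows(source, row_filter=None):
    """Read CSV rows into a list of lists, keeping only rows the filter accepts.

    `row_filter` is a hypothetical user-defined function. It is assumed to
    take one row (a list of strings) and return True to keep the row.
    """
    rows = []
    for row in csv.reader(source):
        # Every row is still read and parsed, so iteration time is about
        # the same. Rejected rows are simply never appended, so they do
        # not count toward the memory held by the final result.
        if row_filter is not None and not row_filter(row):
            continue
        rows.append(row)
    return rows


# Usage: keep the header and every row whose status is not "inactive".
sample = io.StringIO("id,status\n1,active\n2,inactive\n3,active\n")
kept = read_csv_rows(sample, row_filter=lambda row: row[1] != "inactive")
```

Making the filter optional means callers who want every row can leave it out and get the old behavior.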