pkarman closed 8 years ago
An important caveat with reading CSV as a stream: the `:force_utf8` feature does not work as currently implemented. My recommendation is to require that all CSV files be UTF-8 encoded prior to import, possibly offering some simple docs or a script to convert encodings.
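As a rough idea of what such a conversion script could look like, here is a minimal sketch. The helper name `convert_to_utf8` is hypothetical and not part of this repo. It assumes the caller already knows the source encoding (nothing is detected) and that the encoding is ASCII-compatible, so that splitting on newlines is safe.

```ruby
# Hypothetical helper (not part of this repo): rewrite a text/CSV file as UTF-8.
# Assumes `source_encoding` is known and ASCII-compatible (e.g. "ISO-8859-1").
def convert_to_utf8(src_path, dest_path, source_encoding)
  # Tag everything read from the source with its real encoding.
  File.open(src_path, "r:#{source_encoding}") do |input|
    File.open(dest_path, "w:UTF-8") do |output|
      # Transcode one line at a time so large files never sit fully in memory.
      input.each_line do |line|
        output.write(line.encode("UTF-8"))
      end
    end
  end
end
```

Usage would be something like `convert_to_utf8("data/in.csv", "data/out.csv", "ISO-8859-1")` (paths are illustrative). If the declared encoding is wrong, `String#encode` may raise `Encoding::InvalidByteSequenceError` or `Encoding::UndefinedConversionError`, which is arguably preferable to silently importing garbled text.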
Going to open a new PR against a branch directly on this repo.
Based off #294 and #94
This PR cuts indexing time significantly when run on a machine with multiple processors, at the expense of more memory usage. To mitigate the additional memory use that comes with parallel (forked) processes, CSV files are read via an IO stream, one row at a time, rather than being slurped entirely into memory and parsed as a String.
For reference, the current import consumes about 250MB for the single process.
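To illustrate the general approach described above, here is a minimal sketch, not the actual code in this PR. `index_row`, `import_file`, and `import_in_parallel` are hypothetical names, and the sketch assumes one forked child per file on a platform that supports `fork`; a real implementation would likely cap the number of workers.

```ruby
require "csv"

# Hypothetical stand-in for the real per-row indexing step.
def index_row(row)
  row.to_h
end

# Stream a file one row at a time. CSV.foreach reads from the IO
# incrementally instead of loading the whole file into a String.
def import_file(path)
  count = 0
  CSV.foreach(path, headers: true, encoding: "UTF-8") do |row|
    index_row(row)
    count += 1
  end
  count
end

# Fork one child process per file, then wait for all of them.
# Returns an array of booleans: true where the child exited successfully.
def import_in_parallel(paths)
  pids = paths.map do |path|
    Process.fork { import_file(path) }
  end
  pids.map { |pid| Process.wait2(pid).last.success? }
end
```

Because each child only ever holds the current row, per-process memory should stay roughly flat regardless of file size, which is what makes running several processes at once affordable.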
Example stats and usage: