Open asfimport opened 3 years ago
Antoine Pitrou / @pitrou:
In both cases (CSV and JSON), this can probably be added to ReadOptions.
Supun Kamburugamuva: What would be a good option name for this? One option would be read_ahead. But if we introduce this, do we need to change all the readers?
Another option would be not to read ahead if use_threads = false. But that option is specifically for CPU threads.
Antoine Pitrou / @pitrou:
use_readahead = true would sound good to me.
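A minimal sketch of how such a flag might gate the read-ahead, for illustration only. It does not use Arrow: the ReadOptions struct here is a stand-in, use_readahead is the option name proposed in this thread, and BlockSource and ReadAllBlocks are hypothetical names made up for the example. The point is only that the serial path never touches a thread when the flag is off.

```cpp
#include <functional>
#include <future>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the reader options (NOT the real arrow::csv::ReadOptions).
// use_readahead is the flag name proposed in this thread.
struct ReadOptions {
  bool use_threads = true;    // CPU parallelism for parsing/conversion
  bool use_readahead = true;  // background I/O prefetch (hypothetical)
};

// Hypothetical block source: returns the next block, or nullopt at end of input.
using BlockSource = std::function<std::optional<std::string>()>;

// Hypothetical helper: drains the source, with or without read-ahead.
std::vector<std::string> ReadAllBlocks(BlockSource next, const ReadOptions& options) {
  std::vector<std::string> blocks;
  if (!options.use_readahead) {
    // Threadless path: every block is pulled on the calling thread.
    while (auto block = next()) {
      blocks.push_back(std::move(*block));
    }
    return blocks;
  }
  // Read-ahead path: fetch the next block on a background thread while
  // the current one is being consumed. Only one fetch is in flight at a time.
  auto pending = std::async(std::launch::async, std::ref(next));
  while (auto block = pending.get()) {
    pending = std::async(std::launch::async, std::ref(next));
    blocks.push_back(std::move(*block));
  }
  return blocks;
}
```

With use_readahead = false, the function produces the same blocks in the same order without spawning anything, which is the behaviour a threadless WebAssembly build would need. Keeping the flag separate from use_threads matches the concern raised above that use_threads is specifically about CPU threads.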
Todd Farmer / @toddfarmer: This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.
We are compiling Arrow C++ to WebAssembly and ran into the following issue with the CSV reader:
Browsers became very picky about the use of SharedArrayBuffers after the events around Spectre and Meltdown.
As a result, you have to compile Arrow to WebAssembly without threads if you don't want to run your website with very strict cross-origin isolation.
Unfortunately, the CSV reader seems to always spawn a thread for the read-ahead in both the SerialStreamingReader and the SerialTableReader, regardless of whether use_threads is set.
Right now, this effectively means that you cannot use the CSV (and JSON) readers in threadless WebAssembly builds.
https://github.com/apache/arrow/blob/4363fefe46dc357a9013f0f4bcdc235e1e2e8124/cpp/src/arrow/csv/reader.cc#L839
https://github.com/apache/arrow/blob/4363fefe46dc357a9013f0f4bcdc235e1e2e8124/cpp/src/arrow/csv/reader.cc#L913
Reporter: Andre Kohn
Note: This issue was originally created as ARROW-12629. Please see the migration documentation for further details.