Open r2evans opened 1 year ago
I tried it with R on Ubuntu 22.04, with arrow installed from an RSPM binary, and was able to read the CSV successfully (about 10 GB of RAM used). Is it possible that this is a bug related to how arrow is installed, or to the OS?
```
R version 4.2.2 (2022-10-31) -- "Innocent and Trusting"
Platform: x86_64-pc-linux-gnu (64-bit)

> obj3 <- arrow::read_csv_arrow("ITPD_E_R02.csv", as_data_frame = FALSE)
> obj3
Table
72534869 rows x 13 columns
$exporter_iso3 <string>
$exporter_dynamic_code <string>
$exporter_name <string>
$importer_iso3 <string>
$importer_dynamic_code <string>
$importer_name <string>
$broad_sector <string>
$industry_id <int64>
$industry_descr <string>
$year <int64>
$trade <double>
$flag_mirror <int64>
$flag_zero <string>
```
Describe the bug, including details regarding any error messages, version, and platform.
Motivated by https://stackoverflow.com/questions/75657380/readr-vs-data-table-different-results-on-fedora, I downloaded its sample data (https://www.usitc.gov/data/gravity/itpd_e/itpd_e_r02.zip) and read the CSV with various functions. I was able to read the file successfully (albeit slowly for most) using `utils::read.csv`, `readr::read_csv`, `data.table::fread`, and `arrow::open_dataset(., format = "csv")`, but when I tried this, my R session crashed. (FYI, I do not have a `D:` drive; that path must be compiled into the symbols.)

I tried it again on the same computer, with a new/fresh R process and the same file, and got a different error:
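For reference, the approaches that completed successfully would look roughly like the sketch below. It uses a tiny stand-in CSV so it is self-contained (the real file, ITPD_E_R02.csv, is 6.8 GB), and it assumes readr, data.table, and arrow are installed; the file name and columns here are illustrative only.

```r
# Write a tiny stand-in CSV; the real input is ITPD_E_R02.csv (~6.8 GB).
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(exporter_iso3 = c("USA", "CAN"),
             year          = c(2000L, 2001L),
             trade         = c(1.5, 2.5)),
  tmp, row.names = FALSE
)

# The four readers that succeeded on the full file:
d1 <- utils::read.csv(tmp)
d2 <- readr::read_csv(tmp, show_col_types = FALSE)
d3 <- data.table::fread(tmp)
d4 <- arrow::open_dataset(tmp, format = "csv")  # lazy: nothing materialized yet
```

Only `arrow::read_csv_arrow` on the full file triggered the crash described above.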
I tried upgrading arrow and it still fails:
The CSV file itself is 6.8 GB and, once read into R, typically consumes over 7 GB of RAM. My system is Windows 11 22H2 (OS Build 22621.1265) with 64 GB of RAM, running R inside emacs/ESS.
For perspective, the data does not appear to contain anything cosmic:
(I recognize that data of this size should be (at least) opened lazily using `open_dataset`, or converted to a better storage format; that's not the point of this issue.)

Session info:
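For completeness, the lazy-read / convert-to-a-better-format route mentioned in the parenthetical above might look like this. It is a sketch against a small stand-in file (paths and columns are illustrative), assuming arrow and dplyr are installed:

```r
library(arrow)
library(dplyr)

# Stand-in CSV; the real input would be ITPD_E_R02.csv.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(year = c(2000L, 2001L), trade = c(1.5, 2.5)),
          csv_path, row.names = FALSE)

# Open lazily: no rows are materialized until collect().
ds <- open_dataset(csv_path, format = "csv")
small <- ds |> filter(year == 2000L) |> collect()

# One-time conversion to Parquet, a far better format for a 6.8 GB table.
pq_dir <- tempfile()
write_dataset(ds, pq_dir, format = "parquet")
```

With the Parquet copy in place, later sessions can `open_dataset(pq_dir)` and filter before ever loading the full table into RAM.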
Component(s)
R