cboettig / neonstore

:package: A local content-based storage system for NEON data
https://cboettig.github.io/neonstore

make vroom optional? #24

Open cboettig opened 4 years ago

cboettig commented 4 years ago

The need to pull in metadata from filenames means we have lost much of the efficiency we previously gained from vroom's ability to read in a vector of files that conform to the same schema (columns). Under these circumstances (and with altrep off), for the often small individual raw files we have in NEON, vroom isn't giving us much of a performance gain over base methods.

Meanwhile, the database backend gives us an even faster way to access tables that have been imported into the database, so parsing the text files has become more of a one-off step, and csv-parsing speed is no longer as crucial. vroom is probably the heaviest dependency currently (in terms of indirect dependencies), so making it optional would make the required footprint a good bit lighter.

(Note that the need to add columns from file-name metadata is also a huge slow-down for duckdb, otherwise duckdb_read_csv() could read a vector of files with matching schema very quickly too!)
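For illustration, here is a minimal base-R sketch of the per-file pattern described above: read each small file on its own, attach columns parsed from its file name, then combine. The helper `read_with_meta()`, the `<site>_<product>.csv` naming scheme, and the demo files are assumptions made up for this example. They are not neonstore's actual code or NEON's real file-name layout.

```r
# Hypothetical helper: read one CSV with base R and attach metadata
# parsed from its file name.
read_with_meta <- function(path) {
  df <- utils::read.csv(path, stringsAsFactors = FALSE)
  # Assumed naming scheme, for illustration only: "<site>_<product>.csv"
  stem <- tools::file_path_sans_ext(basename(path))
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  df$site <- parts[1]
  df$product <- parts[2]
  df
}

# Write two small demo files that share the same schema.
tmp <- tempfile("neon_demo_")
dir.create(tmp)
write.csv(data.frame(x = 1:2, y = c("a", "b")),
          file.path(tmp, "BART_temp.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4, y = c("c", "d")),
          file.path(tmp, "HARV_temp.csv"), row.names = FALSE)

# Each file is parsed separately because the metadata differs per file,
# so the "read a whole vector of files in one call" shortcut is lost.
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read_with_meta))
```

Because every file needs its own metadata columns, the parser is called once per file, which is why raw parsing speed matters less here than it would for a single vectorised read.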

cboettig commented 3 years ago

Not truly related, but note that:

vctrs::vec_rbind(!!!list_of_results)

is equivalent to dplyr::bind_rows() and may be more efficient than the old-school do.call(rbind, ...). (h/t @jimhester)
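A small sketch of the comparison, assuming a list of data frames with matching columns. The `pieces` list is made-up demo data, and the vctrs call only runs if that package happens to be installed. No timing claim is made here.

```r
# Made-up demo data: a list of data frames sharing one schema.
pieces <- list(
  data.frame(id = 1:2, value = c(0.5, 1.5)),
  data.frame(id = 3:4, value = c(2.5, 3.5))
)

# Base R approach: row-bind every element of the list.
base_result <- do.call(rbind, pieces)

# vctrs approach: splice the list into vec_rbind() with `!!!`.
# Guarded so the sketch still runs when vctrs is not installed.
if (requireNamespace("vctrs", quietly = TRUE)) {
  vctrs_result <- vctrs::vec_rbind(!!!pieces)
  # Both approaches should yield the same rows and columns.
  stopifnot(isTRUE(all.equal(base_result, vctrs_result,
                             check.attributes = FALSE)))
}
```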