rabutler closed this issue 7 years ago
Reading in an entire rdf file (153 MB) resulted in 22,226,721 elements and 170.4 Mb in R.
Using readLines this takes 13.32 seconds (12.44 user, 0.46 system).
Using data.table::fread('file.rdf', sep = '\t') this takes 2.4 seconds (2.28 user, 0.09 system).
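read.rdf takes 38.44 s (37 user, 1.36 system) on the same file. If read.rdf spends about as long on the raw read as readLines does, switching to fread would save roughly 13.32 - 2.4 = 10.92 s, so we could likely reduce this from 38.44 to 27.52 s.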
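A rough way to reproduce this kind of comparison is sketched below. It does not use a real rdf file: the generated lines, the temporary file, and the variable names are all placeholders, and the fread step only runs if data.table is installed. The `sep = '\t'` call mirrors the one quoted above, which reads each line as a single column as long as the file contains no tabs.

```r
# Sketch: time readLines against data.table::fread on a synthetic text file.
# The file contents are made up for illustration; they are not rdf data.
n <- 50000
tmp <- tempfile(fileext = ".txt")
writeLines(sprintf("slot_%d: %d", seq_len(n), seq_len(n)), tmp)

# Base R: one character vector element per line.
t_base <- system.time(lines_base <- readLines(tmp))

# fread: with sep = "\t" and no tabs in the file, each line lands in one column.
lines_fread <- NULL
if (requireNamespace("data.table", quietly = TRUE)) {
  t_fread <- system.time(
    dt <- data.table::fread(tmp, sep = "\t", header = FALSE,
                            colClasses = "character")
  )
  lines_fread <- dt[[1]]
  print(rbind(readLines = t_base[1:3], fread = t_fread[1:3]))
} else {
  print(t_base[1:3])
}

unlink(tmp)
```

Timings on a small synthetic file will not match the numbers above; the gap should only become visible on files in the 100+ MB range.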
Commit 45e4e56e0de50500ccf6844c6bdda1ef4766118a started to address this, with very minor improvements for large files and slower reads for small files. read.rdf2 uses data.table::fread.
For a 156 MB file:
| | User (s) | System (s) | Elapsed (s) |
|---|---|---|---|
| read.rdf | 36.24 | 1.31 | 38.75 |
| read.rdf2 | 37.25 | 0.11 | 37.66 |
For a 0.9 MB file:
| | User (s) | System (s) | Elapsed (s) |
|---|---|---|---|
| read.rdf | 0.51 | 0.03 | 0.55 |
| read.rdf2 | 0.72 | 0.00 | 0.72 |
Commit b01228817f0e1c28b395b4b9d4a08ef34931e314 converted the data from a data frame to a matrix before parsing everything. The comparisons are now, for the 156 MB file:
| | User (s) | System (s) | Elapsed (s) |
|---|---|---|---|
| read.rdf | 36.24 | 1.31 | 38.75 |
| read.rdf2 | 26.14 | 0.82 | 28.98 |
For a 0.9 MB file:
| | User (s) | System (s) | Elapsed (s) |
|---|---|---|---|
| read.rdf | 0.51 | 0.03 | 0.55 |
| read.rdf2 | 0.44 | 0.00 | 0.44 |
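To illustrate why converting from a data frame to a matrix before parsing can help, here is a minimal sketch. It is not the code from the commit: the data and the `sum_nchar` helper are hypothetical, and it only shows that element-wise indexing in a loop tends to be cheaper on a matrix than on a data frame.

```r
# Sketch: compare element-wise access on a data frame and on a matrix.
# The data and the helper below are illustrative, not taken from read.rdf2.
n <- 20000
df <- data.frame(line = sprintf("slot_%d: %d", seq_len(n), seq_len(n)),
                 stringsAsFactors = FALSE)
m <- as.matrix(df)  # same content, stored as a character matrix

# Hypothetical stand-in for a parsing loop: touch every row once.
sum_nchar <- function(x) {
  total <- 0L
  for (i in seq_len(nrow(x))) {
    total <- total + nchar(x[i, 1])
  }
  total
}

t_df <- system.time(res_df <- sum_nchar(df))
t_m <- system.time(res_m <- sum_nchar(m))

# Both containers hold the same strings, so the totals must agree.
stopifnot(res_df == res_m)
print(rbind(data.frame = t_df[1:3], matrix = t_m[1:3]))
```

Data frame indexing goes through method dispatch on every `[` call, which is the overhead a matrix avoids; how much that matters depends on how the parser walks the data.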
I don't think there are any more obvious enhancements to speed it up at this point.
Would using fread or similar speed up the reading of the rdf file?