ramiromagno opened 10 months ago
@ramiromagno looking into those packages - any suggestions for improving write performance? rcpp.simdjson has read functions, but I wasn't seeing much for writing.
You're right, it seems rcpp.simdjson only has read functions.
yyjsonr looks promising though.
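For writing, yyjsonr exposes both string and file serializers. A minimal sketch of what the write path could look like (function names here reflect my reading of the yyjsonr API — `write_json_str()` and `write_json_file()` — so double-check against the current package docs):

```r
library(yyjsonr)

df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# Serialize to an in-memory JSON string
json <- write_json_str(df)

# Or write straight to disk, skipping the intermediate string
write_json_file(df, "df.json")
```

Writing directly to file avoids materializing the full JSON string in R, which matters at the multi-GB sizes discussed below.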
Using my yyjson_switch branch with 2 cores and 16 GB of RAM in a container:
# Read the test AE dataset and replicate its 74 rows 100,000 times (~7.4M rows)
ae <- read_dataset_json(test_path("testdata", "ae.json"))
ae_100 <- dplyr::bind_rows(rep(list(ae), 100000))

# Build per-variable metadata from each column's attributes
ds_metadata <- dplyr::bind_rows(purrr::map(ae, \(x) attributes(x)))
ds_metadata['name'] <- names(ae)

ds_json <- dataset_json(ae_100, "SDTM.AE", "AE", "Adverse Events", ds_metadata)

# Time the write
start <- Sys.time()
write_dataset_json(ds_json, file = "test.json")
print(Sys.time() - start)
Time difference of 42.58133 secs
In total that's 7,400,000 rows and 37 columns, with a total output size of 1.8 GB.
For comparison, the same write against the current dev branch, which uses jsonlite, took 2.141051 mins.
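A self-contained head-to-head along the same lines, for anyone who wants to reproduce the comparison without the datasetjson test data (the data frame here is a stand-in, and the yyjsonr function name is my assumption about its current API):

```r
library(jsonlite)
library(yyjsonr)

# Synthetic stand-in for the replicated AE dataset
df <- data.frame(a = runif(1e6), b = sample(letters, 1e6, replace = TRUE))

# jsonlite write path (what the current dev branch uses)
system.time(jsonlite::write_json(df, "out_jsonlite.json"))

# yyjsonr write path (assumed API: write_json_file())
system.time(yyjsonr::write_json_file(df, "out_yyjsonr.json"))
```

Relative timings will vary with column types and row counts, so the ratio on a realistic SDTM dataset is the number that matters.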
Feature Idea
Depend on rcpp.simdjson or yyjsonr instead of jsonlite. The link contains a nice benchmark.
Relevant Input
No response
Relevant Output
No response
Reproducible Example/Pseudo Code
No response