Closed DavideMessinaARS closed 3 years ago
Benchmarking different date-conversion functions on two vectors of 1M elements, one integer and the other character, gives:
base_str = base::as.Date(chr_dates),
basef_str = base::as.Date(chr_dates, fmt),
lub1_str = lubridate::as_date(chr_dates),
lub2_str = lubridate::ymd(chr_dates),
lub2_int = lubridate::ymd(num_dates),
idat_str = data.table::as.IDate(chr_dates),
idatf_str = data.table::as.IDate(chr_dates, fmt),
fast_ = fasttime::fastPOSIXct(chr_dates),
fastd = as.Date(fasttime::fastPOSIXct(chr_dates))
expression | min | median | itr/sec | mem_alloc | gc/sec | n_itr | n_gc | total_time |
---|---|---|---|---|---|---|---|---|
base_str | 11.16s | 11.51s | 0.0865606 | 118.3MB | 0.1731212 | 10 | 20 | 1.92m |
basef_str | 1.03s | 1.12s | 0.8881133 | 83.9MB | 0.8881133 | 10 | 10 | 11.26s |
lub1_str | 386.46ms | 425.67ms | 2.3032683 | 128.3MB | 3.4549025 | 10 | 15 | 4.34s |
lub2_str | 398.74ms | 490.69ms | 1.9532972 | 134.6MB | 3.3206053 | 10 | 17 | 5.12s |
lub2_int | 1.7s | 1.82s | 0.5359279 | 241.4MB | 1.5005982 | 10 | 28 | 18.66s |
idat_str | 10.94s | 11.2s | 0.0892033 | 122.1MB | 0.3568132 | 3 | 12 | 33.63s |
idatf_str | 965.49ms | 992.24ms | 0.9820532 | 87.7MB | 1.4730798 | 4 | 6 | 4.07s |
fast_ | 88.63ms | 102.25ms | 9.5453584 | 15.3MB | 0.0000000 | 10 | 0 | 1.05s |
fastd | 108.85ms | 120.77ms | 8.1918708 | 30.5MB | 2.0479677 | 8 | 2 | 976.58ms |
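The post does not show how `chr_dates`, `num_dates`, and `fmt` were built, so here is a minimal sketch of inputs that would fit the calls above. It assumes ISO-style strings (`"2015-03-14"`), since `base::as.Date()` without a format only accepts `%Y-%m-%d` or `%Y/%m/%d`, and assumes `fmt = "%Y-%m-%d"`. The helper `make_inputs()` and the date range are hypothetical. The column names in the table resemble `bench::mark()` output, so the expressions were presumably passed to it, but that is also an assumption.

```r
# Hypothetical helper: builds inputs shaped like those named in the benchmark.
# The original post does not show this step; the format and range are assumptions.
make_inputs <- function(n, seed = 1) {
  set.seed(seed)
  pool <- seq(as.Date("2000-01-01"), as.Date("2020-12-31"), by = "day")
  days <- sample(pool, n, replace = TRUE)
  list(
    chr_dates = format(days, "%Y-%m-%d"),            # character, e.g. "2015-03-14"
    num_dates = as.integer(format(days, "%Y%m%d")),  # integer, e.g. 20150314
    fmt       = "%Y-%m-%d",                          # assumed value of `fmt`
    truth     = days                                 # reference dates for checks
  )
}

inp <- make_inputs(1e4)  # use 1e6 to match the vector size in the post
chr_dates <- inp$chr_dates
num_dates <- inp$num_dates
fmt <- inp$fmt

# Sanity check: both base variants should parse to the same dates.
stopifnot(identical(base::as.Date(chr_dates), base::as.Date(chr_dates, fmt)))
```

Checking that the variants agree before timing them avoids comparing a fast function that silently returns different values.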
As I expected, running on a real dataset gives a very different result.
For improvements that target the real bottlenecks, see https://github.com/ARS-toscana/CreateSpells/pull/5/
This topic is no longer relevant, so I'll close it.
I ran the function on a dataset created by combining OBSERVATION_PERSON 10000 times, for a final length of around 30M records.
The general results are:
The results from profiling the code are:
Since ymd() is the function with the most impact right now, it makes sense to explore alternatives. (For now at least, since I haven't run CreateSpell on a real dataset yet.)
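One alternative worth timing, sketched here under the assumption that the date column contains many repeated values (as it would in a dataset built by stacking the same table many times): parse each distinct string once and map the results back with `match()`. The helper name `parse_dates_unique()` is hypothetical and is not part of CreateSpells or lubridate; the default format is also an assumption.

```r
# Hypothetical helper: parse only the distinct values, then expand back.
# Assumes dates are stored as character strings in a single known format.
parse_dates_unique <- function(x, fmt = "%Y-%m-%d") {
  ux <- unique(x)                      # distinct date strings only
  parsed <- as.Date(ux, format = fmt)  # parse each distinct value once
  parsed[match(x, ux)]                 # map results back to every element
}

# Small illustration with heavy repetition and a missing value.
x <- rep(c("2015-03-14", "2016-07-01", NA), times = 1000)
d <- parse_dates_unique(x)

# The result should equal parsing every element directly.
stopifnot(identical(d, as.Date(x, format = "%Y-%m-%d")))
```

The gain depends entirely on how much the values repeat, so on a real dataset with mostly distinct dates this may not help and should be measured rather than assumed.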