Improve int/char list transformation to date format #4

DavideMessinaARS commented 3 years ago

I ran the function on a dataset built by combining 10,000 copies of OBSERVATION_PERSON (about 30M records in total).

The overall benchmark results are:

expression	min	median	itr/sec	mem_alloc	gc/sec	n_itr	n_gc	total_time
Modified_createspell()	1.86m	1.92m	0.008679	29.7GB	0.085054	5	49	9.6m

The results from profiling the code are:

function	time
total	78.45s
lubridate::ymd	54.44s
[.data.table	11.49s
order	2.45s
others	...

Since ymd() currently accounts for most of the run time, it makes sense to explore alternatives (at least for now, since I haven't yet run CreateSpell on a real dataset).
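One way to look at the conversion step on its own, outside the full function, is to time it directly on a synthetic vector. The sketch below is only illustrative: the yyyymmdd integer layout, the vector size, and the names `int_dates`, `out_base`, and `out_lub` are assumptions and are not taken from CreateSpells. It uses base R, and times lubridate::ymd() only if lubridate is installed.

```r
# Hypothetical input: dates stored as yyyymmdd integers (an assumption,
# the real column layout is not shown in this thread).
set.seed(42)
days <- as.Date("2010-01-01") + sample.int(3650, 1e5, replace = TRUE)
int_dates <- as.integer(format(days, "%Y%m%d"))

# Base R conversion with an explicit format, timed in isolation.
t_base <- system.time(
  out_base <- as.Date(as.character(int_dates), format = "%Y%m%d")
)

# Time lubridate::ymd() on the same input, but only if the package is available.
if (requireNamespace("lubridate", quietly = TRUE)) {
  t_lub <- system.time(out_lub <- lubridate::ymd(int_dates))
  # Both conversions should agree on every element.
  stopifnot(all(out_lub == out_base))
  print(rbind(base = t_base[1:3], lubridate = t_lub[1:3]))
}
```

Timings from a sketch like this depend on the machine and on the input, so they say nothing about how the step behaves inside the full function on real data.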

DavideMessinaARS commented 3 years ago

Benchmarking different conversion functions on two vectors of 1M elements each, one integer and one character, gives:

base_str = base::as.Date(chr_dates),
basef_str = base::as.Date(chr_dates, fmt),
lub1_str = lubridate::as_date(chr_dates),
lub2_str = lubridate::ymd(chr_dates),
lub2_int = lubridate::ymd(num_dates),
idat_str = data.table::as.IDate(chr_dates),
idatf_str = data.table::as.IDate(chr_dates, fmt),
fast_ = fasttime::fastPOSIXct(chr_dates),
fastd = as.Date(fasttime::fastPOSIXct(chr_dates))

expression	min	median	itr/sec	mem_alloc	gc/sec	n_itr	n_gc	total_time
base_str	11.16s	11.51s	0.0865606	118.3MB	0.1731212	10	20	1.92m
basef_str	1.03s	1.12s	0.8881133	83.9MB	0.8881133	10	10	11.26s
lub1_str	386.46ms	425.67ms	2.3032683	128.3MB	3.4549025	10	15	4.34s
lub2_str	398.74ms	490.69ms	1.9532972	134.6MB	3.3206053	10	17	5.12s
lub2_int	1.7s	1.82s	0.5359279	241.4MB	1.5005982	10	28	18.66s
idat_str	10.94s	11.2s	0.0892033	122.1MB	0.3568132	3	12	33.63s
idatf_str	965.49ms	992.24ms	0.9820532	87.7MB	1.4730798	4	6	4.07s
fast_	88.63ms	102.25ms	9.5453584	15.3MB	0.0000000	10	0	1.05s
fastd	108.85ms	120.77ms	8.1918708	30.5MB	2.0479677	8	2	976.58ms
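The fastd row corresponds to wrapping fasttime::fastPOSIXct() in as.Date(). A minimal sketch of how that could be packaged as a helper is below. `chr_to_date` is a hypothetical name, not part of CreateSpells, and the input is assumed to be "YYYY-MM-DD" strings, since the actual contents of chr_dates and fmt are not shown here. The helper falls back to base R when fasttime is not installed.

```r
# Hypothetical helper (not part of CreateSpells): "YYYY-MM-DD" strings -> Date.
chr_to_date <- function(x) {
  if (requireNamespace("fasttime", quietly = TRUE)) {
    # fastPOSIXct() parses the strings as UTC; as.Date() on a POSIXct
    # also defaults to UTC, so the calendar day is preserved.
    as.Date(fasttime::fastPOSIXct(x, tz = "UTC"))
  } else {
    # Fallback: base R with an explicit format (assumed to match the input).
    as.Date(x, format = "%Y-%m-%d")
  }
}

# Small usage check on assumed ISO-formatted input.
example_dates <- c("2020-01-31", "1999-12-01")
converted <- chr_to_date(example_dates)
stopifnot(all(converted == as.Date(example_dates, format = "%Y-%m-%d")))
```

As far as I know, fasttime does not validate its input and only handles a limited range of years, so a helper like this should be checked against the real date range (for example, dates before 1970) before it replaces ymd().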

DavideMessinaARS commented 3 years ago

As I expected, running on a real dataset gives a very different result.

For improvements targeting the real bottlenecks, see https://github.com/ARS-toscana/CreateSpells/pull/5/

This topic is no longer relevant, so I'll close the issue.
