SwissClinicalTrialOrganisation / secuTrialR

Handling of data from the clinical data management system secuTrial
https://swissclinicaltrialorganisation.github.io/secuTrialR/

Increase reading speed of read_secuTrial() #204

Open PatrickRWright opened 4 years ago

PatrickRWright commented 4 years ago

Is your feature request related to a problem? Please describe. Very big exports (i.e. tens of thousands of entries) take a long time to read. Maybe it's possible to boost the performance.

aghaynes commented 4 years ago

I tried a little profiling... It seems to be converting the dates that's primarily causing the lag (at least in the dataset I've looked at)

image

aghaynes commented 4 years ago

Specifically, it's a merge in .convert_dates:

image
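To illustrate why a merge can dominate the runtime on large tables, here is a toy comparison of a merge()-based lookup with a match()-based one. It is only a sketch: the data, the column names (id, code, label) and the lookup table are hypothetical and are not taken from .convert_dates, whose internals are not shown in this thread.

```r
# Toy data only; column names and the lookup table are hypothetical.
set.seed(1)
n <- 2e5
codes <- sprintf("c%04d", 1:1000)
records <- data.frame(id = seq_len(n),
                      code = sample(codes, n, replace = TRUE),
                      stringsAsFactors = FALSE)
lookup <- data.frame(code = codes,
                     label = sprintf("label_%04d", 1:1000),
                     stringsAsFactors = FALSE)

# Variant 1: merge() joins the two tables; by default it sorts on the key
# and builds a new data frame.
t_merge <- system.time(
  merged <- merge(records, lookup, by = "code", all.x = TRUE)
)

# Variant 2: match() returns row positions in the lookup table;
# the original row order of records is kept.
t_match <- system.time({
  matched <- records
  matched$label <- lookup$label[match(records$code, lookup$code)]
})

# Both variants give the same labels once the row order is aligned.
merged <- merged[order(merged$id), ]
stopifnot(identical(merged$label, matched$label))

# Elapsed seconds for each variant (values depend on the machine).
c(merge = t_merge[["elapsed"]], match = t_match[["elapsed"]])
```

Whether a change like this would be applicable inside .convert_dates would need to be checked against the actual code and profiled on a real export.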

PatrickRWright commented 4 years ago

What's the tool you are using there? Looks super useful.

aghaynes commented 4 years ago

profvis. It integrates with the RStudio IDE; see here for a tutorial on RStudio's support forum.

aghaynes commented 4 years ago

devtools::load_all()
profvis::profvis(read_secuTrial(path_to_export))

I find it easier to look at the Data tab:

image
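Where profvis is not installed, a similar table of time per function can be obtained with base R's Rprof() and summaryRprof(), the sampling profiler that profvis builds on. The sketch below profiles a made-up function, slow_date_step(), which is a hypothetical stand-in and not part of secuTrialR.

```r
# Hypothetical stand-in for a slow step; this is NOT secuTrialR code.
slow_date_step <- function(n = 5e5) {
  x <- as.character(as.Date("2000-01-01") + sample.int(3650, n, replace = TRUE))
  as.Date(x, format = "%Y-%m-%d")
}

# Start the sampling profiler, run the code of interest, stop the profiler.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
dates <- slow_date_step()
Rprof(NULL)

# summaryRprof() errors if no samples were recorded (very fast code),
# so fall back to NULL in that case.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)

# by.total lists each function with the time spent in it and its callees.
if (!is.null(prof)) head(prof$by.total)
```

For the real case, the call to slow_date_step() would be replaced by read_secuTrial(path_to_export).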

PatrickRWright commented 4 years ago

image

PatrickRWright commented 4 years ago

We could benchmark the tidyverse reading functions to see if it's worthwhile switching.

PatrickRWright commented 4 years ago

In the spirit of structured procrastination I prepared a small benchmark: https://gist.github.com/PatrickRWright/4ed5d4e5b5aed03b7a1aa5b593dd9b64

readr is faster, but it's not exactly light speed either.
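For readers who cannot open the gist, a comparison of this kind can be sketched as follows. This is not the linked benchmark: the file is a throwaway toy CSV rather than a secuTrial export, the column names are made up, and readr is only timed if it happens to be installed.

```r
# Write a throwaway CSV so the comparison is self-contained (toy data only).
set.seed(1)
n <- 1e5
toy <- data.frame(
  id = seq_len(n),
  visit_date = as.character(as.Date("2015-01-01") +
                              sample.int(2000, n, replace = TRUE)),
  value = rnorm(n),
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# Named list of reader functions; each takes a path and returns a data frame.
readers <- list(
  base = function(p) utils::read.csv(p, stringsAsFactors = FALSE)
)
if (requireNamespace("readr", quietly = TRUE)) {
  # suppressMessages() hides the column specification message.
  readers$readr <- function(p) suppressMessages(readr::read_csv(p))
}

# Time one read per reader; the row count serves as a sanity check.
results <- do.call(rbind, lapply(names(readers), function(nm) {
  elapsed <- system.time(dat <- readers[[nm]](csv_path))[["elapsed"]]
  data.frame(reader = nm, rows = nrow(dat), elapsed_s = elapsed)
}))
results
```

A single timing per reader is noisy; repeating each read several times and comparing medians would give a fairer picture.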

aghaynes commented 4 years ago

There are also data.table::fread and vroom::vroom, which would be worth looking at. vroom is apparently the fastest, although I've never used it... (screenshot from the vroom readme)

image

PatrickRWright commented 4 years ago

If it were just about the numbers, I'd agree. Let's have a look at the dependency consequences it would have. The speedup looks pretty impressive though.