awalker89 / openxlsx

R package for .xlsx file reading and writing.
Other
364 stars 79 forks source link

Improve speed using stringi package #440

Open noamross opened 5 years ago

noamross commented 5 years ago

This is an incomplete PR for feedback before I continue with it. It replaces a number of calls to paste0() and paste() with stringi::stri_join(), which is approximately 4x faster.

This came up in a project of mine where I needed to write a create a large number of styles and apply them to a large number of cells (in many workbooks!). We found a speed bottleneck in the paste() functions at https://github.com/awalker89/openxlsx/blob/master/R/WorkbookClass.R#L2105-L2106 , so these changes helped speed up the process and my team is using this fork.

If you are interested in this PR, I'd be happy to go through the rest of the package and replace calls (paste(), gsub(), grepl(), date formatting, etc.) that could be accelerated with their stringi equivalents, which in my experience are 2-4x faster.

stringi is a mixed bag as a dependency - it takes a while to install from source because it contains all of unicode, but it has no non-base dependencies and many people have it already as it is a tidyverse dependency. In my experience it is also quite stable.

GegznaV commented 4 years ago

I think speed gives value. @noamross, Can this pull request be directed to the new home of openxlsx: https://github.com/ycphs/openxlsx/ (see ycphs/openxlsx#2)

ycphs commented 4 years ago

Unfortunately the the checks fail. @noamross: Please could you check once more the. I found a problem in the saveWorkbook method.

The examples and the tests are showing the same failure.