Merck / r2rtf

Easily Create Production-Ready Rich Text Format (RTF) Table and Figure
https://merck.github.io/r2rtf
GNU General Public License v3.0

strings dependency #216

Closed elong0527 closed 5 months ago

elong0527 commented 5 months ago

In the convert function, I need to replace strings based on a mapping rule.

Currently, I rely on stringi in the convert function. Is there an efficient way to do this in base R?

https://github.com/Merck/r2rtf/blob/c55513d324c944a818573b4fb24062b6eda81458/R/conversion.R#L107

Here is an example of the mappings: in a vector of strings, I need to replace each string on the left-hand side with the one on the right-hand side.

  char_rtf <- c(
    "^" = "\\super ",
    "_" = "\\sub ",
    ">=" = "\\geq ",
    "<=" = "\\leq "
  )

cc: @nanxstats @yihui

yihui commented 5 months ago

I think what you are doing currently is quite reasonable: https://github.com/Merck/r2rtf/blob/c55513d324c944a818573b4fb24062b6eda81458/R/conversion.R#L106-L114

I wouldn't worry too much about the performance of gsub() (with a for-loop). You can do some benchmarking for convert(load_stringi = TRUE) vs convert(load_stringi = FALSE) to have a clearer idea. I guess the latter is very likely to be slower, but if practically the difference is 10ms (gsub()) vs 1ms (stringi), I won't bother thinking about it at all and will just use gsub(). Of course, the time depends on the size of the character vector text. I guess the time difference won't be noticeable if length(text) is relatively small (e.g., less than 1000).
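
As a rough illustration of the gsub()-with-a-for-loop approach and a quick timing, here is a minimal base R sketch. The helper name `replace_fixed` and the sample `text` vector are hypothetical and are not taken from r2rtf; `char_rtf` reuses the example mapping from the question. It uses `fixed = TRUE`, so both the patterns and the replacements are treated as literal strings.

```r
# Hypothetical helper (not part of r2rtf): apply each mapping in order
# using literal (fixed) matching.
replace_fixed <- function(text, mapping) {
  for (i in seq_along(mapping)) {
    text <- gsub(names(mapping)[i], mapping[[i]], text, fixed = TRUE)
  }
  text
}

# Example mapping from the question (left side is replaced by right side).
char_rtf <- c(
  "^" = "\\super ",
  "_" = "\\sub ",
  ">=" = "\\geq ",
  "<=" = "\\leq "
)

# Made-up input of 1000 strings, only for illustration.
text <- rep(c("x^2 >= y_1", "a <= b"), times = 500)

out <- replace_fixed(text, char_rtf)
print(out[1:2])

# Rough timing; the numbers depend on the machine and on length(text).
print(system.time(replace_fixed(text, char_rtf)))
```

For a more careful comparison against the stringi path, the same call could be repeated many times or run through a dedicated benchmarking package.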

elong0527 commented 5 months ago

Thanks for the suggestion, I am closing the issue with your blessing.

I added it because of a use case in clinical trials that requires saving all safety data as a listing in RTF format for the EU (called an ICH listing).

For a large trial, this can exceed 100k records and 10k pages in a single RTF file.

yihui commented 5 months ago

For 100k records, the time difference may be noticeable but I guess gsub() should take no longer than one second, which should be fine. Again, benchmarking will give you a clearer idea.

BTW, for gsub(), using perl = TRUE (with default fixed = FALSE, since fixed = TRUE is incompatible with perl = TRUE) may give you some substantial speedup.
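
One caveat when trying `perl = TRUE`: with `fixed = FALSE`, patterns such as "^" are regular-expression metacharacters, and backslashes in the replacement are interpreted by gsub(). The sketch below is one possible way to handle both, under stated assumptions: the helper name `replace_perl` is hypothetical (not from r2rtf), the patterns are quoted with PCRE's `\Q...\E` (which assumes no pattern contains `\E`), and backslashes in each replacement are doubled first. Whether this is actually faster than `fixed = TRUE` on real data would need benchmarking.

```r
# Hypothetical helper (not part of r2rtf): same mapping, but through the
# PCRE engine (perl = TRUE).
replace_perl <- function(text, mapping) {
  for (i in seq_along(mapping)) {
    # \Q...\E makes PCRE treat the pattern literally
    # (assumes the pattern itself does not contain "\E").
    pattern <- paste0("\\Q", names(mapping)[i], "\\E")
    # Double the backslashes so gsub() keeps them in the output instead of
    # treating them as escape characters in the replacement.
    replacement <- gsub("\\", "\\\\", mapping[[i]], fixed = TRUE)
    text <- gsub(pattern, replacement, text, perl = TRUE)
  }
  text
}

# Example mapping from the question.
char_rtf <- c(
  "^" = "\\super ",
  "_" = "\\sub ",
  ">=" = "\\geq ",
  "<=" = "\\leq "
)

res <- replace_perl(c("x^2 >= y_1", "a <= b"), char_rtf)
print(res)
```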