gagolews / stringi

Fast and portable character string processing in R (with the Unicode ICU)
https://stringi.gagolewski.com/

Difference between stri_trans_char and chartr #336

Closed. eyherabh closed this 5 years ago

eyherabh commented 5 years ago

stri_trans_char is said to be "a stringi-flavoured chartr() equivalent", but it is not. Here I tried to document that difference, which turned out to be particularly important when manipulating genetic sequences. Thanks for this package.
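To illustrate why the distinction can matter for genetic sequences, here is a minimal base R sketch. It assumes the task is complementing a DNA strand, which is only a guess at the kind of use case meant above; the sequence and the variable names are hypothetical. It compares a single chartr() call with the same substitutions chained one at a time.

```r
# Hypothetical DNA sequence, used only for illustration.
dna <- "GATTACA"

# chartr(old, new, x) maps all characters simultaneously:
# each character of x is looked up once in `old`.
complement <- chartr("ACGT", "TGCA", dna)

# The same four substitutions applied one after another.
# Later substitutions also rewrite the output of earlier ones.
chained <- chartr("T", "A",
           chartr("G", "C",
           chartr("C", "G",
           chartr("A", "T", dna))))

print(complement)  # [1] "CTAATGT"
print(chained)     # [1] "CAAAACA"
```

Only the simultaneous mapping yields the complement; the chained version collapses distinct bases into one another.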

gagolews commented 5 years ago

Thanks, I'll merge, but you should edit the manual entry in the comments of the R/trans_other.R file and then use roxygen2 to generate the .Rd file, as stated in the .Rd header:

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/trans_other.R

eyherabh commented 5 years ago

Sorry for the delay, I'll do that as soon as I can. Are you sure that stri_trans_char is vectorized over each code point in pattern and replacement? I have not gone through the source code, but the behaviour indicates otherwise: both stri_trans_char("AB", "AB", "BC") and stri_trans_char("AB", "A123456789123456789B", "B123456789123456789C") turn AB into CC, as if the function applied the substitutions in pattern and replacement sequentially. (The second example was intended to test whether the seemingly missing vectorized behaviour was due to pattern and replacement being too short.) I'll try to update the man pages and let you know so you can decide what's best. Thanks again for the work.
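The reported result can be reproduced in base R without relying on any particular stringi version. The sketch below assumes the behaviour is equivalent to applying each single-character substitution in turn; sequential_chartr is a hypothetical helper written for this comparison, not a stringi function and not necessarily how stringi was implemented.

```r
# Simultaneous mapping with base R's chartr(old, new, x).
simultaneous <- chartr("AB", "BC", "AB")

# Hypothetical helper (not part of stringi): applies the substitutions
# old[i] -> new[i] one after another, so each step sees the output
# of the previous one. Argument order mirrors stri_trans_char().
sequential_chartr <- function(x, old, new) {
  old_chars <- strsplit(old, "", fixed = TRUE)[[1]]
  new_chars <- strsplit(new, "", fixed = TRUE)[[1]]
  stopifnot(length(old_chars) == length(new_chars))
  for (i in seq_along(old_chars)) {
    x <- gsub(old_chars[i], new_chars[i], x, fixed = TRUE)
  }
  x
}

sequential <- sequential_chartr("AB", "AB", "BC")

print(simultaneous)  # [1] "BC"
print(sequential)    # [1] "CC"
```

With sequential application, A first becomes B, and then every B (including the one just produced) becomes C, which matches the CC reported above.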

gagolews commented 5 years ago

You know what, let's not do that.

I gave it some thought, and this is clearly a misbehavior of this stringi function. I opened #343 for this and will file a fix soon.