RGLab / flowCore

Core flow cytometry infrastructure

fix Latin encoded keyword in GvHD #232

Closed mikejiang closed 2 years ago

mikejiang commented 2 years ago

See https://developer.r-project.org/Blog/public/2022/06/27/why-to-avoid-%5Cx-in-regular-expressions/index.html. In the flowCore code base we do not use '\x' in any regular expression pattern; rather, it is the legacy data that contains such encoded bytes, e.g.

> library(flowCore)
> data("GvHD")
> k = keyword(GvHD[[1]])[["CREATOR"]]
> k
[1] "CELLQuest\xaa" "3.3"
> sub("^$"," ", k)
Error in sub("^$", " ", k) : input string 1 is invalid
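The failure above comes from the stray byte in the data, not from the pattern, and this can be checked without flowCore. Below is a minimal sketch in base R: the vector `k` is a hand-typed stand-in for the keyword value shown above, and treating the byte as Latin-1 is an assumption based on the title of this PR (the actual source encoding of the legacy file is not confirmed here).

```r
# Stand-in for keyword(GvHD[[1]])[["CREATOR"]] (typed by hand, not loaded from GvHD)
k <- c("CELLQuest\xaa", "3.3")

# validUTF8() reports, per element, whether the bytes form valid UTF-8
is_valid <- validUTF8(k)
print(is_valid)

# Assumption: the stray byte is Latin-1. Convert it so regex functions accept the string.
k_utf8 <- iconv(k, from = "latin1", to = "UTF-8")

# The same call that failed above now runs on the converted input
res <- sub("^$", " ", k_utf8)
print(validUTF8(res))
```

Whether the original `sub()` call errors depends on the locale of the R session, which is why the sketch checks validity explicitly instead of relying on the error.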

In my opinion this sub operation is legitimate: R shouldn't throw an error because of the content that the regular expression operates on, since that content is not a problem with the code per se.

However, I corrected the data content anyway, simply because it floods the R console with a lot of warnings when the data is loaded in an R session that operates in a different locale.

So here is what I did:

GvHD <- fsApply(GvHD, function(fr){
  k = keyword(fr)[["CREATOR"]]
  keyword(fr)[["CREATOR"]] <- sub("\xaa","", k)
  fr
  })
save(GvHD, file = "../flowCore/data/GvHD.rda",compress = "bzip2")
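The cleanup above passes the raw byte as a regular expression pattern, which may itself be rejected in some locales. A locale-independent variant is sketched below on a plain character vector (a hand-typed stand-in for the keyword value, not the actual GvHD object); `fixed = TRUE` and `useBytes = TRUE` are standard arguments of base R `sub()`, but using them here is a suggestion, not what this PR did.

```r
# Stand-in for the CREATOR keyword value of one frame
k <- c("CELLQuest\xaa", "3.3")

# fixed = TRUE treats the pattern as a literal, useBytes = TRUE matches byte-wise,
# so neither the pattern nor the input is validated against the session encoding
k_clean <- sub("\xaa", "", k, fixed = TRUE, useBytes = TRUE)

print(k_clean)
print(validUTF8(k_clean))
```

Inside the `fsApply()` call, the same `sub()` arguments could replace the original ones without changing anything else.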

This is what it looks like now:

> k = keyword(GvHD[[1]])[["CREATOR"]]
> k
[1] "CELLQuest" "3.3"