Merck / pkglite

Compact Package Representations
https://merck.github.io/pkglite/
GNU General Public License v3.0
30 stars 4 forks source link

pattern_file_sanitize() - do we need the extra /? #40

Closed SHAESEN2 closed 1 year ago

SHAESEN2 commented 2 years ago

I was trying to remove some folders before packing but ran into issues. Then I realized you make the assumption that the files are already residing in a subfolder.

sanitize.file_collection <- function(x) {
  pkg_name <- fc$pkg_name
  df <- fc$df
  df <- df[!grepl(pkglite:::cat_patterns(pattern_file_sanitize()), df$"path_abs"), ]
  new_file_collection(pkg_name, df)
}

pattern_file_sanitize()
[1] "/\\.DS_Store$"     "/Thumbs\\.db$"     "/\\.git$"          "/\\.svn$"          "/\\.hg$"           "/\\.Rproj\\.user$" "/\\.Rhistory$"    
[8] "/\\.RData$"        "/\\.Ruserdata$"   

grepl("/\\.git", ".git/blah") # trying to remove the .git folder but this does not work
FALSE
grepl("/\\.git", "/.git/blah") # it failed above because . git was in the absolute folder
TRUE

It would be more intuitive (at least to me) to not make assumptions and change the pattern to below so the function finds all files:

[1] "\\.DS_Store$"     "Thumbs\\.db$"     "\\.git$"          "\\.svn$"          "\\.hg$"           "\\.Rproj\\.user$" "\\.Rhistory$"    
[8] "\\.RData$"        "\\.Ruserdata$"   
nanxstats commented 1 year ago

@SHAESEN2 Not sure if I can reproduce this... Could you please share a minimal reproducible example? Maybe there a better way to create the file collection in the first place.

Matching files with \\.RData$ and some others without the starting / in the sanitizer patterns will have some unintentional side effects.

nanxstats commented 1 year ago

Since there is no response on a minimal reproducible example, I'm closing this issue.

SHAESEN2 commented 1 year ago

The above is a minimal example.

grepl("/\.git", ".git/blah") # trying to remove the .git folder but this does not work FALSE grepl("/\.git", "/.git/blah") # it failed above because . git was in the absolute folder TRUE

Since we can't use the above function, we are forced to sanitize the collection ourselves with custom functions.