DillonHammill / CytoExploreR

Interactive Cytometry Data Analysis

Cannot read files including "[ ]" #133

Open tamaikeiichi opened 2 years ago

tamaikeiichi commented 2 years ago

Describe the bug

Files whose names include "[ ]" (e.g., [test].fcs) cannot be read by cyto_setup(). The SONY MA900 cell sorter automatically includes "[ ]" in some file names (e.g., test[15 mL Tubes] Data Source - 1.fcs), so I would be grateful if these files could be read without renaming them.

To Reproduce

library(CytoExploreR)
filename <- list.files("./")
# load files
gs <- cyto_setup("./", select = filename, gatingTemplate = "Activation-gatingTemplate.csv")
print(filename)
cyto_names(gs)

output:

print(filename)
[1] "[test].fcs" "test.fcs" "test.R"

cyto_names(gs)
[1] "test.fcs"


Additional context

Thank you for your great package!

DillonHammill commented 2 years ago

@tamaikeiichi, can't say I have ever tried to read files with square brackets in the name. Are you able to share a file so I can track down and fix the problem?

tamaikeiichi commented 2 years ago

Thank you for your quick reply. Attached files are the same records, but the file names are different (renamed). test.zip

DillonHammill commented 2 years ago

I can confirm that you are able to read in the files in the coming version of CytoExploreR:

cs <- cyto_load("Debug-Files")
cyto_names(cs)
[1] "[test].fcs" "test.fcs"  

I will switch to the version of CytoExploreR you are using and see if it is an easy fix.

DillonHammill commented 2 years ago

It seems like the problem comes from the file selection through select:

cs <- cyto_load("Debug-Files", select = c("test.fcs", "[test].fcs"))
cyto_names(cs)
[1] "test.fcs"  

I will take a closer look and report back soon.

tamaikeiichi commented 2 years ago

Thank you for your kind reply. I have also confirmed that the problem lies with select. Without it, both files are read:

temp <- cyto_setup()
cyto_names(temp)
[1] "[test].fcs" "test.fcs"

I'd appreciate it if you could fix this problem.

DillonHammill commented 2 years ago

Yeah, the problem is due to the fuzzy matching performed by select. It is trying to match [test].fcs in the file names, but square brackets have a different meaning in regular expressions (they define a character class), so it returns incorrect matches.
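To illustrate the point, here is a minimal sketch in base R. It assumes that select behaves roughly like a case-insensitive grepl() call; the actual CytoExploreR internals are not shown in this thread, so treat that as an assumption.

```r
# File names taken from the original report.
files <- c("[test].fcs", "test.fcs", "test.R")

# As a regular expression, "[test]" is a character class that matches ONE
# of the characters t, e or s, and "." matches any single character.
# Assumption: select performs a regex match similar to this grepl() call.
hits <- grepl("[test].fcs", files, ignore.case = TRUE)

files[hits]
# [1] "test.fcs"
```

The bracketed file is skipped because the character before ".fcs" in its name is "]", which is not in the class, while "test.fcs" matches through its final "t".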

The solution would be for me to set fixed = TRUE when I perform the fuzzy matching so that it tries to match [test].fcs exactly. The problem is that if I switch to fixed matching, case becomes important, so if you accidentally type [Test].fcs you won't get a match to [test].fcs.
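The trade-off can be seen with base R's grepl(), used here as a stand-in for whatever matching select does internally (an assumption):

```r
# File names taken from the original report.
files <- c("[test].fcs", "test.fcs", "test.R")

# fixed = TRUE treats the pattern as a literal string, so the brackets
# are matched as ordinary characters.
exact <- grepl("[test].fcs", files, fixed = TRUE)
files[exact]
# [1] "[test].fcs"

# Literal matching is case sensitive, so a capitalisation slip finds nothing.
typo <- grepl("[Test].fcs", files, fixed = TRUE)
any(typo)
# [1] FALSE
```

Note that grepl() ignores ignore.case (with a warning) when fixed = TRUE, so the two options cannot simply be combined.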

Hmm... I might need to think about this a bit more but one alternative is I can add a fixed argument to cyto_load() and cyto_setup() so that you can have control over the matching.

Is there any reason why you can't do the following instead:

cs <- cyto_load(".", select = "test")

I would avoid passing special characters like [, {, ., :, !, | or * to the select and exclude arguments.

DillonHammill commented 2 years ago

OK so I think I have a solution. I think I will convert the matching criteria and file names to lower case during matching so that the match is not case sensitive but still an exact match.

This sort of matching would come in handy in a few places, so I may need to write a separate function to handle this.
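A rough sketch of what such a function might look like, using base R only. The name match_literal_nocase() is hypothetical and not part of CytoExploreR; the eventual implementation may well differ.

```r
# Hypothetical helper (not a CytoExploreR function): literal,
# case-insensitive substring matching of one pattern against names.
match_literal_nocase <- function(pattern, x) {
  # Lower-case both sides, then match the pattern as a fixed string so
  # that characters such as "[" and "." carry no special meaning.
  grepl(tolower(pattern), tolower(x), fixed = TRUE)
}

files <- c("[test].fcs", "test.fcs", "test.R")

# Case slips no longer matter, and the brackets are taken literally.
files[match_literal_nocase("[Test].fcs", files)]
# [1] "[test].fcs"
```

Because this is still substring matching, a short pattern such as "test" would continue to select every file whose name contains it.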

I will let you know once I have pushed the fix to GitHub.

tamaikeiichi commented 2 years ago

I really appreciate your cooperation. I'm looking forward to it.

rwbaer commented 2 years ago

@DillonHammill I trust you to find a good solution, but Linux, macOS, and R itself are case-sensitive environments, so ignoring case as a workaround for handling special characters could have more far-reaching consequences. Could the brackets instead be passed to select as properly escaped regular expressions?
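For reference, escaping could look something like the sketch below. escape_regex() is a hypothetical helper written for this example (the pattern follows a commonly used base R idiom), and grepl() again stands in for the matching done by select, which is an assumption.

```r
# Hypothetical helper: put a backslash in front of regex metacharacters
# so that a file name can be used as a literal pattern.
escape_regex <- function(x) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)
}

files <- c("[test].fcs", "test.fcs", "test.R")

pattern <- escape_regex("[test].fcs")
pattern
# [1] "\\[test\\]\\.fcs"

# The escaped pattern remains a regular expression, so any case handling
# already applied to the match keeps working.
files[grepl(pattern, files, ignore.case = TRUE)]
# [1] "[test].fcs"
```

With this approach the user could either escape names before passing them to select, or the package could escape them internally when an exact file name is supplied.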