DillonHammill / CytoExploreR

Interactive Cytometry Data Analysis

FEATURE DEMO: Parse file names to experiment variables #43

Closed DillonHammill closed 4 years ago

DillonHammill commented 4 years ago

In order to reward users that appropriately name their FCS files and to promote good file naming practices, I have added a new function cyto_names_parse(). This function will split each file name by a delimiter and add each chunk as a separate experiment variable in cyto_details(). This will save users a lot of time manually entering these details. This is how it works:

First, you need to decide on a consistent naming format for your FCS files. As an example, we could follow a date-experiment-treatment naming format for each of our FCS files. This would generate file names that look something like this: 160520-Exp1-StimA.fcs, 160520-Exp2-StimB.fcs and so on. As you can see, naming files in this way retains a lot of additional experimental information that we may want to use in our analyses. This information is easy to enter manually for a handful of samples, but doing so becomes very tedious when analyzing many samples.
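
Since the parsing relies on every file name following the same format, it can be worth checking the names before parsing. The snippet below is a minimal base R sketch of such a check and is not part of CytoExploreR; the file names are made up for illustration, and the second one deliberately contains a stray delimiter.

```r
# Hypothetical file names; the second one has a stray "-" in the treatment
files <- c("160520-Exp1-StimA.fcs",
           "160520-Exp2-Stim-B.fcs",
           "160520-Exp3-StimC.fcs")

# Drop the .fcs extension, then count the chunks produced by the delimiter
stems <- tools::file_path_sans_ext(files)
n_chunks <- lengths(strsplit(stems, "-", fixed = TRUE))

# Names whose chunk count differs from the expected three variables
bad <- files[n_chunks != 3]
print(bad)
```

Any file listed in bad would need renaming before its name could be split cleanly into date, experiment and treatment.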

This is where cyto_names_parse() can help! It will split the file names into the experiment variables date, experiment and treatment and automatically update the cyto_details() with these new variables for you! cyto_names_parse() will work for any flowSet or GatingSet object.

cyto_names_parse()

# gs is a GatingSet
gs <- cyto_names_parse(gs,
                       split = "-",
                       vars = c("date", "experiment", "treatment"))

# Updated cyto_details
cyto_details(gs)
                        date         experiment          treatment
160520-Exp1-StimA.fcs   160520       Exp1                StimA
160520-Exp1-StimB.fcs   160520       Exp1                StimB
160520-Exp1-StimC.fcs   160520       Exp1                StimC
160520-Exp2-StimA.fcs   160520       Exp2                StimA
160520-Exp2-StimB.fcs   160520       Exp2                StimB
160520-Exp2-StimC.fcs   160520       Exp2                StimC
160520-Exp3-StimA.fcs   160520       Exp3                StimA
160520-Exp3-StimB.fcs   160520       Exp3                StimB
160520-Exp3-StimC.fcs   160520       Exp3                StimC
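
For readers curious about the splitting step itself, here is a rough base R sketch of the idea. It is an illustration only: parse_names_sketch() is a hypothetical helper that works on a plain character vector, and the actual implementation inside cyto_names_parse() may well differ.

```r
# Hypothetical helper, not part of CytoExploreR: split file names into variables
parse_names_sketch <- function(files, split = "-", vars) {
  # Remove any directory and the file extension before splitting
  stems <- tools::file_path_sans_ext(basename(files))
  chunks <- strsplit(stems, split, fixed = TRUE)
  # Every name must yield exactly one chunk per variable
  stopifnot(all(lengths(chunks) == length(vars)))
  # Stack the chunks into one row per file, one column per variable
  out <- as.data.frame(do.call(rbind, chunks), stringsAsFactors = FALSE)
  colnames(out) <- vars
  rownames(out) <- files
  out
}

details <- parse_names_sketch(
  c("160520-Exp1-StimA.fcs", "160520-Exp2-StimB.fcs"),
  split = "-",
  vars = c("date", "experiment", "treatment")
)
print(details)
```

The result is an ordinary data frame with one row per file, which mirrors the layout of the cyto_details() output shown above.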

I have also added a parse_names option to cyto_setup() should you wish to perform this operation when the files are initially loaded. Simply set parse_names to TRUE to use the default delimiter ("_"), or supply the delimiter used in your file names if the default does not apply. The variable names are assigned automatically and can be edited manually in the experiment details editor.

# Parse names in cyto_setup()
gs <- cyto_setup("Samples",
                 details = TRUE,
                 parse_names = "-")

That saved us a lot of time!

northNomad commented 4 years ago

I’m going to love this so much! Thank you so much DH!

rwbaer commented 4 years ago

Great feature!

northNomad commented 4 years ago

@DillonHammill I was under the impression that this feature has been released. Cannot seem to find the function after updating.

Am I wrong, or is this for the next update?

Best & Thanks so much, NN

DillonHammill commented 4 years ago

@northNomad, it is definitely there if you pull down the latest master branch. Note that the version is still at 1.0.7 until I finish off a couple of additional updates. Try pulling it down again and checking that cyto_names_parse shows up in the list of functions when the package is installing.

devtools::install_github("DillonHammill/CytoExploreR")
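
If it is easier than reading the install log, the check can also be done from the R console. This is a small sketch using base R only; has_export() is a hypothetical helper name, and it returns FALSE when the package is not installed at all. If an older version of the package is already loaded, restarting the R session first is probably a good idea.

```r
# Hypothetical helper: report whether an installed package exports a function
has_export <- function(pkg, fn) {
  # requireNamespace() returns FALSE instead of erroring when pkg is missing
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  fn %in% getNamespaceExports(pkg)
}

# TRUE once a version containing the new function is installed
has_export("CytoExploreR", "cyto_names_parse")
```
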

DillonHammill commented 4 years ago

@northNomad, cyto_names_parse() automatically updates the cyto_details() and returns the data. See the code in the first comment: you should assign the output to gs, not cyto_details(gs).

DillonHammill commented 4 years ago

Just adding a comment to highlight that I also added an exclude argument to cyto_names_parse() that can be used to exclude variables in the name from inclusion in cyto_details(). See the example below, where we don't want to include the date component in the cyto_details():

# gs is a GatingSet
gs <- cyto_names_parse(gs,
                       split = "-",
                       vars = c("experiment", "treatment"),
                       exclude = 1) # exclude first chunk from cyto_details()

# Updated cyto_details
cyto_details(gs)
                             experiment          treatment
160520-Exp1-StimA.fcs        Exp1                StimA
160520-Exp1-StimB.fcs        Exp1                StimB
160520-Exp1-StimC.fcs        Exp1                StimC
160520-Exp2-StimA.fcs        Exp2                StimA
160520-Exp2-StimB.fcs        Exp2                StimB
160520-Exp2-StimC.fcs        Exp2                StimC
160520-Exp3-StimA.fcs        Exp3                StimA
160520-Exp3-StimB.fcs        Exp3                StimB
160520-Exp3-StimC.fcs        Exp3                StimC
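
Conceptually, excluding a chunk amounts to dropping a column by position before the variable names are applied. The base R sketch below illustrates that idea under the assumption that the chunks are held as one row per file; it is not taken from the package source.

```r
# One row of chunks per file, as produced by splitting the names on "-"
chunks <- rbind(c("160520", "Exp1", "StimA"),
                c("160520", "Exp2", "StimB"))

exclude <- 1  # position of the chunk to drop (the date)

# Negative indexing removes the excluded column; drop = FALSE keeps a matrix
kept <- chunks[, -exclude, drop = FALSE]
colnames(kept) <- c("experiment", "treatment")
print(kept)
```

This is why vars lists only two names in the example above: it has to match the chunks that remain after exclusion.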