RGLab / CytoML

A GatingML Interface for Cross Platform Cytometry Data Sharing
GNU Affero General Public License v3.0

"No samples in this workspace to parse!" error in flowjo_to_gatingset, apparently bc of divergent sample name and FCS file name #113

Open PedroMilanezAlmeida opened 4 years ago

PedroMilanezAlmeida commented 4 years ago

A workaround for https://github.com/RGLab/CytoML/issues/112 is to provide the directory where the FCS files are located and the FCS filename to path and subset, respectively. For example:

gs <- CytoML::flowjo_to_gatingset(ws,
                                  name = 1,
                                  path = dirname(sampleURI),
                                  subset = basename(sampleURI),
                                  extend_val = -Inf)

However, if the sample name and the FCS filename don't match, the following error is thrown:

Error in (function (ws, group_id, subset, execute, path, cytoset, h5_dir, : No samples in this workspace to parse!

A workaround is to pass the sample name to subset instead:

gs <- CytoML::flowjo_to_gatingset(ws,
                                  name = 1,
                                  path = dirname(sampleURI),
                                  subset = fj_ws_get_samples(ws)$name[sampleID], # sampleID being the integer for the sample of interest
                                  extend_val = -Inf)

which works pretty well!

However, I am working with several samples that share the same sample name but come from different FCS files with different file names (they are technical replicates of the same sample acquired on different days). The second code chunk above therefore loads all technical replicates with that sample name, whereas, as mentioned in https://github.com/RGLab/CytoML/issues/112, I need to parse only one sample at a time.

Again, any help would be deeply appreciated.

PS: the help for subset indicates that FCS filenames can be used instead of sample names ("Or a character specifying the FCS filenames to be imported.")
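A minimal sketch of why selecting by sample name is ambiguous here. The data frame below is only a mock stand-in for the table returned by fj_ws_get_samples(ws); its column names and values are assumptions made for illustration, not output from a real workspace.

```r
# Mock stand-in for fj_ws_get_samples(ws); columns and values are hypothetical.
samples <- data.frame(
  sampleID = 1:3,
  name     = c("donorA", "donorA", "donorB"),  # two replicates share one name
  stringsAsFactors = FALSE
)

# Index of the sample of interest, as in the second code chunk above.
sampleID <- 1

# Selecting by name matches every row that carries the same sample name,
# so both "donorA" replicates are picked up instead of just one.
hits <- samples[samples$name == samples$name[sampleID], ]
n_hits <- nrow(hits)

# A quick check for this situation before parsing: are any names duplicated?
has_duplicates <- anyDuplicated(samples$name) > 0
```

With duplicated names, a name-based subset cannot single out one replicate, which matches the behavior described above.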

PedroMilanezAlmeida commented 4 years ago

Just FYI, for my specific purpose, I found a workaround combining the second code chunk above and

if(length(gs) > 1) {
  gs <- gs[flowWorkspace::keyword(gs, "FILENAME")$FILENAME == sampleURI]
}

which subsets the GatingSet to keep only the sample of interest.

This is not ideal since I have to load more samples than needed, slowing things down.

Also, please let me know whether I should keep this and https://github.com/RGLab/CytoML/issues/112 open (are these the expected behavior for path and subset?).

gfinak commented 4 years ago

path is really meant to point to the directory where the FCS files reside. We search for files based on $FIL keywords, if I recall correctly. subset is a bit of a legacy argument: it subsets the table of sampleid, samplename, groupname that's constructed from the XML, based on the sampleID or index. The subsetting API was implemented 10 years ago when flowWorkspace was still a pure R package. I think we probably need to take a second look at this interface and clean it up a bit. Can you provide more details about your use case?

mikejiang commented 4 years ago

subset is indeed a legacy argument, but besides the conventional selection by numeric index or FCS filename, it can also take an R filter expression to sub-select samples based on keyword content recorded in the XML (i.e. through fj_ws_get_keywords under the hood). See https://www.bioconductor.org/packages/devel/bioc/vignettes/CytoML/inst/doc/flowjo_to_gatingset.html#24_Import_a_subset for details. If there is some keyword that can uniquely identify these replicates, then you can pass that as a filter expression to the subset argument.
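A rough sketch of the filter-expression approach, not a tested recipe. It assumes the replicates can be told apart by the standard FCS $DATE keyword (they were acquired on different days), that this keyword is recorded in the workspace XML, and that the keyword used in the expression is also listed in the keywords argument, as the linked vignette appears to do. The wrapper name import_by_date and the date string are hypothetical.

```r
# Hypothetical wrapper: import only the samples whose $DATE keyword matches.
# Assumes CytoML is installed; nothing is parsed until the function is called.
import_by_date <- function(ws, fcs_dir) {
  CytoML::flowjo_to_gatingset(
    ws,
    name = 1,
    path = fcs_dir,                     # directory holding the FCS files
    subset = `$DATE` == "01-JAN-2020",  # filter expression; date value is made up
    keywords = "$DATE",                 # assumption: keyword must be extracted too
    extend_val = -Inf
  )
}
```

The date string has to match the keyword value exactly as stored in the workspace, so it is worth inspecting the output of fj_ws_get_keywords for one sample first.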

If they differ only by FCS filename, then you will need to pre-load the target file into a cytoset and pass it to the parser; see this new feature introduced by #100 and illustrated here: https://rpubs.com/rglab/622259
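A hedged sketch of the pre-loading route. It assumes flowWorkspace::load_cytoset_from_fcs is available and that the cytoset argument of flowjo_to_gatingset (visible in the error signature quoted earlier in this thread) accepts the result directly; the wrapper name import_single_fcs is hypothetical. The linked RPubs post is the authoritative illustration.

```r
# Hypothetical wrapper: load exactly one FCS file, then gate it with the
# workspace. Assumes flowWorkspace and CytoML are installed; nothing runs
# until the function is called.
import_single_fcs <- function(ws, fcs_file) {
  # Read only the file of interest into a cytoset.
  cs <- flowWorkspace::load_cytoset_from_fcs(fcs_file)

  # Hand the pre-loaded data to the parser instead of letting it search
  # for FCS files by sample name.
  CytoML::flowjo_to_gatingset(ws, name = 1, cytoset = cs, extend_val = -Inf)
}
```

Usage would look like gs <- import_single_fcs(ws, sampleURI), with sampleURI being the full path to the FCS file as in the earlier comments.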

PedroMilanezAlmeida commented 4 years ago

Hey guys, thank you both for your replies.

@gfinak: pls see https://github.com/PedroMilanezAlmeida/ezDAFi for use case.

@mikejiang: yeah, that is what I ended up doing (pre-loading a cytoset with a single fcs).

I will leave this and https://github.com/RGLab/CytoML/issues/112 open since the help is not correct (path as a data.frame gives an error, and subset as an FCS filename loads more than one FCS file in some cases). Pls, feel free to close.