Open DillonHammill opened 4 years ago
Aaaah, everything makes sense now...
Is there a reason why subsetting a cytoset with negative integers does not work?
> # cs is a cytoset with 6 samples
> cs[-6]
Error in subset_cytoset(x@pointer, i, j) : index error
> traceback()
4: stop(list(message = "index error", call = subset_cytoset(x@pointer,
i, j), cppstack = list(file = "", line = -1L, stack = "C++ stack not available on this system")))
3: subset_cytoset(x@pointer, i, j)
2: cs[-6]
1: cs[-6]
Dillon
@DillonHammill , not really a good reason for it not working other than it not being written. This should be an easy addition and I'll get on it.
flowWorkspace-Introduction
vignette (under "The cytoframe and cytoset classes") with some examples. But I can address your questions briefly here:read.ncdfFlowset
will still be available, but most of our code will be switching towards using the cytoframe
and cytoset
classes.cytoframe
and cytoset
, instead of flowFrame
and flowSet
, will take some adjustment. If you wish to avoid any changes to the original cytoframe
objects when performing operations, you can always make a deep copy before (see that section in the vignette).
Although it should work as a short-term solution, I would recommend avoiding adding cytoframe_to_flowFrame
and cytoset_to_flowSet
everywhere and instead refactoring a bit to only perform full copies where it is truly necessary. Generally avoiding costly and unnecessary copies by default makes for more efficient operations, which was part of the rationale for this change.[[
extraction from a cytoset
is currently still a flowFrame
, but it first goes through a cytoframe
and then cytoFrame_to_flowFrame
. This is mostly due to compatibility with the current state of openCyto
, but the default return type will probably switch to cytoframe
as soon as we can implement necessary downstream changes. If you want to get ahead of that as you rework your code, sticking with the cytoframe
is an option with returnType
, e.g. cs[[1, returnType = "cytoframe"]]
or with get_cytoframe_from_cs()
.realize_view(cf/cs)
(we could maybe make a deep_copy
alias). For a shallow copy, it's just shallow_copy(cf/cs)
.cytoframe
/cytoset
versus ncdfFlowset
, one of the important aspects is building out a consistent abstraction layer that can handle multiple backend storage modes so it will be extensible beyond HDF5.Happy to answer more questions and apologies for the headaches these changes may cause for CytoExploreR
, but it will be better in the long run.
Thanks @jacobpwagner.
That's OK. I will work through making the necessary changes so that CytoExploreR is on par the RGLab suite and then I will re-submit to ROpenSci. The main difference for me is that the underlying data is changed whenever manipulations are made.
Just to clarify, is it better to make a deep or shallow copy of the data within functions? For example, in cases where I am manipulating data for visualisations but not returning the data, is it better to make a deep or shallow copy of the data? Presumably, in cases where the data is returned a deep copy is preferred?
One more question about deep/shallow copies. See below:
# cs is a cytoset
# make shallow copy
cs_shallow_copy <- shallow_copy(cs)
# assign shallow copy to cs
cs <- cs_shallow_copy
Presumably, assignment of shallow copy to cs
does not replace the deep copy with the shallow one, but instead replaces the definition of cs with the shallow copy?
I will have a go at trying the negative subsetting later today.
Thanks again!
Dillon
To avoid the confusion, shallow_copy
is removed.
#reference
cs1 <- cs
# views
cs2 <- cs[i, j] #subview
cs3 <- cs[, ] #full view
#copy
cs4 <- realize_view(cs)
See the updated doc https://github.com/RGLab/flowWorkspace/blob/trunk/vignettes/flowWorkspace-Introduction.Rmd#L404-L425
Thanks @mikejiang, so any changes to the view will alter the original cytoset, unless a deep copy is made with realize_view()
.
Sorry to bug you again @jacobpwagner. Is it possible to add negative channel subsetting for cytosets as well?
# Remove first channel
> cs[, -1]
> Error in copy_view_cytoset(x@pointer) : external pointer is not valid
Sure. Sorry, I sort of meant to add that as well in the earlier commit and then didn't get around to it. I'll take care of it.
Just added negative column subsetting support in fd10e742addfe73c92861bccd9dd0b6394efec9b. And no worries about "bugging". It's all really helpful, actually.
Thanks @jacobpwagner, I will let you know if I encounter any other issues.
Is there any way to ship cytoset
or GatingSet
objects in a data package? My understanding is that the cytoset
needs to be constructed locally so that it can find the location on disk. I could never get this to work, just wondering if it is possible?
Only if you ship the files on disk.
Thanks @gfinak, how would I go about doing this? Using DataPackageR I currently construct a flowSet object which is shipped in the package.
Probably in inst/extdata
and then use system.file()
to locate the data in the installed package and load it.
This is what I am currently doing:
files <- list.files(path = DataPackageR::project_extdata_path('Activation'),
full.names = TRUE)
Activation <- read.flowSet(files = files)
So replace list.files
with system.file
call and switch to load_cytoset_from_fcs
?
I think, yes.
Thanks @gfinak I will give it a try.
The cytoset builds fine in DataPackageR but does not open when called after loading the data package...
files <- list.files(system.file("extdata/Activation", package = "CytoExploreRData"), full.names = TRUE)
Activation <- load_cytoset_from_fcs(files)
library(CytoExploreRData)
Activation
Error in get_pheno_data(object@pointer) : external pointer is not valid
Sorry closed by accident. I do have some additional pData()
assignments, I will try removing these to see if that works...
Yeah, see if that fixes it. Just those two lines above complete successfully in my testing.
Still no luck with just these lines...
files <- list.files(system.file("extdata/Activation", package = "CytoExploreRData"),
full.names = TRUE)
Activation <- load_cytoset_from_fcs(files)
library(CytoExploreRData)
Activation
Error in get_pheno_data(object@pointer) : external pointer is not valid
I have pushed the changes to the devel
branch of CytoExploreRData
if you would like to try it. The Activation preprocessing script can be located in data-raw as Activation.Rmd.
Weird. Still cannot reproduce it:
> library(CytoExploreR)
Loading required package: flowCore
Loading required package: flowWorkspace
As part of improvements to flowWorkspace, some behavior of
GatingSet objects has changed. For details, please read the section
titled "The cytoframe and cytoset classes" in the package vignette:
vignette("flowWorkspace-Introduction", "flowWorkspace")
Loading required package: openCyto
Loading required package: ncdfFlow
Loading required package: RcppArmadillo
Loading required package: BH
> library(CytoExploreRData)
> files <- list.files(system.file("extdata/Activation", package = "CytoExploreRData"),
+ full.names = TRUE)
> Activation <- load_cytoset_from_fcs(files)
> Activation
A cytoset with 33 samples.
column names:
FSC-A, FSC-H, FSC-W, SSC-A, SSC-H, SSC-W, Alexa Fluor 488-A, PE-A, PE-Texas Red-A, 7-AAD-A, PE-Cy7-A, Alexa Fluor 405-A, Alexa Fluor 430-A, Qdot 605-A, Alexa Fluor 647-A, Alexa Fluor 700-A, APC-Cy7-A, Time
The library(CytoExploreR)
and library(CytoExploreRData)
lines aren't necessary, but I wanted to make sure it wasn't some sort of weird namespace issue.
Ah, I see. I was going with your first lines. You just mean this:
> library(CytoExploreRData)
> Activation
Loading required package: flowWorkspace
As part of improvements to flowWorkspace, some behavior of
GatingSet objects has changed. For details, please read the section
titled "The cytoframe and cytoset classes" in the package vignette:
vignette("flowWorkspace-Introduction", "flowWorkspace")
Error in get_pheno_data(object@pointer) : external pointer is not valid
That's correct, the cytoset should be loaded with just the above calls. It seems like the dataset should be constructed when the package is loaded. I think it is trying to load a pre-built cytoset and that is why it is having problems.
save_cytoset
and load_cytoset
are added.
see https://github.com/RGLab/flowWorkspace/blob/trunk/vignettes/flowWorkspace-Introduction.Rmd#L467-L473
Thanks @mikejiang.
I thought I might be able to do the following in data-raw/Activation.Rmd. First create the cytoset
, save it using save_cytoset
. Comment out the code to construct the cytoset
and add a line to load the stored cytoset
to object called Activation
. When building the dataset this will get saved to Activation.rda
which will be loaded with the package. This approach still leads to the same error...
I would like to export the cytoset object so that no paths are required to construct the cytoset when loading the data package.
Just a comment on this that I will try when I get the chance. I think the issue is stemming from a lack of access privileges to files when the R package library is located in a restricted directory (such as Program Files).
@mikejiang, I was hoping that you could clarify whether subsetted cytoframes are duplicated on disk or whether they point to the original copy?
In the example below, do cf
and cf_sample
point to the same data on disk or are there separate copies made for the cf
and cf_sample
? My understanding is that realize_view()
is required to create the copy (right?), otherwise they both point to the same underlying data?
library(CytoExploreRData)
cs <- flowSet_to_cytoset(Activation)
cf <- cs[[1]]
cf_sample <- Subset(cf, sampleFilter(size = 5000))
I just want to make sure that I am not unnecessarily duplicating data.
Thanks for your help!
yes, it points to the same on-disk data unless realize it
> class(cf)
[1] "cytoframe"
attr(,"package")
[1] "flowWorkspace"
> nrow(cf)
[1] 119531
> cf_get_uri(cf)
[1] "/tmp/48aee2e9-64c1-4405-bb4e-cec0a2d3354a.h5"
>
> cf_sample <- Subset(cf, sampleFilter(size = 5000))
> nrow(cf_sample)
[1] 5000
> cf_get_uri(cf_sample)
[1] "/tmp/48aee2e9-64c1-4405-bb4e-cec0a2d3354a.h5"
>
> cf_cp <- realize_view(cf_sample)
> cf_get_uri(cf_cp)
[1] "/tmp/RtmpiZafH8/file12029c571b520e.h5"
Thanks @mikejiang, and wrapping the subsetted cytoframe in a cytoset also points to the same data. That's really cool!
cs_new <- cytoset(
structure(
lapply(seq_along(cs), function(z){
Subset(cs[[z]], sampleFilter(size = 5000))
}),
names = sampleNames(cs)
)
)
cf_get_uri(cs[[1]])
cf_get_uri(cs_new[[1]]) # same as above
I think this error I'm getting might be related to this thread. I'm working through the "Compensation of Fluorescent Spillover" vignette in CytoExploreR but I'm not getting very far. I can't solve it because I am a beginner with R and trying to learn CytoExploreR:
[Workspace loaded from ~/R folder/CytoExploreR-Demo/.RData]
Loading required package: flowWorkspace As part of improvements to flowWorkspace, some behavior of GatingSet objects has changed. For details, please read the section titled "The cytoframe and cytoset classes" in the package vignette:
vignette("flowWorkspace-Introduction", "flowWorkspace") Error in get_cytoset(x@pointer) : external pointer is not valid
Any help would be appreciated. Thank you.
gaining set has to be loaded from disk via load_gs()
function in order to de-serialize C++ data structure, merely restore R object from .RData won't be sufficient.
Thank you very much for the help. I will give it a try.
Hi @mikejiang and @jacobpwagner,
I just had a couple of questions about the new cytoset reference class.
Does this mean that the plan is do away with
read.ncdfFlowSet
in favour ofload_cytoset_from_fcs
?I have noticed that performing operations such as transformations without assignment still modifies the supplied cytoset (which makes sense). For example:
How do I interact with data without making these sorts of changes to the underlying data? Do I have to coerce the
cytoset
to aflowSet
usingcytoset_to_flowSet
for all internal manipulations? I have noticed thatcytoset_to_flowSet
accepts flowSet objects and leaves them unaltered.Should I include this conversion in all of my existing flowSet methods to ensure that I am not altering the underlying data? Presumably I would have to do the same for my flowFrame methods as well using
cytoframe_to_flowFrame
.As far as I can tell
cytoframe
objects are only ever going to appear (at the moment) when reading in a single file. As subsetting acytoset
still returns aflowFrame
object. Is this correct? Presumablycytoframes
will replaceflowFrames
eventually?Is there a clone method available for
cytoset
and/orcytoframe
objects?Just out of curiosity, what are the benefits of
cytosets
versusflowSets
orncdfFlowSets
?Sorry for all the questions.
Much appreciated,
Dillon