RGLab / flowWorkspace

GNU Affero General Public License v3.0

Retain Transformer Definitions - gs_save() and gs_load() #373

Open DillonHammill opened 2 years ago

DillonHammill commented 2 years ago

Hi @mikejiang,

I have noticed that the transformer definitions don't seem to be retained for GatingSets archived using save_gs() and reloaded with load_gs().

# Save transformed GatingSet
save_gs(gs, "GatingSet")
# Reload GatingSet
gs <- load_gs("GatingSet")
gh_get_transformations(gs[[1]])
> list()

Is there a way that we can keep the transformer definitions when saving the GatingSet?

DillonHammill commented 1 year ago

Hi @mikejiang, how difficult would it be to retain the transformer definitions when saving and loading the GatingSet?

This is the last remaining critical issue that I have for CytoExploreR. It is very common for users to export and reload their GatingSets to continue their analyses, but the transformers are lost on reload and this is causing a number of issues for users.

It seems that the issue originates from load_gs() having to create a new GatingSet on import, and there isn't a way to transfer the transformers to the new GatingSet without having to transform the (already transformed) data. Would it be possible to attach the transformers without transforming the data using an internal API? I can understand why you wouldn't want users fiddling with this, but it is very inefficient to invert the transformations just to apply them again so that the transformers can be attached.

I have considered manually exporting the linear data and associated transformers and then re-applying the transformers upon import, but this seems very inefficient, particularly when analysing large datasets (thousands of samples). It makes more sense to export and load the transformed data as is, but we need to preserve the transformers so that the data can still be returned to the linear scale when required.

Do you have any suggestions on a viable workaround? I am out of ideas and considering writing the transformers to a keyword - I know this is a terrible idea and I don't want to have to do it that way but I can't see another option.
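As a possible stopgap that avoids keywords, the transformers could be serialised to a sidecar .rds file stored next to the archived GatingSet. The sketch below is only that: the helper names (trans_sidecar_path(), save_trans_sidecar(), load_trans_sidecar()) and the "_transformers.rds" naming scheme are hypothetical and not part of flowWorkspace. The file is written beside the archive folder rather than inside it, on the assumption that load_gs() may not expect extra files in that folder. This does not re-attach the transformers to the reloaded GatingSet; it only keeps them available so the data can be returned to the linear scale.

```r
# Sketch: keep transformer definitions in a sidecar .rds file next to the
# archived GatingSet. All helper names and the file naming are hypothetical.

# Path of the sidecar file for a given GatingSet directory (assumed convention).
trans_sidecar_path <- function(gs_dir) {
  # Drop any trailing slashes so "GatingSet/" and "GatingSet" map to one file.
  paste0(sub("/+$", "", gs_dir), "_transformers.rds")
}

# Write a named list of transformers beside the GatingSet directory.
save_trans_sidecar <- function(trans, gs_dir) {
  stopifnot(is.list(trans), length(trans) == 0 || !is.null(names(trans)))
  path <- trans_sidecar_path(gs_dir)
  saveRDS(trans, path)
  invisible(path)
}

# Read the transformers back; returns an empty list if no sidecar exists.
load_trans_sidecar <- function(gs_dir) {
  path <- trans_sidecar_path(gs_dir)
  if (!file.exists(path)) {
    return(list())
  }
  readRDS(path)
}

# Intended use alongside flowWorkspace (not run here):
# trans <- gh_get_transformations(gs[[1]], only.function = FALSE)
# save_gs(gs, "GatingSet")
# save_trans_sidecar(trans, "GatingSet")
# gs <- load_gs("GatingSet")
# trans <- load_trans_sidecar("GatingSet")
```

Whether every transformer type survives saveRDS()/readRDS() intact is an assumption that would need checking against the transformers actually in use.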

Thanks for your help.

DillonHammill commented 1 year ago

Hi @mikejiang, I have done some digging around to track down where things are going wrong. Here is an example that will hopefully illustrate the problem. First, we create a GatingSet and apply transformations to it. We can see that the transformations are indeed attached, and if we save and reload the GatingSet, the transformations are retained.

library(CytoExploreR)
library(CytoExploreRData)

# Create and transform GatingSet
gs <- GatingSet(Activation)
gs <- cyto_transform(gs, channels = "PE-A", plot = FALSE)

# Extract transformers from GatingSet
trans <- gh_get_transformations(
  gs[[1]], 
  only.function = FALSE
)
transformerList(
  names(trans),
  trans
)

> $`PE-A`
Transformer: flowJo_logicle [-Inf, Inf]

attr(,"class")
[1] "transformerList" "list" 

# Export and reload GatingSet
save_gs(gs, "Primary_GatingSet")
gs_load <- load_gs("Primary_GatingSet")

# Extract transformers from reloaded GatingSet
trans_new <- gh_get_transformations(
  gs_load[[1]], 
  only.function = FALSE
)
transformerList(
  names(trans_new),
  trans_new
)

> $`PE-A`
Transformer: flowJo_logicle [-Inf, Inf]

attr(,"class")
[1] "transformerList" "list" 

If I extract a cytoset from the transformed GatingSet, reverse the transformations, add the cytoset to a new GatingSet and re-apply the transformations, the transformers are correctly attached to the new GatingSet. However, if I save and reload this new GatingSet, the transformers are no longer attached.

# Extract cytoset from transformed GatingSet - could be any population
cs <- realize_view(
  gs_cyto_data(gs)
)

# Apply inverse data transformations to get linear data
cs <- cyto_transform(
  cs,
  trans = trans,
  inverse = TRUE,
  plot = FALSE
)

# Create new linear GatingSet
gs_new <- GatingSet(cs)

# Apply transformations as before
gs_new <- cyto_transform(
  gs_new,
  trans = trans,
  plot = FALSE
)

# Extract transformers from new GatingSet
trans_new <- gh_get_transformations(
  gs_new[[1]], 
  only.function = FALSE
)
transformerList(
  names(trans_new),
  trans_new
)

> $`PE-A`
Transformer: flowJo_logicle [-Inf, Inf]

attr(,"class")
[1] "transformerList" "list" 

# Export and reload new transformed GatingSet
save_gs(gs_new, "Secondary_GatingSet")
gs_load <- load_gs("Secondary_GatingSet")

# Extract transformers from reloaded GatingSet
gh_get_transformations(
  gs_load[[1]], 
  only.function = FALSE
)
> list()

As you can see above, the transformers are attached to the GatingSet prior to exporting it. After reloading, however, the transformer definitions have been lost.

Do you have any ideas why this is happening? It seems strange that the transformers are there just prior to saving but are lost after reloading.
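To make the comparison above easier to repeat across GatingSets, a small check along these lines could report which channels lose their transformers on reload. The helper name lost_transformers() is hypothetical; it only compares the names of the two lists returned by gh_get_transformations() before saving and after reloading.

```r
# Hypothetical helper: channels that had a transformer before saving
# but have none after reloading.
lost_transformers <- function(before, after) {
  setdiff(names(before), names(after))
}

# Intended use with the example above (not run here):
# before <- gh_get_transformations(gs_new[[1]], only.function = FALSE)
# after  <- gh_get_transformations(gs_load[[1]], only.function = FALSE)
# lost_transformers(before, after)
```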

Thanks for your help!

Dillon