RGLab / flowWorkspace

GNU Affero General Public License v3.0

Allow to transform a channel to a new one. #367

Open OuNao opened 2 years ago

OuNao commented 2 years ago

I need a better way to transform a channel into a new channel, keeping the original one.

This can be accomplished in data.frame easily with: df2 <- transform(df1, new = fun(old))

flowFrame (and cytoframe) supports passing a transformList object to the transform method:

ff2<-transform(ff, flowCore::transformList(from, fun))

The usage of the transformList constructor is:

transformList(from, tfun, to=from, transformationId ="defaultTransformation")

So I thought what I needed could be done by setting the "to" parameter:

ff2<-transform(ff, flowCore::transformList(from, fun, to))

But it generates an error:

Error in tList %on% `_data` : to is not a variable in the flowFrame

Currently, I have to extract the exprs, transform them, and then append the result as a new column to the cytoframe (cf_append_cols). Transforming a channel from an FCS file with 5 million events is fast, but appending it to the cytoframe is very slow.

Can that be implemented?

Thanks.

mikejiang commented 2 years ago
> library(flowCore)
> library(flowWorkspace)
> data("GvHD")
> fr = GvHD[[1]]     
> transform(fr, new = log(`FSC-H`))
flowFrame object 's5a01'
with 3420 cells and 9 observables:
           name                   desc     range  minRange    maxRange
$P1       FSC-H             FSC-Height      1024         0  1023.00000
$P2       SSC-H             SSC-Height      1024         0  1023.00000
$P3       FL1-H              CD15 FITC      1024         1 10000.00000
$P4       FL2-H                CD45 PE      1024         1 10000.00000
$P5       FL3-H             CD14 PerCP      1024         1 10000.00000
$P6       FL2-A                     NA      1024         0  1023.00000
$P7       FL4-H               CD33 APC      1024         1 10000.00000
$P8        Time      Time (51.20 sec.)      1024         0  1023.00000
$P9         new derived from transfo..      1024      -Inf     6.93049
169 keywords are stored in the 'description' slot

Unfortunately, this inline column appending only works for a flowFrame (which is an in-memory object), not for a cytoframe, so it is probably not what you are asking for.

OuNao commented 2 years ago

Yes, that is not what I need.

Coercing a cytoframe to a flowFrame, transforming, and then coercing back to a cytoframe may not be so problematic. But I need a way to do this programmatically, and NSE is not easy to work with programmatically:

ff2<-transform(ff, someNew=myfun(someOld))

This simply doesn't work.

someNew, myfun and someOld must be set programmatically at runtime.

someNew<-"FITC-A-BiExpTransformed"
someOld<-"FITC-A"
ff2<-transform(ff, someNew=myfun(as.symbol(someOld)))

I need to use do.call and pass the parameters as a list to set the "someNew" parameter correctly. Then myfun can't be found (probably a scoping problem). If I use a "common" function such as log, "someOld" must be a literal character string and not a variable. If I use a variable, I get a "not found" error (again a scoping problem?).
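
One way to sidestep NSE here is to do the transformation with standard evaluation on the expression matrix, where channel names are ordinary strings. The following is a minimal base-R sketch under stated assumptions, not flowCore or flowWorkspace API: make_transformed_col is a hypothetical helper, the matrix m stands in for what exprs(ff) would return, and myfun is a placeholder transformation. The one-column matrix it returns is presumably the kind of object that would then be handed to cf_append_cols, so this does not remove the slow append step described above.

```r
# Hypothetical helper (not part of flowCore/flowWorkspace).
# Builds a one-column matrix named `to` from channel `from` of matrix `m`.
make_transformed_col <- function(m, from, to, fun) {
  stopifnot(is.matrix(m), from %in% colnames(m), !(to %in% colnames(m)))
  matrix(fun(m[, from]), ncol = 1, dimnames = list(NULL, to))
}

# Stand-in data; with real data this would be exprs(ff).
m <- matrix(c(0, 10, 100, 1, 2, 3), ncol = 2,
            dimnames = list(NULL, c("FITC-A", "SSC-A")))

# Everything is chosen at runtime as plain values, no NSE involved.
someOld <- "FITC-A"
someNew <- "FITC-A-BiExpTransformed"
myfun <- function(x, coef = 10) asinh(x / coef)  # placeholder transformation

new_col <- make_transformed_col(m, someOld, someNew, myfun)
m2 <- cbind(m, new_col)  # original channels plus the new one
colnames(m2)
```

Because the function and both channel names are passed as ordinary arguments, no symbol lookup happens inside transform, which avoids the scoping errors mentioned above.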

SamGG commented 2 years ago

@mikejiang thanks for your minimal example. @OuNao what is the code of the function you want to apply?

OuNao commented 2 years ago

@SamGG I use several custom transformation functions such as logicle, log and asinh (logicleGml2_trans, logtGml2_trans, asinhtGml2_trans).

SamGG commented 2 years ago

I don't understand what you mean by "custom", as the cited functions are standard.

I tried to create a new variable using transform() and transformList() by giving the "to" argument a name not present in the current list of parameters. I didn't succeed.

I ended up with some ugly code that Mike will supersede.

myfun = function(x, coef = 10) asinh(x/coef)

(fr_with_new = transform(fr, "new" = 0))

(fr_with_new_transformed = do.call("transform", list(
  fr_with_new, transformList(c("FL1-H", "FL2-H"), list(log, myfun), to = c("FL1-H", "new")))))

# alt.
(fr_with_new_transformed = do.call("transform", list(
  fr_with_new, transformList(c("FL1-H", "FL2-H"), list(log, function(x) myfun(x, 5)), to = c("FL1-H", "new")))))

OuNao commented 2 years ago

@SamGG Sorry. I use some "custom" linear and log transformations (the standard ones do not work for my needs).

OuNao commented 2 years ago

Hi,

I tried some different approaches to speed up this transformation process (to a new channel):

  1. Use flowFrames instead of cytoframes: no better, as the flowFrame must be converted to a cytoframe to be used in a GatingSet (the code saves the flowFrame to disk and uses load_cytoframe_from_fcs). So slow...
  2. Change the cytoframe backend to "mem": I gain 1 second (transforming an FCS file with 11 channels and 5 million events), from 14.6 seconds (h5 backend) to 13.6 seconds.

Why is cf_append_cols so slow even with the data in memory?

Thanks.