RGLab / flowWorkspace

flowWorkspace
GNU Affero General Public License v3.0
45 stars 21 forks source link

fail to add $Pn* for cytoframe subset #315

Closed mikejiang closed 4 years ago

mikejiang commented 4 years ago

remove 7th channel by subsetting

> x = flowFrame_to_cytoframe(GvHD[[2]])[, -7]
> x
cytoframe object 'file45ff1ff250bf'
with 3405 cells and 7 observables:
     name              desc range minRange maxRange
$P1 FSC-H        FSC-Height  1022        0     1022
$P2 SSC-H        SSC-Height  1022        0     1022
$P3 FL1-H         CD15 FITC  9999        1     9999
$P4 FL2-H           CD45 PE  9999        1     9999
$P5 FL3-H        CD14 PerCP  9999        1     9999
$P6 FL2-A              <NA>  1022        0     1022
$P8  Time Time (51.20 sec.)  1022        0     1022
172 keywords are stored in the 'description' slot
> keyword(x)["$P7N"]
$<NA>
NULL

try to set P7N keyword,but it does nothing

> keyword(x)["$P7N"] <- "dd"
> keyword(x)["$P7N"]
$<NA>
NULL

setting/adding other keyword seems to be fine

> 
> keyword(x)["tt"]
$<NA>
NULL

> keyword(x)["tt"] <- "dd"
> keyword(x)["tt"]
$tt
[1] "dd"

Not sure if this behavior is appopriate, but it is causing the current write.FCS to fail since it needs to reset keywords before writing the subset to FCS

mikejiang commented 4 years ago

Turned out the $P7N does exist in the cytoframe, bypassing R API keyword, directly fetch data from c data structure

> cf_getKeyword(x@pointer, "$P7N")
[1] "dd"

It is keyword() method for cytoframe that calls flowCore::filter_keywords that only returns $PnX keywords that matches the existing parametes/channels.

For flowFrame, this filtering was done at [ (subsetting) stage, which essentially deletes the PnX keywords along with data columns (channels). But cytoframe is simply a view in c++, so we don't actually delete columns and keywords during subsetting. So we put this keyword filtering inline at keyword getter API, which is creating incompatible behavior with flowFrame. We are yet to decide what should be the right way to go about this. Maybe deleting the keywords at [ operation and disable filtering in keyword() is the easiest way to keep the current workflow of flowFrame work for cytoframe . However this will break assumption of CytoFrameView, which is supposed to keep the original data and ready for be restored once index is reset (maybe this is not a concern? Given we haven't got any use case that needs to return to the original state of data after the view is created)

mikejiang commented 4 years ago

We've decided to change the [ behavior of flowFrame to match cytoframe, i.e. do not remove PnX keyword during the cols subsetting. So that [ simply remove the data columns for both objects. This will simplify the logic and avoid the unexpected tampering the original cytoframe object, which is a reference that shares the same data with subsetted cytoframe. PnX keywords are typically not relevant to any type of data analysis until it is written to FCS files. So we will have dedicate logic to take care of it in write.FCS function.