Open mikejiang opened 5 years ago
I support eliminating the `colnames` slot from `flowSet` for the reasons you mentioned, even if making the required changes to downstream methods is a bit of a hassle.
A further thought: `flowFrame` stores `colnames` in both `exprs` (as the `colnames` attribute) and `parameters`, and the two are forced to be identical, including their order. That makes them redundant, and they could become inconsistent if not synced properly (e.g. https://github.com/RGLab/flowCore/blob/trunk/tests/testthat/test-colnames.R#L8-L10).
Right now this doesn't pose an immediate risk as long as user code sticks to the official public APIs, which take care of syncing and guard against such discrepancies. But it is worth thinking about a cleaner data structure design that strips the `colnames` attribute from `exprs` to reduce the number of sources of `colnames` info (there is yet another copy residing in the `$PnN` keyword ...).
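
To make the redundancy concrete, here is a minimal base-R sketch. It does not use flowCore: `frame` is a plain list standing in for a `flowFrame`, and `is_synced` is a hypothetical helper, both invented for illustration only. It shows how two copies of the channel names can drift apart when one is modified without going through a syncing accessor.

```r
# Toy stand-in for a flowFrame: NOT flowCore's actual class, just a plain list
# that keeps the channel names in two places, as described above.
frame <- list(
  exprs = matrix(1:6, nrow = 2,
                 dimnames = list(NULL, c("FSC-A", "SSC-A", "FL1-A"))),
  parameters = data.frame(name = c("FSC-A", "SSC-A", "FL1-A"),
                          stringsAsFactors = FALSE)
)

# Hypothetical consistency check: both copies must match, order included.
is_synced <- function(fr) {
  identical(colnames(fr$exprs), fr$parameters$name)
}

synced_before <- is_synced(frame)

# Touching only one copy (bypassing any syncing accessor) breaks the invariant.
colnames(frame$exprs)[1:2] <- c("SSC-A", "FSC-A")
synced_after <- is_synced(frame)
```

With a single source of truth for the names, there would be nothing to keep in sync and no such check would be needed.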
**Describe the bug**
This is somewhat related to the `colnames` swapping of `flowFrame` described in #152. Basically, `[[<-` currently doesn't enforce that the order of the `colnames` of the replacement `flowFrame` is identical to the `colnames` slot of the `flowSet`. This leads to inconsistent `colnames` among `frames` and violates the expectations of the `flowSet` class. As a consequence, it will silently cause erroneous operations on the `flowSet`, e.g. subsetting columns by integer (`fs[, j]`) or updating channels (`colnames<-`).
This has further complications for `ncdfFlowSet`, where an additional `origColnames` keeps track of the data layout in `h5`, in which the column order can't be physically changed (efficiently, through partial IO) once written.

**To Reproduce**

**Expected behavior**
Have a stricter order check in `[[<-` to ensure all the `colnames` are identical. The better solution is to eliminate the `colnames` slot from `flowSet` (and `ncdfFlowSet`) entirely, because this redundant slot serves no good purpose and only creates potential data integrity troubles (typically confusing and hard to troubleshoot) down the road.
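
As a rough illustration of the failure mode and of the stricter check, here is a base-R sketch. It does not call flowCore: `fs` is a plain list of matrices standing in for a `flowSet`, and `check_colnames` is a hypothetical helper. The order-insensitive `setequal` test is an assumption used only to contrast with an order-sensitive one; it is not a claim about what `[[<-` does internally.

```r
# Toy stand-in for a flowSet: a named list of matrices (NOT flowCore's class).
fs <- list(
  a = matrix(1:4, nrow = 2, dimnames = list(NULL, c("FSC-A", "SSC-A"))),
  b = matrix(5:8, nrow = 2, dimnames = list(NULL, c("FSC-A", "SSC-A")))
)

# Hypothetical strict check: names must be identical, order included.
check_colnames <- function(expected, replacement) {
  if (!identical(expected, colnames(replacement))) {
    stop("colnames of the replacement must match the set, in the same order")
  }
  invisible(TRUE)
}

# Replacement frame with the same channels in swapped order.
swapped <- matrix(9:12, nrow = 2, dimnames = list(NULL, c("SSC-A", "FSC-A")))

# An order-insensitive comparison lets the replacement through ...
loose_ok <- setequal(colnames(fs$a), colnames(swapped))

# ... after which integer column subsetting picks a different channel per frame.
fs_bad <- fs
fs_bad$b <- swapped
first_channel <- vapply(fs_bad, function(m) colnames(m)[1], character(1))

# The strict, order-sensitive check rejects the same replacement.
strict_result <- tryCatch(check_colnames(colnames(fs$a), swapped),
                          error = function(e) "rejected")
```

Here `first_channel` differs between frames `a` and `b`, which is the silent inconsistency described above. Dropping the redundant slot would remove the need for such a check altogether.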