RGLab / flowWorkspace

flowWorkspace
GNU Affero General Public License v3.0
44 stars 21 forks source link

cf_append_cols does not work for empty cytoframes #342

Closed DillonHammill closed 4 years ago

DillonHammill commented 4 years ago

@mikejiang & @jacobpwagner, I am able to add parameters to empty flowFrames through fr_append_cols as described below:

# Make flowFrame with 0 cells
fr <- flowFrame(matrix(1:4, nrow = 1, ncol = 4, dimnames = list(NULL, c("A","B","C","D"))))
fr <- fr[-1, ]

# New column
new_col <- matrix(, ncol = 1, nrow= 0, dimnames = list(NULL, "Test"))
fr_append_cols(fr, new_col) # works as expected

cf_append_cols doesn't do the same for cytoframes:

cf <- flowFrame_to_cytoframe(fr)
cf_append_cols(cf, new_col) # doesn't work
error: Mat::init(): requested size is not compatible with row vector layout
Error in append_cols(cf@pointer, colnames(cols), cols) : 
  Mat::init(): requested size is not compatible with row vector layout

I see that some changes were made recently to cf_append_cols. Please disregard this issue if it has already been fixed, I am using the latest version of flowWorkspace - 4.1.8 so I don't think that this has been addressed yet.

Thanks for your help! Dillon

jacobpwagner commented 4 years ago

Hey @DillonHammill, sorry for the delay. Thanks for catching this. The problem is arising from the armadillo matrix coercion, which won't handle a zero-extent dimension resulting in an empty matrix. I forgot about this edge case of an empty/flat dimension when I was hooking in cf_append_cols. I'll get right on it, but it might require some changes at the cytolib level similar to this.

jacobpwagner commented 4 years ago

Turns out it was even easier than that. The empty matrix is not a problem on its own. The error was actually arising from the usage of arma::min and arma::max to get the channel ranges for the new channels. They understandably need a non-empty vector to determine the min/max of. For fr_append_cols, those default to +/- infinity at the R level, so that is the approach I took in https://github.com/RGLab/cytolib/commit/a35f5f3bb841e84900a00f19a974592a302fdefa. But anyway, it should work now.

As a side note, we still need to add support for construction of cytoframe objects from matrices as for flowFrames. I know @gfinak asked about this as well. That is, you should not have to do this:

fr <- flowFrame(matrix(1:4, nrow = 1, ncol = 4, dimnames = list(NULL, c("A","B","C","D"))))
cf <- flowFrame_to_cytoframe(fr)

but rather should just be able to do this:

cf <- cytoframe(matrix(1:4, nrow = 1, ncol = 4, dimnames = list(NULL, c("A","B","C","D"))))

It's not yet implemented but it should be an easy addition.

DillonHammill commented 4 years ago

Thank @jacobpwagner, I appreciate your help. I was also wondering if a cytoset to cytoframe coercion method is in the pipeline? Please correct me if I am wrong but currently I have to coerce to a flowFrame.

jacobpwagner commented 4 years ago

By a cytoset->cytoframe coercion, do you mean essentially pooling events from multiple cytoframes by binding rows, like the flowSet->flowFrame coercion method in flowCore? This has been discussed and is in the pipeline but is not yet implemented.

Just out of curiosity and to guide development, what is your use case?

DillonHammill commented 4 years ago

Yes that is correct. I often perform these coercion operations within CytoExploreR so it would be great if it was supported.

For example, when performing dimension reduction, I coerce the flowSet/cytoset to a single flowFrame/cytoframe before extracting the matrix and passing it to the dimension reduction algorithm. The nice thing about coercing in this way is that it is easy to keep track of which events came from which cytoframe so that the data can be split afterwards. In fact I actually barcode each of the cytoframes using cyto_barcode() before merging - which is equivalent to adding the Original parameter with sample IDs when coercing a flowSet to a FlowFrame.

I also have support for merging flowSets/cytosets into a list of flowFrames/cytoframes based on experimental variables in cyto_merge_by(). This function is used throughout CytoExploreR but especially within cyto_plot() - this makes it easy to visualise pooled data based on experimental details.

I prefer to have the data stored within a cytoframe container instead of dealing with matrices separately, its make the code harder to follow and more difficult to debug. Not mention the duplication of the raw data as I would need to keep the cytoframes as well to extract additional information from them.

I guess in the meantime, I could do the following?

fr <- as(cs, "flowFrame")
cf <- flowFrame_to_cytoframe(fr)

I am just trying to make sure that I am always dealing with cytoframes/cytosets now and I think this is only place where flowFrames are still popping up. I am not removing support for flowFrames/flowSets but instead favouring cytosets/cytoframes.

jacobpwagner commented 4 years ago

Thanks. That helps a bunch and is along the lines of what I expected. It just helps to have some end use cases in the back of my mind as I work on it. I will probably add support for arbitrarily appending rows to cytoframes while I'm at it. That should be somewhat easier than adding support for adding columns as it won't require as much updating of metadata.

For now, yeah, you'll probably have to do as you described: cytoset->flowSet->flowFrame->cytoframe (where that first arrow is implicitly handled during the flowSet->flowFrame coercion by cytoset inheritance from flowSet). But hopefully that will not be necessary for long...

jacobpwagner commented 4 years ago

@DillonHammill , I've made https://github.com/RGLab/flowWorkspace/issues/348 to track feature additions for cytoframe. I'm closing this because I believe the original problem is solved. Feel free to re-open it if that's not the case.

DillonHammill commented 4 years ago

Thanks @jacobpwagner!