RGLab / MAST

Tools and methods for analysis of single cell assay data in R

Construct a FluidigmAssay from raw NanoString data #5

Closed gfinak closed 11 years ago

gfinak commented 11 years ago

I'll start by constructing a FluidigmAssay from NanoString data, then work on specializing a NanoStringAssay class that captures anything which is missing.

Okay, I'm able to construct a FluidigmAssay object without much trouble.

amcdavid commented 11 years ago

OK, sounds good. So NanoStringAssay will subclass FluidigmAssay? If so (and I think it should), we should store the raw data in a Mapping field other than "measurement" (say, "raw"), then store the thresholded data in "measurement". Or perhaps we can check whether thresholding has occurred and, if so, return the thresholded data; if not, return the raw data.

This way the FluidigmAssay methods that expect zero-inflated data will work without modification.
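A minimal sketch of the "return thresholded data if it exists, otherwise raw" idea, using a plain list rather than the real class. All names here (`new_assay`, `threshold_assay`, `get_measurement`, the `cutoff` argument) are hypothetical and are not part of the package API; the fixed cutoff merely stands in for whatever thresholding rule is eventually used.

```r
# Hypothetical sketch; none of these names belong to the package API.
new_assay <- function(raw) {
  # Keep the raw values; 'measurement' stays NULL until thresholding is run.
  list(raw = raw, measurement = NULL)
}

threshold_assay <- function(assay, cutoff) {
  # Values below the cutoff become zero, which yields zero-inflated data.
  assay$measurement <- ifelse(assay$raw < cutoff, 0, assay$raw)
  assay
}

get_measurement <- function(assay) {
  # Return thresholded data if present, otherwise fall back to the raw data.
  if (is.null(assay$measurement)) assay$raw else assay$measurement
}

a <- new_assay(c(2, 15, 40, 3))
before <- get_measurement(a)                             # raw values
after <- get_measurement(threshold_assay(a, cutoff = 5)) # thresholded values
```

With an accessor like this, code that expects zero-inflated data only ever calls `get_measurement` and does not need to know which field it is reading.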

amcdavid commented 11 years ago

Also, it appeared before that log (or log + 1) transforming the data made it approximately normal. Should we do this upon NanoStringAssay construction, or after thresholding? My vote is after thresholding, since that's sort of a post-processing step. I guess we could break this functionality off into another method as well, but we'd need to figure out where to store the log-transformed data (another field in Mapping?).
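To illustrate the ordering in question, here is a small sketch that thresholds first and applies the log + 1 transform afterwards. The values and the cutoff of 5 are made up for illustration; `log1p` is base R and computes log(1 + x), so thresholded zeros stay at zero.

```r
# Made-up values and cutoff, for illustration only.
raw <- c(2, 15, 40, 3)

# Step 1: threshold (values below the cutoff are set to zero).
thresholded <- ifelse(raw < 5, 0, raw)

# Step 2: log + 1 transform after thresholding; zeros map to zero.
logged <- log1p(thresholded)
```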

gfinak commented 11 years ago

Andrew, should we do the thresholding of the Nanostring object upon construction?

gfinak commented 11 years ago

With regard to your other comment on where to store the transformed data, I think it should not be stored. Log transforming is cheap anyway. We're storing the transformed and thresholded data anyway.

gfinak commented 11 years ago

Since we have some post-processing to do to the data frame during construction, I'm thinking of adding arguments that let the user specify a post-processing function. It can act directly on the columns in the data frame, add new columns, etc. The object is then constructed, transformed, and so forth with all the relevant metadata.
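One way the post-processing hook could look, as a sketch only. The constructor name `build_assay`, the `post_process` argument, and the returned list are all hypothetical; the real constructor and its metadata handling are not shown in this thread.

```r
# Hypothetical constructor; 'build_assay' and 'post_process' are illustrative names.
build_assay <- function(df, post_process = identity) {
  stopifnot(is.data.frame(df), is.function(post_process))
  # Let the user modify or add columns before the object is assembled.
  df <- post_process(df)
  stopifnot(is.data.frame(df))
  # Stand-in for the real object: the data plus a record of its columns.
  list(data = df, columns = names(df))
}

counts <- data.frame(gene = c("A", "B", "C"), count = c(2, 15, 40))

# Example user-supplied function that adds a new column.
add_log <- function(d) {
  d$log_count <- log1p(d$count)
  d
}

assay <- build_assay(counts, post_process = add_log)
```

Defaulting `post_process` to `identity` keeps the constructor usable when no extra processing is needed.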

amcdavid commented 11 years ago

The columns of which data frame? The one that's kept internally in the env object, or something that's passed to the constructor? (As it stands, one would probably want to pass the constructor a list of objects output from your rcc.reader.) If the former, it seems like the issue isn't so much the extra columns, but how they are managed with the mapping.

raphg commented 11 years ago

Agreed, we should store the raw data; processing such as log transformation and thresholding could be done afterwards.
