DillonHammill / CytoExploreR

Interactive Cytometry Data Analysis

integral of a histogram #32

Closed gombosim closed 3 years ago

gombosim commented 4 years ago

Hi, I'd like to get the integral of histograms. Is this possible in your package? I have not found a solution yet.

DillonHammill commented 4 years ago

@gombosim, can't believe I have never thought about adding this feature!

I could see if I could add it to cyto_stats_compute(). I will play around with it and get back to you.

Out of curiosity, what do you use this for (cell cycle analysis)?

gombosim commented 4 years ago

Thanks for the quick reaction and effort. Actually, I use the whole package for analyzing data from high-content microscopy. This includes not only cell cycle but also the level and localization of different proteins. So I could use it for different things.

DillonHammill commented 4 years ago

That is really cool! Never thought CytoExploreR would be used for this!

I am looking into the possibility of adding this feature, but I can see some potential issues. This will require some work; I will let you know how I go.

gombosim commented 4 years ago

Oh, it was a long journey until I found R and your package. I tried a lot of software for flow cytometry analysis with tons of functions, but all of them lacked some of the features I wanted to use. And these were not sophisticated things, just simple ones like changing the scale of the final histogram, or overlay presentation. One was always missing :) So let me tell you that you did a great job with CytoExploreR. It is very useful, indeed. I am looking forward to the progress.

DillonHammill commented 4 years ago

@gombosim, just putting some notes here as the next version of CytoExploreR will ship with a revamped cyto_stats_compute() that will include support for any statistical function. Here are some of the key points that I hope to address:

1. cyto_stats_compute() should accept custom functions

The stat argument in cyto_stats_compute() is currently limited to the names of supported statistics (e.g. mean, median, count). To give more flexibility to users who need custom statistical functions (as in your case, @gombosim), the stat argument should be upgraded to accept the names of custom functions as well. That way users can use the API to calculate any statistic they want with all the formatting benefits of cyto_stats_compute(). The following will be supported soon:

# Supported method called by name
cyto_stats_compute(gs,
                   alias = "T Cells",
                   stat = "mean")

# Custom method - use the quantile function from stats package
cyto_stats_compute(gs,
                   alias = "T Cells",
                   stat = "quantile",
                   probs = 0.95) # correctly pass additional arguments

2. Custom functions must follow a consistent format

In order to vectorise many of the statistical functions and improve processing speed, all custom statistical functions should accept a matrix of values and apply a function to each column (parameter) of the matrix. The output should be a vector with the same length as the number of columns in the matrix (i.e. a single statistic per parameter). To support custom statistical functions within cyto_plot(), we need to add a dispatching function that will extract the raw data from a flowFrame and compute the statistic. This will be handled internally by the new .cyto_stat().
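To make the expected format concrete, here is a minimal sketch of a custom statistic written that way. The function name cyto_stat_cv_sketch and the toy matrix are hypothetical and only illustrate the matrix-in, one-value-per-column-out contract described above; they are not part of CytoExploreR.

```r
# Hypothetical custom statistic: coefficient of variation (%) per parameter.
# Input:  numeric matrix (rows = events, columns = parameters).
# Output: one value per column, named after the columns.
cyto_stat_cv_sketch <- function(x, na.rm = TRUE) {
  x <- as.matrix(x)
  # apply() over columns keeps the column names on the result
  apply(x, 2, function(v) {
    100 * stats::sd(v, na.rm = na.rm) / mean(v, na.rm = na.rm)
  })
}

# Toy data standing in for the raw data extracted from a flowFrame
set.seed(1)
events <- cbind(
  CD4  = rnorm(1000, mean = 50, sd = 5),
  CD69 = rnorm(1000, mean = 20, sd = 4)
)

cv <- cyto_stat_cv_sketch(events)
```

Under the proposal, a function like this could presumably be passed by name through the stat argument, in the same way as "quantile" in the example above.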

3. cyto_stats_compute() should optionally return a tibble

To reduce package dependencies, cyto_stats_compute() will return a tibble only on request. All internal statistics will then be vectors, matrices or data.frames, so we can avoid the tibble nuances and make the code a lot more readable. To accomplish this, I will add a tibble argument to cyto_stats_compute(), which will be FALSE by default. If tibble = TRUE, I will simply set class(df) <- c("tbl_df", "tbl", "data.frame") as described in the rOpenSci packaging guidelines (https://devguide.ropensci.org/building.html).
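A minimal sketch of that optional conversion, assuming a hypothetical helper name (as_stats_output_sketch) and made-up sample values; only the class assignment itself comes from the description above.

```r
# Sketch of an optional-tibble return (helper name is hypothetical).
# Internals stay a plain data.frame; the class is only changed on request.
as_stats_output_sketch <- function(df, tibble = FALSE) {
  stopifnot(is.data.frame(df))
  if (tibble) {
    # Same class vector as quoted above; the tibble package is not imported
    class(df) <- c("tbl_df", "tbl", "data.frame")
  }
  df
}

# Made-up statistics table for illustration
stats_df <- data.frame(
  name = c("sample_1.fcs", "sample_2.fcs"),
  CD69 = c(120.5, 98.2)
)

plain <- as_stats_output_sketch(stats_df)
tbl   <- as_stats_output_sketch(stats_df, tibble = TRUE)
```

Because "data.frame" stays in the class vector, the result still behaves as a data.frame for code that does not know about tibbles.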

4. Format of exported statistics should be truly wide or long

Some additional work is required to ensure that the exported data is either completely wide or completely long. At the moment it is a mixture of both, whichever format is requested. Dropping tibbles from the internals, as described above, should make this a lot easier.
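For illustration, this is what "completely wide" versus "completely long" could look like for the same statistics in base R. The column names (name, channel, value) and the numbers are assumptions made for this sketch, not the package's actual output.

```r
# Wide: one row per sample, one column per parameter
wide <- data.frame(
  name = c("sample_1.fcs", "sample_2.fcs"),
  CD4  = c(50.1, 48.7),
  CD69 = c(20.3, 25.9)
)

# Long: one row per sample/parameter pair
params <- setdiff(names(wide), "name")
long <- data.frame(
  name    = rep(wide$name, times = length(params)),
  channel = rep(params, each = nrow(wide)),
  # unlist() stacks the parameter columns in the same order as `channel`
  value   = unlist(wide[params], use.names = FALSE)
)
```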

Now, to address your request: I have added a new native statistical function to compute the area under the curve for you. There are a variety of methods to accomplish this, but I have opted for an approach that uses splines. Basically, the density distribution will be created internally with .cyto_density(), taking into account the density_smooth parameter, and then a spline will be fitted to the smoothed distribution. The area under the curve is computed as the integral of the spline over the range of values. It will be a couple of weeks before all these changes are finalized, but once complete, you will be able to do the following:

cyto_stats_compute(gs,
                   alias = "CD4 T Cells",
                   channels = "CD69",
                   stat = "auc") 
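The spline-based approach described above can be approximated in base R as follows. This is only a sketch: stats::density() stands in for the internal .cyto_density(), its adjust argument stands in for density_smooth, and the function name auc_sketch is hypothetical. How CytoExploreR scales the density is not stated in this thread, so the numbers from this sketch need not match the package's "auc" statistic.

```r
# Sketch only: density -> spline -> integral, using base R stand-ins.
auc_sketch <- function(x, adjust = 1, n = 512) {
  x <- x[is.finite(x)]
  # Smoothed density estimate on a regular grid (assumed stand-in)
  d <- stats::density(x, adjust = adjust, n = n)
  # Fit an interpolating spline through the smoothed density
  f <- stats::splinefun(d$x, d$y, method = "natural")
  # Integrate the spline over the range of the density grid
  stats::integrate(f, lower = min(d$x), upper = max(d$x),
                   subdivisions = 1000L)$value
}

# Toy data standing in for one channel of one population
set.seed(42)
values <- rnorm(5000, mean = 100, sd = 15)
auc <- auc_sketch(values)
```

Note that a kernel density estimate is normalised, so integrating it over its full range gives a value close to 1; the integral becomes informative when the density is scaled (for example by event count) or when the limits cover only part of the range.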

Sorry for taking some time on this. I figured that if I am going to add support for this function, I should at least try to make it easier to add support for other functions in the future. It is a fair bit of work, but I think it will be worth it.

I will let you know once I push the new and improved cyto_stats_compute() so that you can give it a try.

gombosim commented 4 years ago

Dear Dillon,

It seems very promising. I like how you are making the cyto_stats_compute() function more flexible. It gives the possibility to calculate "anything" with the data already processed in your package.

I am waiting for it,

best regards, Imre

DillonHammill commented 3 years ago

@gombosim, just letting you know that this is now implemented in CytoExploreR version 2.0.0 (coming soon).

Simply set stat = "auc" in cyto_stats_compute() or pass your own custom AUC function through the stat argument. Please let me know how you go when the new version is announced and reopen this issue if you experience any problems.

gombosim commented 3 years ago

Thank you so much for your effort.
