Exported channel values do not agree with what was shown in FlowJo Biex?

ImmuneDynamics / Spectre

A computational toolkit in R for the integration, exploration, and analysis of high-dimensional single-cell cytometry and imaging data.

https://immunedynamics.github.io/spectre/

MIT License


Exported channel values do not agree with what was shown in FlowJo Biex? #163

Closed: denvercal1234GitHub closed this issue 3 months ago

denvercal1234GitHub commented 1 year ago

Hi there,

Thanks again for the tool and your help so far!

I would like to understand why the exported channel values do not agree with the numerical values I saw on the FlowJo plots. For example, in FlowJo I see a clear population along the diagonal line (double-positive), but this population does not look as prominent when I plot its channel values in R.

I thought that if we exported the FCS file from FlowJo as channel values, the transformation applied in FlowJo (in my case, bi-ex) would be preserved in the exported values, and that this is why the Spectre workflow does not need any further transformation.

As viewed in FlowJo (values go up to 10^5 on both axes):

Reading in the exported channel values:

channel_data.list <- Spectre::read.files(file.loc = "....../FlowJoBiExED_ChannelValues", file.type = ".csv", do.embed.file.names = TRUE)

The exported channel values are all below 10^3 (a few example rows are shown):

Plotted in R, the same parameters look different from the FlowJo plot:

ghar1821 commented 1 year ago

I believe the channel values serve as an alternative to the bi-ex transformation, with both methods aiming to preserve the distribution of marker expression. Given their distinct way of transforming data, it's reasonable not to expect identical numerical values from both methods. After all, if they yielded the same transformed values, there wouldn't be a need for two separate methods, would there?

When you export channel values from FlowJo, it linearly bins the data as displayed (i.e., after the display transformation) into channels. This removes the need for further bi-ex, logicle, or arcsinh transformations in downstream analysis.
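The "display transform, then linear binning" idea can be made concrete with a minimal sketch. For illustration only, it assumes that arcsinh with a cofactor of 150 stands in for FlowJo's bi-ex display, since the thread does not give the exact bi-ex parameters. It also assumes that the display spans -1000 to 262144 on the raw scale and that there are 1024 channels. `display_transform` and `to_channel` are hypothetical helpers, not FlowJo or Spectre functions.

```r
# Hypothetical sketch: display transform followed by linear binning into channels.
# ASSUMPTIONS: arcsinh (cofactor 150) stands in for FlowJo's bi-ex display;
# the display range (-1000 to 262144) and 1024 channels are illustrative.
display_transform <- function(x, cofactor = 150) asinh(x / cofactor)

to_channel <- function(x, min_scale = -1000, max_scale = 262144,
                       cofactor = 150, n_channels = 1024) {
  t     <- display_transform(x, cofactor)
  t_min <- display_transform(min_scale, cofactor)
  t_max <- display_transform(max_scale, cofactor)
  # Map the transformed value linearly onto n_channels equal-width bins
  ch <- floor((t - t_min) / (t_max - t_min) * n_channels)
  # Clamp events outside the displayed range to the first/last channel
  pmin(pmax(ch, 0), n_channels - 1)
}

raw <- c(-1000, -200, 0, 200, 1000, 1e4, 1e5, 262144)
to_channel(raw)
```

With these assumed settings, a raw value of 0 lands at roughly channel 250 rather than 0, because part of the channel range is reserved for negative values. Different display settings would move that point.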

If you find that the channel values don't quite meet your analysis needs, you could consider exporting as CSV scale values. From there, you can apply either a logicle or arc-sinh transformation using the do.logicle or do.asinh functions. Unfortunately, we don't currently offer functions to do bi-ex transformation.
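The recommended route is Spectre's `do.asinh` (or `do.logicle`). The base-R sketch below is not that function. It only shows the arithmetic behind an arcsinh transform of exported scale values, so it runs without Spectre installed. The data frame, its column names, and the cofactor of 150 are hypothetical, and a suitable cofactor depends on the instrument and channel.

```r
# Minimal base-R arcsinh transform of exported *scale* values.
# ASSUMPTIONS: `scale_df`, its columns, and cofactor = 150 are illustrative.
asinh_transform <- function(df, cols, cofactor = 150) {
  df[cols] <- lapply(df[cols], function(x) asinh(x / cofactor))
  df
}

scale_df <- data.frame(
  FileName = "sample1",
  CD4 = c(-300, 0, 150, 3000),
  CD8 = c(40, 0, -20, 80000)
)

transformed <- asinh_transform(scale_df, cols = c("CD4", "CD8"))
print(transformed)
```

Unlike channel values, a raw value of 0 stays at 0 after arcsinh, and negative values stay negative.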

I'd also recommend reading the following two guides on data transformation. They might provide some additional insight:

tomashhurst commented 1 year ago

@denvercal1234GitHub is there any chance the X and Y are flipped between the FlowJo and R examples? There is a string of cells on the bottom right that looks similar to those on the top left in the FlowJo example.

denvercal1234GitHub commented 1 year ago

Hi @tomashhurst and @ghar1821, thank you for the input. The reason I wanted to use channel values was to avoid having to choose a cofactor for the transformation in R.

Below is another example: an FCS file that I transformed in FlowJo with the bi-exponential transformation and then exported as channel values (.csv) to import into R.

Once the channel values are imported into R, they are in the hundreds and none of the cells is at 0 anymore, even though the FlowJo plots suggest that some cells should be at 0. As a result, when I clustered the data and then plotted expression levels across clusters, the baseline for all markers was around 200 rather than 0 (as mentioned in https://github.com/HelenaLC/CATALYST/issues/358).

Is this normal? It seems strange for most cells to have a baseline expression in the hundreds. Or is that just how bi-exponentially transformed data look? I want to make sure the exported channel values are compatible with FlowSOM clustering and that there are no additional steps I am missing.
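Before clustering, you can check where each channel's "floor" sits by summarising the low end of every column. The helper below is a hypothetical sketch. The name `channel_baseline`, the example values, and the choice of the 1st percentile are ours, not from the thread. It works on any data frame of imported channel values.

```r
# Hypothetical helper: summarise the low end of each channel.
channel_baseline <- function(df, cols) {
  data.frame(
    channel   = cols,
    min       = vapply(df[cols], min, numeric(1)),
    p01       = vapply(df[cols], quantile, numeric(1), probs = 0.01, names = FALSE),
    median    = vapply(df[cols], median, numeric(1)),
    frac_zero = vapply(df[cols], function(x) mean(x == 0), numeric(1)),
    row.names = NULL
  )
}

# Illustrative channel values (not real data)
chan <- data.frame(CD3  = c(0, 240, 255, 700, 820),
                   CD19 = c(215, 230, 248, 260, 640))
channel_baseline(chan, c("CD3", "CD19"))
```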

Other markers do have some cells at 0, however:

Thank you for your help!

denvercal1234GitHub commented 1 year ago

Also @ghar1821 @tomashhurst: should we even use channel values exported from FlowJo (after transforming the data with the bi-exponential display in FlowJo) for clustering? Other posts have said that data transformed and exported from FlowJo are not reliable (https://github.com/HelenaLC/CATALYST/issues/358#issuecomment-1655485231). Thank you again for your input.

SamGG commented 1 year ago

To make my point clearer: I consider the FJ results correct, but I don't know how to reproduce FJ scaling. It would be good to have feedback from the Spectre team. I will read the links you pointed to above when I have time. Best.

denvercal1234GitHub commented 1 year ago

Thank you @SamGG. My biggest concern is whether we can use channel values (transformed by FlowJo) for clustering. In the clustering results shown above, the baseline for every marker is around 200-300 rather than 0, which makes interpretation a bit odd.

SamGG commented 1 year ago

Using your figure, here is what I think is happening. I added a pseudo scale ranging from 0 to 1000 (or should it be 1024?). I think this is the mapping that FJ applies to any transformed channel, and it shows that zero sits at around 250.

I also added one green box and two grey boxes, representing 50% (green) and 25% (grey) of the full scale. The green box shows the range of intensities that is actually used. The grey boxes show ranges that contain no cells.

I think you should scale each channel so that its intensities cover the full 0..1000 pseudo range. I haven't yet tested whether scaling to the full range matters, but it is on my long to-do list. I think it should not matter if dimensionality reduction is conducted in FJ. If the transformation is carried out in Spectre, FlowSOM, CATALYST, R, etc., then zero will be at zero.

Hope this helps. @tomashhurst @ghar1821, what is your opinion/experience?
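SamGG's suggestion to stretch each channel over the full pseudo range can be sketched in base R as a per-channel min-max rescale. The thread does not specify the robust limits (0.5th and 99.5th percentiles), the clamping, the 0..1000 target, or the name `rescale_channels`. These are all assumptions of this sketch.

```r
# Hypothetical sketch: rescale each channel so its occupied range spans 0..out_max.
# ASSUMPTION: percentile limits keep a few extreme events from stretching the scale.
rescale_channels <- function(df, cols, lo_q = 0.005, hi_q = 0.995, out_max = 1000) {
  df[cols] <- lapply(df[cols], function(x) {
    lims <- quantile(x, c(lo_q, hi_q), names = FALSE)
    if (lims[2] == lims[1]) return(rep(0, length(x)))  # constant channel
    y <- (x - lims[1]) / (lims[2] - lims[1]) * out_max
    pmin(pmax(y, 0), out_max)  # clamp events outside the limits
  })
  df
}

set.seed(1)
# Illustrative channel values occupying only part of a 0..1000 scale
chan <- data.frame(CD4 = round(runif(1000, 250, 750)),
                   CD8 = round(runif(1000, 200, 500)))
scaled <- rescale_channels(chan, c("CD4", "CD8"))
sapply(scaled, range)
```

This maps the lowest occupied channel to 0, not a true zero. It stretches the used range but does not recover where zero fluorescence sits.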

tomashhurst commented 3 months ago

@denvercal1234GitHub, just looking back over some of these issues: @SamGG's image summarises it well, and this is also described in our transformation tutorial (https://immunedynamics.io/spectre/cytometry/#tutorials). Personally, I have run clustering on both channel-value data and arcsinh-transformed data. In theory, channel data has less overall 'sensitivity' (its range is something like 650 channels) than arcsinh-transformed data (which potentially has ~10^5 distinct values, counting decimal places after scaling). However, I have not found huge differences between the two. If you run clustering/tSNE/UMAP etc. in FlowJo, it actually uses the channel values behind the scenes.

@SamGG is right that, in theory, it would be best to scale each parameter individually so that the maximum range is used. We found this tedious to do in FlowJo, but easy to do in R with arcsinh transformations.
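In R, scaling each parameter individually amounts to choosing a cofactor per channel. Below is a minimal base-R sketch that assumes a hypothetical named vector of cofactors. The values are placeholders, not recommendations, and `asinh_per_channel` is not a Spectre function.

```r
# Hypothetical per-channel cofactors; choose them by inspecting each channel.
cofactors <- c(CD4 = 200, CD8 = 500, CD19 = 1000)

asinh_per_channel <- function(df, cofactors) {
  # Apply arcsinh(x / cofactor) with the cofactor matched to each column name
  for (ch in names(cofactors)) {
    df[[ch]] <- asinh(df[[ch]] / cofactors[[ch]])
  }
  df
}

raw <- data.frame(CD4  = c(-100, 0, 5000),
                  CD8  = c(50, 0, 20000),
                  CD19 = c(0, 300, 60000))
asinh_per_channel(raw, cofactors)
```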