JinmiaoChenLab / cytofkit

cytofkit: an integrated flow/mass cytometry data analysis pipeline
http://jinmiaochenlab.github.io/cytofkit/

"fixed" doesn't work #79

Open hanwa929 opened 4 years ago

hanwa929 commented 4 years ago

I'm using "JinmiaoChenLab/cytofkit". As you know, mergeMethod accepts "ceil", "fixed", "min" and "all", but "fixed" doesn't work. With 2 FCS files (e.g. 2000 cells and 11000 cells) and "fixed" with fixedNum=10000, only 4000 x 30 data is extracted, which is the same result as "min". In other words, a fixed number of cells (specified by fixedNum) is not sampled with replacement. I checked that the merge methods other than "fixed" work as defined. My friend's laptop shows the same result. I'd like to extract cells up to fixedNum with replacement. Has anyone faced the same issue? Could you fix it?

SamGG commented 4 years ago

Hi! Correct, there is a bug. Thanks for reporting.

The code corresponding to the "fixed" strategy is correct. https://github.com/JinmiaoChenLab/cytofkit/blob/c4f93e5d849cf670c3825e5abacd850822c8eca7/R/cytof_preProcess.R#L70-L75 But there is additional code before it that lowers fixedNum to the minimum event count. https://github.com/JinmiaoChenLab/cytofkit/blob/c4f93e5d849cf670c3825e5abacd850822c8eca7/R/cytof_preProcess.R#L47-L55 This code should be removed (it may have been added as a workaround for tSNE, see below).
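To make the expected behaviour concrete, here is a minimal sketch of what "fixed" is supposed to do once fixedNum is no longer lowered to the minimum. It is not the cytofkit code: sample_fixed() and fixed_num are hypothetical names, and the matrices are random stand-ins for the 2000-cell and 11000-cell files from the report.

```r
# Hypothetical helper (NOT cytofkit's actual code): draw exactly `fixed_num`
# rows from an events-by-markers matrix.
sample_fixed <- function(events, fixed_num) {
  n <- nrow(events)
  # Replacement is only needed when the file has fewer events than requested.
  with_replacement <- n < fixed_num
  if (with_replacement) {
    warning("file has fewer events than fixed_num; sampling with replacement")
  }
  idx <- sample.int(n, size = fixed_num, replace = with_replacement)
  events[idx, , drop = FALSE]
}

set.seed(1)
markers <- c("m1", "m2", "m3")  # made-up marker names
small <- matrix(rnorm(2000 * 3), ncol = 3, dimnames = list(NULL, markers))
large <- matrix(rnorm(11000 * 3), ncol = 3, dimnames = list(NULL, markers))

merged <- rbind(
  suppressWarnings(sample_fixed(small, 10000)),  # sampled with replacement
  sample_fixed(large, 10000)                     # sampled without replacement
)
dim(merged)  # 20000 rows, 3 columns
```

With the clamping code in place, both files would instead contribute 2000 events each, which matches the 4000 rows reported above.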

Nevertheless, I would like to ask you to think about what sampling with replacement means. Here is my humble opinion. First, it puts too much weight on an experiment that does not have enough events, so I would not do that. Second, tSNE does not like duplicated events unless check_duplicates = FALSE is forced. Third, as a workaround I would add some jittering (aka randomization) to data sampled with replacement, which is not currently implemented in cytofkit. For example, I would try adding uniform random numbers from -0.1 to 0.1 to the data once the transformation asinh(x/5) has been applied: -0.1 ~ asinh(0.5/5)-asinh(1/5) and 0.1 ~ asinh(1.5/5)-asinh(1/5), i.e. 0.5 around 1, the lowest positive count. Best.
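A rough sketch of the jittering idea, assuming an events-by-markers matrix of raw counts. The marker names, the matrix sizes and the variable names are made up for illustration; none of this is cytofkit code.

```r
set.seed(42)

# Made-up raw counts: 5 events, 2 markers.
raw <- matrix(rpois(5 * 2, lambda = 3), ncol = 2,
              dimnames = list(NULL, c("CD3", "CD4")))

# Transform first, then resample with replacement (12 draws from 5 events,
# so some events are necessarily picked more than once).
transformed <- asinh(raw / 5)
idx <- sample.int(nrow(transformed), size = 12, replace = TRUE)
resampled <- transformed[idx, , drop = FALSE]

# Where the +/- 0.1 width comes from: half a count around 1 on the asinh scale.
lower <- asinh(0.5 / 5) - asinh(1 / 5)  # about -0.099
upper <- asinh(1.5 / 5) - asinh(1 / 5)  # about  0.097

# Add uniform noise in [-0.1, 0.1] to every value.
jitter_width <- 0.1
noise <- matrix(runif(length(resampled), -jitter_width, jitter_width),
                nrow = nrow(resampled))
jittered <- resampled + noise

anyDuplicated(jittered)  # row-wise duplicate check on the jittered events
```

Because the noise is continuous, two copies of the same event almost surely end up with different coordinates, which is what the duplicate check in tSNE needs.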

hanwa929 commented 4 years ago

Thank you for the quick response. I'm using cytofkit_GUI() and have never done any programming with cytofkit. Could you fix this issue in cytofkit_GUI()?

SamGG commented 4 years ago

Not being the developer, I can't fix it in this repository. I have just pushed a fix to my repository https://github.com/i-cyto/cytofkit but didn't test it. In the meantime, install cytofkit from GitHub using this repository until it is fixed here.

if(!require(devtools)) install.packages("devtools") # If not already installed
devtools::install_github("i-cyto/cytofkit")

The GUI is slightly different (clearer IMHO). Let me know if this fix solved your problem.

hanwa929 commented 4 years ago

Thank you for your suggestion. I tried i-cyto/cytofkit as you suggested, but the messages below showed up at the end:
1: In (function (fcsFiles, comp = FALSE, transformMethod = c("autoLgcl", : One or more FCS files have less events than specified fixedNum
2: In (function (fcsFiles, comp = FALSE, transformMethod = c("autoLgcl", : using replacement and uniform randomization
3: Removed 13848 rows containing missing values (geom_point).

And when I opened the RData file in cytofkitShinyAPP(), I got "duplicate 'row.names' are not allowed" and could not visualize the t-SNE map.

It seems this setup cannot handle identical single-cell events, so sampling with replacement is not possible.

If you have any ideas, please let me know.

SamGG commented 4 years ago

I will set up a data set to test this and let you know. The messages look like warnings, not errors. They sound like the warnings I added to alert about replacement, so I would say that part is correct. I don't know about the missing values during the plot. The duplicate row.names might be a consequence of the replacement. I will look into this. A quick and dirty trick would be to change the identifiers of the sampled events, and therefore to lose the link to the original FCS file. I am not sure this would be a big problem, because nobody really looks at which events were sampled.
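One way to do that quick and dirty trick in R, as a sketch only: the identifier format "fileA_1" is an assumption for illustration, and this is not necessarily how the fix ends up being implemented.

```r
# Made-up events with identifiers of the form <file>_<event number>.
events <- matrix(1:6, ncol = 2,
                 dimnames = list(c("fileA_1", "fileA_2", "fileA_3"),
                                 c("m1", "m2")))

set.seed(7)
# 5 draws from 3 events: at least one identifier must repeat.
idx <- sample.int(nrow(events), size = 5, replace = TRUE)
resampled <- events[idx, , drop = FALSE]
had_duplicates <- anyDuplicated(rownames(resampled)) > 0

# make.unique() appends ".1", ".2", ... to repeated names, so every row gets
# its own identifier at the cost of no longer matching the original event id.
rownames(resampled) <- make.unique(rownames(resampled))
anyDuplicated(rownames(resampled))  # 0
```

The renamed identifiers no longer match the original ones exactly, which is the lost link mentioned above.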

hanwa929 commented 4 years ago

As you say, these seem to be warnings. Before the warning messages, there is also this message: if (nchar(shape_string[1]) <= 1) {Error: missing value where TRUE/FALSE needed. I also think the duplicate row.names are a consequence of the replacement.

I understand what you mean in the latter half, but I don't know how to do it. I think the procedure below is what you mean: if the data could be duplicated and the identifiers of the duplicated events changed before the analysis, it would be possible to analyze data that includes duplicates. But I don't know how to duplicate events and change their identifiers in FCS files, because I can only work with FCS files through cytofkit_GUI().

SamGG commented 4 years ago

Should be fixed. Just reinstall from i-cyto and run the shinyApp command again. Let me know.

hanwa929 commented 4 years ago

It worked, great! But I have one issue. The cluster percentages aren't written to the csv file (I attached a picture). This seems to come from the duplicated data, because CMV1a is a file sampled with duplicates. The cluster percentages of the remaining two FCS files aren't written either, although I loaded 4 FCS files.

Also, the messages below still appear when running the analysis, though this may not be a big issue.

if (nchar(shape_string[1]) <= 1) {Error: missing value where TRUE/FALSE needed.
1: In (function (fcsFiles, comp = FALSE, transformMethod = c("autoLgcl", : One or more FCS files have less events than specified fixedNum
2: In (function (fcsFiles, comp = FALSE, transformMethod = c("autoLgcl", : using replacement and uniform randomization
3: Removed 13848 rows containing missing values (geom_point).

SamGG commented 4 years ago

Right, the fix was too easy; there is a price to pay. Should we compute the percentages keeping the duplicated cells or removing the replicated cells?
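To show why the choice matters, here is a toy example with made-up cell identifiers and cluster labels (not real data and not the cytofkit export code). One event was drawn three times by the sampling with replacement.

```r
# Toy table: event "A_2" appears three times because of replacement.
cells <- data.frame(
  cell_id = c("A_1", "A_2", "A_2", "A_2", "A_3", "A_4"),
  cluster = c(1, 2, 2, 2, 1, 1)
)

# Option 1: keep duplicated cells, each copy counts.
pct_keep <- prop.table(table(cells$cluster)) * 100
pct_keep    # cluster 1: 50, cluster 2: 50

# Option 2: remove replicated cells, each original event counts once.
unique_cells <- cells[!duplicated(cells$cell_id), ]
pct_unique <- prop.table(table(unique_cells$cluster)) * 100
pct_unique  # cluster 1: 75, cluster 2: 25
```

Keeping duplicates describes the sampled data that was actually clustered, while removing them describes the original events in the file.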

SamGG commented 4 years ago

Install and try again ;-) You can directly rerun the shiny app and save the files again. Please verify the exported FCS files, I didn't. The percentage files are correctly formatted, but I didn't check them against manual gating. I got the same 3 errors/warnings as you, but I haven't looked into them yet.

hanwa929 commented 4 years ago

Did you change the code? I forced the installation because the message below came up: Skipping install of 'cytofkit' from a github remote, the SHA1 (1a1a9ca1) has not changed since last install. The result is still the same, but this may be because I wasn't able to pick up your update.

SamGG commented 4 years ago

Sorry, should be OK right now. https://github.com/i-cyto/cytofkit/commit/b1a7cd64d5599ab96c4b4bbd2169e56c2d6b7d28

hanwa929 commented 4 years ago

Great, it's perfect! Thank you so much.