JinmiaoChenLab / cytofkit

cytofkit: an integrated flow/mass cytometry data analysis pipeline
http://jinmiaochenlab.github.io/cytofkit/

All parameters are used for PhenoGraph, yielding useless clusters #11

Closed esimonds closed 7 years ago

esimonds commented 7 years ago

The current version of Cytofkit on Bioconductor (1.8.3) appears to use all parameters for the PhenoGraph clustering step, causing very uninformative clusters (see attached PDF). Also, it throws an error when performing PhenoGraph clustering, whereas previous versions (as late as 1.8.1) did not.

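Screenshots showing PhenoGraph bug in Cytofkit v183.pdf (https://github.com/JinmiaoChenLab/cytofkit/files/1374239/Screenshots.showing.PhenoGraph.bug.in.Cytofkit.v183.pdf)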

Testing:

I used the FCS file included with the R package ("130515_C2_stim_CD19-.fcs") for testing, and I compared v1.8.1 to v1.8.3 with the same analysis parameters.

Console output from 1.8.3:


> library(cytofkit)
Loading required package: ggplot2
Loading required package: plyr
> cytofkit_GUI()
Input arguments:
* Project Name: cytofkit 
* Input FCS files for analysis:
   -130515_C2_stim_CD19-.fcs
* Markers:
   -(Cd112)Di<CD14>
   -(La139)Di<CD45>
   -(Nd142)Di<HLA-DR>
   -(Nd146)Di<CD8>
   -(Sm154)Di<CD3>
   -(Gd156)Di<CD19>
* Data merging method: ceil 
* Data transformation method: cytofAsinh 
* Dimensionality reduction method: tsne 
* Data clustering method(s): Rphenograph 
* Data visualization method(s): tsne 
* Subset progression analysis method: NULL 

Extract expression data...
   5000  x  52  data was extracted!
Dimension reduction...
  Running t-SNE...with seed 42  DONE
Run clustering...
  Running PhenoGraph...Run Rphenograph starts:
  -Input data of 5000 rows and 52 columns
  -k is set to 30
  Finding nearest neighbors...DONE ~ 2.628 s
  Compute jaccard coefficient between nearest-neighbor sets...DONE ~ 2.102 s
  Build undirected graph from the weighted links...DONE ~ 0.424 s
  Run louvain clustering on the graph ...DONE ~ 0.362 s
Run Rphenograph DONE, took a total of 5.51599999999996s.
  Return a community class
  -Modularity value: 0.8562154 
  -Number of clusters: 18 DONE!
Progression analysis...
Listing markers used for dimension reduction...
Wrapping results...
Analysis DONE, saving the results...
R object is saved in  cytofkit.RData 
  **THIS R OBJECT IS THE INPUT OF SHINY APP!**  
Save to file: /Users/esimonds/Downloads/130515_C2_stim_CD19-.fcs 
Writing results Done! Results are saved under path: /Users/esimonds/Downloads
Warning message:
In if (!(right_marker)) { :
  the condition has length > 1 and only the first element will be used
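
A note on the warning at the end of this log: R versions before 4.2.0 emit it whenever `if` receives a condition with more than one element, and from R 4.2.0 onward the same situation is an error. The sketch below shows the general pattern with a made-up vector. `right_marker` here is only a stand-in named after the variable in the warning; its real contents inside cytofkit are not shown in this thread, and whether `all()` or `any()` matches the intended logic is an assumption.

```r
# Stand-in for the logical vector named in the warning; values are made up.
right_marker <- c(TRUE, FALSE, TRUE)

# `if (!(right_marker))` would only look at the first element (R < 4.2.0)
# or stop with an error (R >= 4.2.0). Collapsing the vector to a single
# TRUE/FALSE first makes the intent explicit and avoids both outcomes.
any_wrong <- !all(right_marker)  # TRUE if at least one element is FALSE

if (any_wrong) {
  message("At least one marker failed the check")
}
```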

Console output from 1.8.1:

Restarting R session...

> 
> library(cytofkit)
Loading required package: ggplot2
Loading required package: plyr
> packageVersion("cytofkit")
[1] ‘1.8.1’
> cytofkit_GUI()
Input arguments:
* Project Name: cytofkit_181 
* Input FCS files for analysis:
   -130515_C2_stim_CD19-.fcs
* Makrers:
   -(Cd112)Di
   -(La139)Di
   -(Nd142)Di
   -(Nd146)Di
   -(Sm154)Di
   -(Gd156)Di
* Data merging method: ceil 
* Data transformation method: cytofAsinh 
* Dimensionality reduction method: tsne 
* Data clustering method(s): Rphenograph 
* Data visualization method(s): tsne 
* Subset progression analysis method: NULL 

Extract expression data...
   5000  x  6  data was extracted!
Dimension reduction...
  Runing t-SNE...with seed 42  DONE
Run clustering...
  Runing PhenoGraph...Run Rphenograph starts:
  -Input data of 5000 rows and 6 columns
  -k is set to 30
  Finding nearest neighbors...DONE ~ 0.124 s
  Compute jaccard coefficient between nearest-neighbor sets...DONE ~ 2.411 s
  Build undirected graph from the weighted links...DONE ~ 0.414 s
  Run louvain clustering on the graph ...DONE ~ 0.318 s
Run Rphenograph DONE, totally takes 3.267s.
  Return a community class
  -Modularity value: 0.8043747 
  -Number of clusters: 15 DONE!
Progression analysis...
Analysis DONE, saving the reuslts...
R obejct is saved in  cytofkit_181.RData 
  **THIS R OBJECT IS THE INPUT OF SHINY APP!**  
Save to file: /Users/esimonds/Downloads/130515_C2_stim_CD19-.fcs 
Writing results Done! Results are saved under path: /Users/esimonds/Downloads

The tSNE maps look identical between the two versions, which tells me that the tSNE code is correctly using only the 6 parameters I specified.

The PhenoGraph clusters in v1.8.1 look good (they correlate with cell type). Notice that the console output says:

Runing PhenoGraph...Run Rphenograph starts: -Input data of 5000 rows and 6 columns

...that's all good.

The PhenoGraph clusters in v1.8.3 look almost random. Notice that the console output says:

Running PhenoGraph...Run Rphenograph starts: -Input data of 5000 rows and 52 columns

We don't want to run PhenoGraph on the entire matrix because the clusters will be defined by useless parameters, including Time and barcode parameters. Please fix! Thanks!

jinmiaochen commented 7 years ago

Hi Erin,

Thanks for your feedback. Our intention is to use selected parameters for clustering, but allow users to view the expression of all markers. We will look into it and get back to you.

Best, Jinmiao

esimonds commented 7 years ago

Update & good news: Version 1.9.4 in the Bioconductor development branch (https://bioconductor.org/packages/devel/bioc/html/cytofkit.html) seems to work as intended: PhenoGraph clustering is performed on only the 6 user-selected parameters, and all parameters (including unselected parameters) are available for viewing in the Shiny GUI.

Users can easily switch to version 1.9.4 by running:

BiocInstaller::useDevel()
BiocInstaller::biocLite("cytofkit")

The bad news is, if this bug is confirmed, everyone who has downloaded Cytofkit from Bioconductor since Bioconductor v3.5 was released (April 25, 2017) has probably been getting really awful PhenoGraph clusters :(

Image: cytofkit v1.9.4 output

jinmiaochen commented 7 years ago

Thanks Erin! Is there a way to update the version on Bioconductor v3.5 or inform users not to use that version?

Best, Jinmiao

esimonds commented 7 years ago

I think so, but I'm not sure how to do it -- you can try asking on the bioc-devel mailing list: https://master.bioconductor.org/help/support/

lconde-ucl commented 7 years ago

Hi Erin and Jinmiao,

In case it's useful, the bug in version 1.8.3 happens in the cytof_cluster step:

cluster_res <- lapply(clusterMethods, cytof_cluster, 
                          ydata = allDimReducedList[[dimReductionMethod]], 
                          xdata = exprs_data,
                          FlowSOM_k = as.numeric(flowsom_num))

where exprs_data contains the expression data of all the markers. This was corrected recently (in version 1.9) when this part of the code was changed to:

cluster_res <- lapply(clusterMethods, cytof_cluster, 
                          ydata = allDimReducedList[[dimReductionMethod]], 
                          xdata = exprs_data[, markers],
                          FlowSOM_k = as.numeric(flowsom_num))

It was not an issue in previous versions because, before 1.8.3, the expression data for the markers of interest were extracted in an earlier step (specifically in cytof_exprsMerge), so exprs_data was already filtered when it was passed to cytof_cluster.
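
To make the difference between the two calls concrete, here is a minimal base-R sketch on a toy matrix. The column names, the values, and the `markers` selection are made up for illustration; only the `exprs_data[, markers]` subsetting mirrors the fix quoted above. `drop = FALSE` is not part of the quoted fix; it is added here so that a single selected marker would still yield a matrix.

```r
# Toy stand-in for exprs_data: 5 cells x 4 columns, two of which
# (Time, barcode) should never drive clustering. All values are made up.
set.seed(42)
exprs_data <- matrix(
  rnorm(20), nrow = 5,
  dimnames = list(NULL, c("CD3", "CD19", "Time", "barcode"))
)

# Hypothetical user selection of markers for clustering.
markers <- c("CD3", "CD19")

# 1.8.3 behaviour: the full matrix reaches the clustering step.
xdata_all <- exprs_data

# Fixed behaviour: only the selected columns are passed on.
xdata_selected <- exprs_data[, markers, drop = FALSE]

dim(xdata_all)       # 5 4
dim(xdata_selected)  # 5 2
```

In the logs earlier in this thread, the same distinction shows up as "5000 rows and 52 columns" versus "5000 rows and 6 columns".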

Hope this helps, Lucia

esimonds commented 7 years ago

Nice sleuthing, Lucia! Thanks for figuring out the offending piece of code. That gives me more confidence that this bug is real.

So, basically anyone that is currently on version 1.8.3 needs to upgrade to 1.9.4.

For any users reading this, you can check your version with: packageVersion("cytofkit")

MattMyint commented 7 years ago

Thanks Erin and Lucia,

I've thrown in a quick fix to the release version, but yeah, unfortunately, users since that update would have encountered this unknowingly.

I'll leave the issue open for better visibility for affected users

esimonds commented 7 years ago

Awesome, thanks Matt. Does that mean users can update without switching to the development branch of Bioconductor?

MattMyint commented 7 years ago

Yup! But it'll only be available after the nightly build, so within 1-2 days.

SamGG commented 7 years ago

Why not use the current GitHub version?

MattMyint commented 7 years ago

The current GitHub version is the development branch, whereas the regular Bioconductor release hosts the stable version.

Either is fine to overcome this bug, but having the fix on the release version would make it easier for the majority of users that access cytofkit from the release version of bioconductor rather than the devel version.

esimonds commented 7 years ago

OK, the fixed version (1.8.4) has gone live on the Bioconductor release servers, including Macintosh and Windows binaries. I confirmed that it installs OK and the bug is gone. Thanks again, Matt.

All users* should upgrade to 1.8.4 by running these three commands in R:

source("https://bioconductor.org/biocLite.R")
biocLite("cytofkit")
packageVersion('cytofkit')

*If you are using the GitHub version of cytofkit, or the development branch of Bioconductor, then you are probably OK. You can check your current version of Cytofkit with packageVersion('cytofkit'). If the result is less than 1.8.4, then you need to upgrade.
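
For anyone who would rather script the check than compare version strings by eye, a small sketch follows. `needs_upgrade` is a hypothetical helper, not part of cytofkit. The 1.8.4 threshold is the rule of thumb from the comment above, and the helper knows nothing about which development (1.9.x) versions were affected.

```r
# Hypothetical helper: TRUE if a version string predates the fixed release.
# The default threshold of 1.8.4 comes from this thread.
needs_upgrade <- function(installed, fixed = "1.8.4") {
  package_version(installed) < package_version(fixed)
}

needs_upgrade("1.8.3")  # TRUE
needs_upgrade("1.8.4")  # FALSE
needs_upgrade("1.9.4")  # FALSE

# With cytofkit installed, the check would be:
# needs_upgrade(as.character(packageVersion("cytofkit")))
```

Comparing via `package_version()` rather than plain strings keeps the comparison numeric per component, so a version such as 1.10.0 would sort after 1.8.4.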

Mkang1204 commented 6 years ago

May I know if I can change the default numbers instead of using the defaults? If so, where can I change them? Thank you :)
