RGLab / cytoqc

A Quality Control tool for cytometry

WorkFlow to fix missing marker names? #2

Closed vivek-verma202 closed 4 years ago

vivek-verma202 commented 4 years ago

@gfinak and @mikejiang, cytoqc is amazing and I can't wait for its Bioconductor launch, thank you! I was able to fix the channel names, but I got stuck on the marker names:

res <- cqc_check(cqc_data, type = "marker")
> res
# A tibble: 2 x 3
  group_id  nFCS marker                                              
     <int> <int> <chr>                                               
1        2   100 "CCL4, CD107a, CD16, CD56, CD57, IFNg, NKG2A, NKG2C"
2        1    12 ""                                        
> table(res$marker,res$group_id)
                   1     2
                   12    0
CCL4               0    100
CD107a             0    100
CD16               0    100
CD56               0    100
CD57               0    100
IFNg               0    100
NKG2A              0    100
NKG2C              0    100
> res1 <- cqc_match(res, ref = 2)
Warning message:
Unmatched items remain after cqc_match. Before using cqc_fix, please resolve these unmatched items manually using
cqc_match_update/remove/delete_unmatched or re-attempt automatic matching with cqc_match with a larger max.distance
argument. 
> res1 <- cqc_match_update(res1, map = c(""="CCL4"),
+                          group_id = 1)
Error: attempt to use zero-length variable name

How can I fill in the missing marker names? Alternatively, is there a way to make read.flowSet ignore $PnS, or to delete the marker information ($PnS) completely from the 100 files in group 2? All I need is a flowSet that lets me proceed with my analysis. Many thanks!

jacobpwagner commented 4 years ago

Have you checked out cqc_check by panel? It should be able to fill in the missing markers using the channels. That is, try this for your check:

res <- cqc_check(cqc_data, type = "panel", by = "channel")

gfinak commented 4 years ago

If I'm reading this right, you have 12 files with no marker names at all? Use cqc_check with type = "panel" and the channel names as the anchor. @jacobpwagner or @mikejiang know the exact API call.
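
To illustrate what using the channel names as an anchor means, here is a minimal base-R sketch. It uses plain named vectors rather than cytoqc objects, and the helper `fill_markers_by_channel` is hypothetical (it is not part of cytoqc); the channel and marker names are borrowed from the output in this thread.

```r
# Illustration only: plain named vectors, not cytoqc objects.
# Reference panel (from a file that has markers): channel -> marker.
ref_panel <- c(
  "FJComp-APC-A"     = "NKG2A",
  "FJComp-APC-Cy7-A" = "CD16",
  "FJComp-PE-A"      = "CD57"
)

# A file whose marker names are empty, keyed by the same channels.
blank_panel <- c(
  "FJComp-PE-A"      = "",
  "FJComp-APC-A"     = "",
  "FJComp-APC-Cy7-A" = ""
)

# Hypothetical helper: fill empty or NA markers by looking up the channel
# name in the reference panel; existing markers are left untouched.
fill_markers_by_channel <- function(panel, ref) {
  missing  <- is.na(panel) | panel == ""
  channels <- names(panel)[missing]
  known    <- channels %in% names(ref)
  panel[channels[known]] <- ref[channels[known]]
  panel
}

fixed_panel <- fill_markers_by_channel(blank_panel, ref_panel)
print(fixed_panel)
```

Because the lookup is keyed on the channel name, the order of channels in each file does not matter; only files that share channel names with the reference can be filled this way.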

vivek-verma202 commented 4 years ago

@jacobpwagner, cqc_check filled the missing markers with NAs. What should the next step be? Thanks.

> res <- cqc_check(cqc_data, type = "marker")
> res
# A tibble: 2 x 3
  group_id  nFCS marker                                              
     <int> <int> <chr>                                               
1        2   100 "CCL4, CD107a, CD16, CD56, CD57, IFNg, NKG2A, NKG2C"
2        1    12 ""                                                  
> res <- cqc_check(cqc_data, type = "panel", by = "channel")
Warning message:
Expected 2 pieces. Missing pieces filled with `NA` in 12 rows [65, 66, 67, 68, 69, 70, 471, 472, 473, 474, 475, 476]. 
> res
# A tibble: 8 x 3
  channel                  `group 1(n=12)` `group 2(n=100)`
  <chr>                    <chr>           <chr>           
1 FJComp-Alexa Fluor 700-A NA              CCL4            
2 FJComp-APC-A             NA              NKG2A           
3 FJComp-APC-Cy7-A         NA              CD16            
4 FJComp-BV510-A           NA              IFNg            
5 FJComp-BV605-A           NA              CD56            
6 FJComp-BV711-A           NA              CD107a          
7 FJComp-PE-A              NA              CD57            
8 FJComp-PE-Cy7-A          NA              NKG2C           
> cqc_fix(res)
Error in cqc_fix.default(res) : 
  The input is not a valid 'cqc_match' result!
Please make sure to follow the right order of the 'cqc_check-->cqc_match-->cqc_fix' workflow!
> res1 <- cqc_match(res, ref = 2)
Error in (function (df, ...)  : 
  channel is not consistent across panel groups!Please standardize it first!

jacobpwagner commented 4 years ago

Well, the next step should be using cqc_match as before, which should propose the fix of updating the marker names:

res1 <- cqc_match(res, ref = 2)
cqc_fix(res1)

However, in my testing this is currently an unhandled case: missing markers are clipped out of the check table on the assumption that they are effectively like scatter channels (channels without a marker name): https://github.com/RGLab/cytoqc/blob/f2cc19a5a6edfeabbb34e76a153cd330b74431ef/R/cqc_check.R#L139

Exactly, you'll get the error about inconsistent channels. This is an edge case we have to address. I'll get on it right away.
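
As a rough illustration of how clipping marker-less rows could make the channels look inconsistent across groups, here is a toy base-R model. It is a sketch under assumptions, not the actual cytoqc code: the data frame layout and the helper `channels_consistent` are hypothetical.

```r
# Illustration only: a toy long-format panel table, not cytoqc internals.
# group_id is a factor so that a group with no remaining rows is still seen.
panel <- data.frame(
  group_id = factor(c(1, 1, 2, 2)),
  channel  = c("APC-A", "PE-A", "APC-A", "PE-A"),
  marker   = c(NA, NA, "NKG2A", "CD57"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: does every group contain the same set of channels?
channels_consistent <- function(df) {
  per_group <- split(df$channel, df$group_id)
  all(vapply(per_group,
             function(ch) setequal(ch, per_group[[1]]),
             logical(1)))
}

# Dropping rows without a marker (as if they were scatter channels)
# removes every row of group 1, so the two groups no longer agree.
clipped <- panel[!is.na(panel$marker), ]

print(channels_consistent(panel))    # full table
print(channels_consistent(clipped))  # after clipping marker-less rows
```

In this toy model, the full table passes the consistency check and the clipped one does not, which matches the symptom reported above when all markers in one group are missing.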

vivek-verma202 commented 4 years ago

@jacobpwagner , you're right, it did give an error:

> res1 <- cqc_match(res, ref = 2)
Error in (function (df, ...)  : 
  channel is not consistent across panel groups!Please standardize it first!

Sorry for the trouble. I'll wait for a use case or a fix.

jacobpwagner commented 4 years ago

@vivek-verma202 , after https://github.com/RGLab/cytoqc/commit/8812b9f4361f95ffb3b88a74d5112f1bbd91058c this should work:

res <- cqc_check(cqc_data, type = "panel", by = "channel")
res1 <- cqc_match(res, ref = 2)
cqc_fix(res1)

This is the test case I've been working with. The first part is just setting up a scenario similar to yours.

> library(flowCore)
> library(flowWorkspace)
> library(cytoqc)
> cs <- load_cytoset_from_fcs(list.files(system.file("extdata", package = "flowWorkspaceData"), pattern = "a2004", full.names = TRUE))
> drop_cols <- which(grepl("-A", colnames(cs)))
> cs <- realize_view(cs[,-drop_cols])
> empty_markers <- rep("",8)
> names(empty_markers) <- colnames(cs)[5:12]
> markernames(cs[[2]]) <- empty_markers
> cqc_data <- cqc_cf_list(cytoset_to_list(cs))
> check_res <- cqc_check(cqc_data, type = "panel", by = "channel")
> check_res
# A tibble: 8 x 3
  channel        `group 1(n=1)` `group 2(n=1)`
  <chr>          <chr>          <chr>         
1 Alexa 700-H    NA             TNFa          
2 Am Cyan-H      NA             CD123         
3 APC-CY7-H      NA             IL-6          
4 APC-H          NA             CD11c         
5 FITC-H         NA             IFNa          
6 Pacific Blue-H NA             IL-12         
7 PE-CY7-H       NA             CD14          
8 PerCP-CY5-5-H  NA             MHCII         
> match_res <- cqc_match(check_res, ref = 2)
> match_res
# A tibble: 8 x 3
  channel        `group 1(n=1)` `Ref group`
  <clr_vctr>     <chr>          <clr_vctr> 
1 Alexa 700-H    NA             TNFa       
2 Am Cyan-H      NA             CD123      
3 APC-CY7-H      NA             IL-6       
4 APC-H          NA             CD11c      
5 FITC-H         NA             IFNa       
6 Pacific Blue-H NA             IL-12      
7 PE-CY7-H       NA             CD14       
8 PerCP-CY5-5-H  NA             MHCII      
> cqc_fix(match_res)
> check_res <- cqc_check(cqc_data, type = "panel", by = "channel")
> check_res
# A tibble: 8 x 2
  channel        `group 1(n=2)`
  <chr>          <chr>         
1 Alexa 700-H    TNFa          
2 Am Cyan-H      CD123         
3 APC-CY7-H      IL-6          
4 APC-H          CD11c         
5 FITC-H         IFNa          
6 Pacific Blue-H IL-12         
7 PE-CY7-H       CD14          
8 PerCP-CY5-5-H  MHCII  

Let me know if you are still having issues and I can take another look.

jacobpwagner commented 4 years ago

Actually, I hadn't noticed that this change interfered with some downstream logic, so I'll need to revert it and make a more substantial change.

vivek-verma202 commented 4 years ago

@jacobpwagner, I removed the old package and installed a fresh copy, but I still get the same error from cqc_match:

> library(cytoqc)
Registered S3 methods overwritten by 'colortable':
  method                from     
  knit_print.data.frame rmarkdown
  print.data.frame      base     
Warning message:
replacing previous import ‘Rgraphviz::style’ by ‘crayon::style’ when loading ‘cytoqc’ 
> files <- list.files(
+     path = "C:/Users/vverma3/Desktop/FM_flow_cytometry/NKA/data/fcs/01_cleaned",
+     full.names = TRUE
+ )
> cqc_data <- cqc_load_fcs(files)
> res <- cqc_check(cqc_data, type = "panel", by = "channel")
Warning message:
Expected 2 pieces. Missing pieces filled with `NA` in 12 rows [65, 66, 67, 68, 69, 70, 471, 472, 473, 474, 475, 476]. 
> res
# A tibble: 8 x 3
  channel                  `group 1(n=12)` `group 2(n=100)`
  <chr>                    <chr>           <chr>           
1 FJComp-Alexa Fluor 700-A NA              CCL4            
2 FJComp-APC-A             NA              NKG2A           
3 FJComp-APC-Cy7-A         NA              CD16            
4 FJComp-BV510-A           NA              IFNg            
5 FJComp-BV605-A           NA              CD56            
6 FJComp-BV711-A           NA              CD107a          
7 FJComp-PE-A              NA              CD57            
8 FJComp-PE-Cy7-A          NA              NKG2C           
> res1 <- cqc_match(res, ref = 2)
Error in (function (df, ...)  : 
  channel is not consistent across panel groups!Please standardize it first!

Is there a way to troubleshoot it?

jacobpwagner commented 4 years ago

@vivek-verma202 , I'm just finalizing some changes that will fix this. See https://github.com/RGLab/cytoqc/pull/3. I'll let you know as soon as I finalize it and merge it.

jacobpwagner commented 4 years ago

Alright @vivek-verma202 , now it should be good to go. Doing it right just required slightly deeper changes than originally projected. Anyway, now if you pull those changes and rebuild you should be able to do this (assuming group 2 is the group with markers and group 1 without):

res <- cqc_check(cqc_data, type = "panel", by = "channel")
res1 <- cqc_match(res, ref = 2)
cqc_fix(res1)

That should appropriately fill in the missing markers. Let me know if you run into any trouble.

vivek-verma202 commented 4 years ago

Worked smoothly. Thanks a ton, @jacobpwagner!