Support exclusion of specific markers and marker combinations from the Polyfunctionality Score calculation

gfinak commented 6 years ago

The lab wants to exclude IL4 single positive, GzB single positive and IL14/GzB double positive cell combinations from the Polyfunctionality Score calculations due to artefacts detected with those antibodies.

Is this currently supported?
Can we add an argument to support exclusion of cell subsets from the PFS calculation?

gfinak commented 6 years ago

Here's a reprex of how to do this currently:

The first part is boilerplate to construct an actual fitted COMPASSResult object. The latter part shows how to exclude the IL17A and IL17F single positive and double positive cell subsets from the calculation of the Polyfunctionality score manually.

library(COMPASS)
library(readxl)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.1
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(assertthat)
f <- system.file("extdata","SimpleCOMPASSExample.xlsx",package="COMPASS")
data <- read_excel(f)

# tidy the input count matrix
# rename the subsets, keep only the booleans, ensure consistent ordering
tidy_data <- data %>% 
  gather(Population, Count, -c(1:3)) %>%
  mutate(Population = gsub("S/Avid-/L/CD3\\+/CD4\\+","CD4",Population)) %>%
  mutate(Population = gsub(",Count","",Population)) %>%
  mutate(Population = gsub("22","IL22",Population)) %>%
  mutate(Population = gsub("10","IL10",Population)) %>%
  mutate(Population = gsub("17A","IL17A",Population)) %>%
  mutate(Population = gsub("17F","IL17F",Population)) %>%
  mutate(Population = gsub("MIP","MIP1B",Population)) %>%
  mutate(Population = gsub("TNF","TNFa",Population)) %>%
  mutate(Population = translate_marker_names(Population)) %>%
  mutate(Population = gsub("([&!])2&","\\1IL2&",Population)) %>%
  mutate(Population = gsub("CD4/","",Population)) %>%
  filter(Population != "CD4") %>% 
  mutate(Parent = "CD4") %>%
  spread(Population, Count) %>% 
  arrange(`PATIENT ID`,Stimulation, GROUP, Parent) %>% 
  mutate(Stimulation = toupper(Stimulation))

# get the stim and unstim data into separate data frames
nu = tidy_data %>% filter(Stimulation == "UNSTIMULATED") %>% arrange(`PATIENT ID`)

ns = tidy_data %>% filter(Stimulation != "UNSTIMULATED") %>% arrange(`PATIENT ID`)

# order of subject ids should be the same
assertthat::are_equal(nu$`PATIENT ID`,ns$`PATIENT ID`)
#> [1] TRUE

# extract the metadata and the stim and unstim count matrices
metadata <- ns %>% select(`PATIENT ID`,GROUP,Stimulation,Parent)
nu <- nu %>% select(-`PATIENT ID`,-GROUP,-Stimulation,-Parent) %>% as.matrix
ns <- ns %>% select(-`PATIENT ID`,-GROUP,-Stimulation,-Parent) %>% as.matrix

# reverse the order of the cell subsets, so that the all-negative is the last column
nu <- nu[,rev(colnames(nu))]
ns <- ns[,rev(colnames(ns))]

# rownames must be set
rownames(nu) <- metadata$`PATIENT ID`
rownames(ns) <- metadata$`PATIENT ID`
metadata <- as.data.frame(metadata)
rownames(metadata) = metadata$`PATIENT ID`

# quick fit via SimpleCOMPASS interface
simplefit <- SimpleCOMPASS(n_s = ns, n_u = nu, meta = metadata, individual_id = "PATIENT ID",iterations = 1000, replications = 1)
#> Initializing parameters...
#> Computing initial parameter estimates...
#> Iteration 1000 of 1000.
#> Fitting model with 1 replications.
#> Running replication 1 of 1...
#> Iteration 1000 of 1000.
#> Done!

# Now compute the polyfunctionality score excluding IL17A and IL17F and IL17A/IL17F

## first get indices of the three cell subsets from the categories matrix
il17a_single <- as.data.frame(simplefit$data$categories)$IL17A == 1 & as.data.frame(simplefit$data$categories)$Counts == 1

il17f_single <- as.data.frame(simplefit$data$categories)$IL17F == 1 & as.data.frame(simplefit$data$categories)$Counts == 1

il17af_double <- as.data.frame(simplefit$data$categories)$IL17A == 1 & 
  as.data.frame(simplefit$data$categories)$IL17F == 1 &
  as.data.frame(simplefit$data$categories)$Counts == 2

indices <- il17a_single|il17af_double|il17f_single

## next extract the Gamma matrix from the fit

gamma_mat <- COMPASS:::Gamma(simplefit)
dim(gamma_mat)
#> [1]   21  256 1000
# Third dimension is the iterations
# second dimension is the cell subsets
# first dimension is the subjects
# we want to average across the iterations to compute the mean_gamma matrix
# from which we will construct a PFS score.

# compute the mean_gamma matrix averaging over iterations
mean_gamma <- apply(gamma_mat[,!indices,],1:2,mean)

# get the degree of each subset in the matrix.
degree <- simplefit$data$categories[!indices,"Counts"]

## Compute the reduced score.
reduced_score <- PolyfunctionalityScore(x = mean_gamma, degree = degree, n = log(256,2))

#combine with the usual scores
merged <- scores(simplefit) %>% 
  arrange(`PATIENT ID`) %>% 
  bind_cols(reduced_PFS = reduced_score)

# do these make sense
all(merged$reduced_PFS <= merged$PFS)
#> [1] TRUE

Created on 2018-08-14 by the reprex package (v0.2.0).

gfinak commented 5 years ago

this can be partly accomplished with the FS and PFS APIs passing in markers= to specify markers to include in the calculation.

RGLab / COMPASS

Support exclusion of specific markers and marker combinations from the Polyfunctionality Score calculation #62