Issue with making SCE object

sudu87 commented 4 years ago

Hello Helena,

Thank you for suggesting the new version of the pipeline for CyTOF data analysis.

https://bioconductor.org/packages/release/bioc/vignettes/CATALYST/inst/doc/differential.html

Background:

Experiment specific info in:

https://github.com/markrobinsonuzh/cytofWorkflow/issues/17

I am not sure, but it may be down to how I received the data: not all channels in the FCS files are of interest to me. I suspect this is what causes the problem when creating the SCE object.

R.version

  ## platform       x86_64-apple-darwin17.0     
  ## arch           x86_64                      
  ## os             darwin17.0                  
  ## system         x86_64, darwin17.0       
  ## version.string R version 4.0.1 (2020-06-06)

Installing and loading packages

library(CATALYST)
library(cowplot)
library(flowCore)
library(diffcyt)
library(scater)
library(SingleCellExperiment)
library(readxl)

Reading FCS files (md is the metadata table read in below)

fcs_raw <- read.flowSet(md$file_name, path = "~/Documents/Research/Results/Cell_culture_and_macrophages/CyTOF/20200603_Sudip_Human_Microbiota", transformation = FALSE, truncate_max_range = FALSE)
fcs_raw

## A flowSet with 7 experiments.
## 
##   column names:
##   Au197Di BCKG190Di Ba138Di Bi209Di Ce140Di Center Cs133Di Dy161Di Dy162Di Dy163Di Dy164Di Er166Di Er167Di Er168Di Er170Di Eu151Di Eu153Di Event_length Gd155Di Gd156Di Gd157Di Gd158Di Gd160Di Ho165Di I127Di In113Di In115Di Ir191Di Ir193Di La139Di Lu175Di Nd142Di Nd143Di Nd144Di Nd145Di Nd146Di Nd148Di Nd150Di Offset Pb208Di Pd102Di Pd104Di Pd105Di Pd106Di Pd108Di Pd110Di Pr141Di Pt194Di Pt195Di Pt198Di Residual Rh103Di Sm147Di Sm149Di Sm152Di Sm154Di Sn120Di Tb159Di Tm169Di Xe131Di Y89Di Yb171Di Yb172Di Yb173Di Yb174Di Yb176Di Time

Reading metadata file

md <- read_excel("~/Documents/Research/Results/Cell_culture_and_macrophages/CyTOF/macros_metadata.xlsx")

Make sure condition variables are factors with the right levels

md$condition <- factor(md$condition, levels = c("undiff", "dmso","diff","lps","pgn","lps + pgn","gcm"))
data.frame(md)

 ##            file_name sample_id condition patient_id
## 1          monos.fcs     SDCY1    undiff       exp1
## 2     monos_dmso.fcs     SDCY2      dmso       exp1
## 3         macros.fcs     SDCY3      diff       exp1
## 4     macros_lps.fcs     SDCY4       lps       exp1
## 5     macros_pgn.fcs     SDCY5       pgn       exp1
## 6 macros_lps_pgn.fcs     SDCY6 lps + pgn       exp1
## 7     macros_gcm.fcs     SDCY7       gcm       exp1

panel <- read_excel("~/Documents/Research/Results/Cell_culture_and_macrophages/CyTOF/antigen_panel.xlsx")
data.frame(panel)

 ##       fcs_colname  antigen marker_class
 ## 1           197Au     <NA>         none
 ## 2         190BCKG     <NA>         none
 ## 3           138Ba     <NA>         none
 ## 4           209Bi     <NA>         none
 ## 5           140Ce     <NA>         none
 ## 6           133Cs     <NA>         none
 ## 7           161Dy     <NA>         none
 ## 8           162Dy     <NA>         none
 ## 9           163Dy     <NA>         none
 ## 10 164Dy_Siglec-8 Siglec-8        state
 ## 11          166Er     <NA>         none
 ## 12          167Er     <NA>         none
 ## 13    168Er_CD206    CD206        state
 ## 14    170Er_CD169    CD169        state
 ## 15    151Eu_CD11b    CD11b         type
 ## 16          153Eu     <NA>         none
 ## 17          155Gd     <NA>         none
 ## 18   156Gd_HLA-DR   HLA-DR        state
 ## 19          157Gd     <NA>         none
 ## 20          158Gd     <NA>         none
 ## 21     160Gd_CD14     CD14         type
 ## 22     165Ho_CD64     CD64        state
 ## 23           127I     <NA>         none
 ## 24          113In     <NA>         none
 ## 25          115In     <NA>         none
 ## 26          191Ir     <NA>         none
 ## 27          193Ir     <NA>         none
 ## 28          139La     <NA>         none
 ## 29          175Lu     <NA>         none
 ## 30    142Nd_CD11c    CD11c        state
 ## 31     143Nd_CD68     CD68        state
 ## 32          144Nd     <NA>         none
 ## 33     145Nd_CD71     CD71        state
 ## 34    146Nd_F4-80    F4-80        state
 ## 35          148Nd     <NA>         none
 ## 36          150Nd     <NA>         none
 ## 37          208Pb     <NA>         none
 ## 38      102Pd_MCB     <NA>         none
 ## 39      104Pd_MCB     <NA>         none
 ## 40      105Pd_MCB     <NA>         none
 ## 41      106Pd_MCB     <NA>         none
 ## 42      108Pd_MCB     <NA>         none
 ## 43      110Pd_MCB     <NA>         none
 ## 44          141Pr     <NA>         none
 ## 45    194Pt_CisPt     <NA>         none
 ## 46          195Pt     <NA>         none
 ## 47          198Pt     <NA>         none
 ## 48          103Rh     <NA>         none
 ## 49          147Sm     <NA>         none
 ## 50          149Sm     <NA>         none
 ## 51          152Sm     <NA>         none
 ## 52    154Sm_PD-L1    PD-L1        state
 ## 53          120Sn     <NA>         none
 ## 54          159Tb     <NA>         none
 ## 55    169Tm_CD163    CD163        state
 ## 56          131Xe     <NA>         none
 ## 57            89Y     <NA>         none
 ## 58     171Yb_CD86     CD86        state
 ## 59          172Yb     <NA>         none
 ## 60     173Yb_CD81     CD81        state
 ## 61     174Yb_CD88     CD88        state
 ## 62          176Yb     <NA>         none

(sce <- prepData(fcs_raw, panel, md))

## Error in prepData(fcs_raw, panel, md) : panel[[panel_cols$channel]] %in% colnames(fs) are not all TRUE

I read in the vignette that one can set marker_class to none. Is that correct?

I also read that, by default, non-mass channels (e.g., time, event length) will be removed from the output SCE's assay data.

Do you think the error appears because of the channels that don't have data?

Thanks for your help,

Sudip

HelenaLC commented 4 years ago

Here's what I'd try:

1. Check that all(panel$fcs_colname %in% colnames(fcs_raw)). If this returns FALSE, there's a mistake in your metadata; if it returns TRUE, see the next point.
2. I can imagine all the NAs in the antigen column causing unexpected behavior. You should either
   i. set all these to the same as the fcs_colname (if there are no protein targets for these) or
   ii. if they are not used and not needed for downstream analysis, simply remove them from the metadata table. Then, prepData will also remove them from the SCE, and you won't carry 10+ empty channels through the analysis.

(Alternatively, you could specify prepData(fcs_raw, ..., features = panel$fcs_colname[!is.na(panel$antigen)]), which would be equivalent to ii.)

sudu87 commented 4 years ago

Thanks, so I tried the first step:

all(panel$fcs_colname %in% colnames(fcs_raw)) returned FALSE

HelenaLC commented 4 years ago

Then there's your answer... you can use panel$fcs_colname[!panel$fcs_colname] %in% colnames(fcs_raw) to see which channels in your metadata table do not appear in the flowSet.

Nevertheless, I still recommend either removing channels that are not needed for the downstream analysis (because they are empty), or specifying which channels to keep via argument features.

sudu87 commented 4 years ago

I used the code you wrote, but I get this error:

Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function '%in%': invalid argument type

Also if I try to use features with let's say 1 channel only just for trying it out, there's an error.

sce <- prepData(fcs_raw, panel, md,features ="164Dy_Siglec-8")

Error in prepData(fcs_raw, panel, md, features = "164Dy_Siglec-8") : panel[[panel_cols$channel]] %in% colnames(fs) are not all TRUE

HelenaLC commented 4 years ago

Yes, there was a typo. It should be panel$fcs_colname[!panel$fcs_colname %in% colnames(fcs_raw)] (there was a misplaced ]).
Sorry for only seeing this now, but you can already see the mistake from your first post. Your fcs_raw has column names of the form Dy162Di Dy163Di Dy164Di ..., but your panel$fcs_colname has 164Dy_Siglec-8: this does not match the channel names in the data. panel$fcs_colname[!panel$fcs_colname %in% colnames(fcs_raw)] should show you which fcs_colname entries need fixing.
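
The mismatch described above can be reproduced on toy vectors. The following is a minimal base R sketch, not part of CATALYST: fcs_channels and panel_names stand in for colnames(fcs_raw) and panel$fcs_colname, and to_fcs_channel() is a hypothetical helper that assumes every panel entry has the form "<mass><element>" with an optional "_label" suffix, and that every data channel is named "<element><mass>Di".

```r
# Toy stand-ins for colnames(fcs_raw) and panel$fcs_colname (values taken from the thread).
fcs_channels <- c("Dy164Di", "Er168Di", "Eu151Di", "I127Di", "Y89Di", "BCKG190Di", "Time")
panel_names  <- c("164Dy_Siglec-8", "168Er_CD206", "151Eu_CD11b", "127I", "89Y", "190BCKG")

# The corrected check: which panel entries do not occur among the data channels?
unmatched <- panel_names[!panel_names %in% fcs_channels]
unmatched

# Hypothetical helper (an assumption, not a CATALYST function):
# rewrite "<mass><element>[_label]" as "<element><mass>Di".
# Strings that do not fit the pattern are returned unchanged.
to_fcs_channel <- function(x) {
  sub("^([0-9]+)([A-Za-z]+)(_.*)?$", "\\2\\1Di", x)
}

fixed <- to_fcs_channel(panel_names)
fixed

# After renaming, every panel entry should be found in the data.
all(fixed %in% fcs_channels)
```

If real channel names deviate from the assumed pattern, editing the panel file by hand is the safer route.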

Once you have the SCE, everything downstream should be much simpler, promise!

sudu87 commented 4 years ago

Thanks for pointing out the mistake and the unmatched column names. I totally overlooked that one! In this regard I have some follow-up questions:

1. Do you suggest removing those channels? If yes, I don't know how to remove channels using your pipeline.
2. Can I just enter the correct names for the channels of interest in the panel file (e.g. Dy164Di instead of 164Dy_Siglec-8, and so on), so that the SCE is generated using whatever matches?
3. Can I just correct the names in panel and pass them to the features argument? If so, in which format: as a string or as a bare name, i.e. features="Dy164Di" or features=Dy164Di? I tried to do this but I still get this error:

Error in prepData(fcs_raw, panel, md) : panel[[panel_cols$channel]] %in% colnames(fs) are not all TRUE

HelenaLC commented 4 years ago

1. I suggest removing them from the panel table (i.e. the panel file). Then, they will be dropped from the SCE by prepData() automatically.
2. Kind of. Fixing the names will get rid of the error, but prepData() will not "use whatever matches"; it will throw an error if anything doesn't match.
3. Yes, as a string. Please see the function documentation for argument features:
   "a logical vector, numeric vector of column indices, or character vector of channel names. Specified which column to keep from the input data. Defaults to the channels listed in the input panel."
   The error will persist if there are any channels in panel that don't match fcs_raw, as I said in point 2, regardless of argument features.

So in summary, specifying a subset of channels via features = ... and keeping only that subset in panel (removing everything else) are equivalent. Either way, the entries of panel$fcs_colname need to match colnames(fcs_raw).
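
As a rough illustration of the two routes summarised above, here is a small base R sketch on a toy panel (the four rows are made-up stand-ins that reuse channel names from the thread). The prepData() calls are left as comments because they need CATALYST plus the real fcs_raw and md objects, and they were not run for this sketch.

```r
# Toy panel; in the thread this table comes from read_excel().
panel <- data.frame(
  fcs_colname  = c("Dy164Di", "Er166Di", "Er168Di", "Eu151Di"),
  antigen      = c("Siglec-8", NA, "CD206", "CD11b"),
  marker_class = c("state", "none", "state", "type"),
  stringsAsFactors = FALSE
)

# Route A: drop the unlabelled channels from the panel itself.
panel_used <- panel[!is.na(panel$antigen), ]

# Route B: leave the panel untouched and list the wanted channels for `features`.
keep <- panel$fcs_colname[!is.na(panel$antigen)]

# Both routes select the same channels.
identical(panel_used$fcs_colname, keep)

# Assumed usage (not run here; every panel channel name must still match the data):
# sce <- prepData(fcs_raw, panel_used, md)
# sce <- prepData(fcs_raw, panel, md, features = keep)
```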

sudu87 commented 4 years ago

Thank you Helena, Point no. 1 worked. I removed the channels from the panel dataframe.

fcs_colname	antigen	marker_class
Dy164Di	Siglec-8	state
Er168Di	CD206	state
Er170Di	CD169	state
Eu151Di	CD11b	type
Gd156Di	HLA-DR	state
Gd160Di	CD14	type

Table properties:

'data.frame':   21 obs. of  3 variables:
 $ fcs_colname : chr  "Dy164Di" "Er168Di" "Er170Di" "Eu151Di" ...
 $ antigen     : chr  "Siglec-8" "CD206" "CD169" "CD11b" ...
 $ marker_class: chr  "state" "state" "state" "type" ...

Then I ran sce <- prepData(fcs_raw, panel, md) and, as you said, it discarded the non-matching columns.

The object was created successfully; this is how it looks now. Do you think it is good to go? There is a warning, though. I am not sure why, but there don't seem to be any more errors.

class: SingleCellExperiment 
dim: 21 578655 
metadata(1): experiment_info
assays(2): counts exprs
rownames(21): Siglec-8 CD206 ... CD81 CD88
rowData names(3): channel_name marker_name marker_class
colnames: NULL
colData names(3): sample_id condition patient_id
reducedDimNames(0):
altExpNames(0):
Warning messages:
Unknown or uninitialised column: `Antigen`.

HelenaLC commented 4 years ago

Not sure about this warning either :/ It's worth checking that rowData(sce) looks good, but otherwise, yes, I think you're good to go!

HelenaLC / CATALYST