Using phemd with a Seurat object only

yanwu2014 commented 4 years ago

Hi! I've got a PerturbSeq-style gene knockout screen with scRNA-seq readout, and I was wondering if it is possible to run phemd with either just a Seurat object or with custom embeddings and clusters. Right now it looks like it requires a Monocle2 object.

Thanks! Yan

wschen commented 4 years ago

Hi Yan,

While the vignette demonstrates the use of phemd with a Monocle2 object, phemd can also be used with a Seurat object or a PHATE object. In brief, for Seurat, I would perform your desired dimensionality reduction and clustering (e.g. RunPCA, FindNeighbors, FindClusters) and then apply PhEMD using the following functions in sequence: createDataObj -> bindSeuratObj -> clusterIndividualSamples -> generateGDM -> compareSamples -> groupSamples -> plotGroupedSamplesDmap. In general, the Seurat workflow is the same as the Monocle2 (and PHATE) workflow, with many of the functions accepting cell_model='seurat' as a parameter.
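As a rough illustration of the Seurat-side preparation mentioned above, here is a minimal sketch. It assumes Seurat (v3 or later) is installed and that `counts` is a genes x cells count matrix; the function name `prepare_seurat_reference` and its default arguments are made up for this example and are not part of phemd.

```r
# Sketch only: wraps a standard Seurat preprocessing sequence in one function.
# `prepare_seurat_reference`, `project`, and `npcs` are illustrative choices.
prepare_seurat_reference <- function(counts, project = "Reference", npcs = 10) {
  obj <- Seurat::CreateSeuratObject(counts = counts, project = project)
  obj <- Seurat::NormalizeData(obj, verbose = FALSE)
  obj <- Seurat::FindVariableFeatures(obj, verbose = FALSE)
  obj <- Seurat::ScaleData(obj, verbose = FALSE)
  # Dimensionality reduction on the variable features.
  obj <- Seurat::RunPCA(obj, npcs = npcs, verbose = FALSE)
  # Graph-based clustering on the first `npcs` principal components.
  obj <- Seurat::FindNeighbors(obj, dims = seq_len(npcs), verbose = FALSE)
  obj <- Seurat::FindClusters(obj, verbose = FALSE)
  # UMAP embedding, in case a UMAP-based cell state embedding is wanted later.
  Seurat::RunUMAP(obj, dims = seq_len(npcs), verbose = FALSE)
}
```

The returned object would then be the one passed to bindSeuratObj in the function sequence listed above.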

I will get a vignette of this up and running ASAP but in the meantime, feel free to email me for specific questions.

Best, Will

yanwu2014 commented 4 years ago

Hi!

Thanks, I think I was missing the bindSeuratObj function. I'm getting this error when I try to run clusterIndividualSamples though. Is this meant for Seurat 2.0 or 3.0 objects?

wschen commented 4 years ago

Hi Yan,

The code was updated fairly recently and is now intended for Seurat 3.0 objects. If you are using a Seurat 3.0 object and are still having trouble, I am happy to help troubleshoot your specific use case (feel free to email me!)

Will

yanwu2014 commented 4 years ago

Thanks I'll take a look!

smorabit commented 4 years ago

Hi, I am having a similar problem to yanwu2014. I am using phemd version 1.1.1, which I installed from Bioconductor yesterday. I was able to run the whole analysis workflow successfully using the Monocle2 method, but since the rest of my single-cell analysis works on a Seurat object, I would prefer to do it this way for consistency.

Anyways here is my code:

Idents(NucSeq.joint) <- as.numeric(as.factor(NucSeq.joint$Cell.Type))

sampled.cells <- sample(colnames(NucSeq.joint), 20000)
NucSeq.sampled <- NucSeq.joint[,sampled.cells]

# make a list of expression data for each sample: 
expression_matrix <- GetAssayData(NucSeq.sampled, slot='data')[VariableFeatures(NucSeq.sampled),]

expression_list <- list()
for(sample in unique(NucSeq.sampled$SampleID)){
  print(sample)
  expression_list[[sample]] <- as.matrix(t(expression_matrix[,NucSeq.sampled$SampleID == sample]))
}

# create phemd obj
phemd_obj <- createDataObj(expression_list, VariableFeatures(NucSeq.sampled), names(expression_list))
phemd_obj <- bindSeuratObj(phemd_obj, NucSeq.sampled, 'SampleID')
phemd_obj <- clusterIndividualSamples(phemd_obj, verbose=TRUE, cell_model='seurat')

And here is the error:

Warning: The following arguments are not used: assay.type
[1] 0.0589 0.3104 0.1151 0.0475 0.4185 0.0440 0.0056
Warning: The following arguments are not used: assay.type

Error in clusterIndividualSamples(phemd_obj, verbose = TRUE, cell_model = "seurat") :
  Error: no cells in reference set match the experiment_id NA of sample 1

Maybe I should try the version of phemd from github rather than from bioconductor?

wschen commented 4 years ago

@yanwu2014 @smorabit I have fixed the package to be compatible with the latest version of Seurat and have pushed the changes to Bioconductor. However, it seems the changes may take a day or two to propagate, so the best way to install the package right now is from GitHub:

library(devtools)
install_github('KrishnaswamyLab/phemd')

The one additional step compared with the Monocle pipeline is to assign a batch ID to each sample, even if all samples are from the same batch (i.e. expression values are batch corrected or should be treated as such). For example, if NucSeq.sampled were a Seurat object containing all cells comprising your reference map of cell subtypes, initialized via CreateSeuratObject(expn_data_subsampled, project='Batch1') and then run through PCA/UMAP/TSNE etc., your pipeline using the UMAP cell state embedding would be as follows:

phemd_obj <- createDataObj(expression_list, VariableFeatures(NucSeq.sampled), names(expression_list))
phemd_obj <- bindSeuratObj(phemd_obj, NucSeq.sampled)
batchIDs(phemd_obj) <- 'Batch1'
phemd_obj <- clusterIndividualSamples(phemd_obj, cell_model='seurat')
phemd_obj <- generateGDM(phemd_obj, cell_model='seurat', expn_type='umap', ndim=2)

emd_distmat <- compareSamples(phemd_obj)
group_assignments <- groupSamples(emd_distmat, distfun='hclust', ncluster=5)
phemd_dmap <- plotGroupedSamplesDmap(emd_distmat, group_assignments, pt_sz=1.5)

Let me know if you have any additional issues.
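
The pipeline above takes expression_list as given: one cells x genes matrix per sample. For readers who want to see that shape concretely, here is a minimal base-R sketch on toy data. The matrix and the `sample_ids` vector are hypothetical stand-ins for the output of GetAssayData and for the SampleID metadata column used earlier in the thread.

```r
# Toy genes x cells matrix standing in for GetAssayData(...) output (hypothetical data).
set.seed(1)
expression_matrix <- matrix(
  rpois(5 * 12, lambda = 3), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:12))
)
# Hypothetical per-cell sample labels, standing in for NucSeq.sampled$SampleID.
sample_ids <- rep(c("S1", "S2", "S3"), times = c(3, 4, 5))

# Group cell indices by sample, then transpose each block to cells x genes.
cells_by_sample <- split(seq_len(ncol(expression_matrix)), sample_ids)
expression_list <- lapply(cells_by_sample, function(idx) {
  t(as.matrix(expression_matrix[, idx, drop = FALSE]))
})

# Every element should have one column per gene.
stopifnot(all(vapply(expression_list, ncol, integer(1)) == nrow(expression_matrix)))
```

Using split() with lapply() keeps the sample names as list names automatically, and drop = FALSE keeps a sample with a single cell as a one-row matrix rather than a vector.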

smorabit commented 4 years ago

I re-installed phemd from github as you suggested, and I am able to run it using my Seurat object. Thanks for the help!

SamuelCWJ commented 11 months ago

Hi, sorry to reopen this thread again, but I hope someone might be able to help me.

I am trying to run this pipeline on my Seurat object, and I encounter this error when running clusterIndividualSamples.

[1] 1
Error in RANN::nn2(data = ref_cells, query = cur_cells, k = 1) :
  Query and data tables must have same dimensions

May I know what this error could mean please? Am I making a mistake in the code somewhere? I have followed all of the steps described above, and everything before this line runs ok.

Thank you so much!

Samuel
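
For context on the message itself: RANN::nn2 raises this error when its `data` and `query` arguments have different numbers of columns. One plausible cause (an assumption about the inputs, not something confirmed from phemd's internals) is that a per-sample matrix does not have the same gene columns as the reference. A minimal base-R sketch of a check that could be run before createDataObj follows; `find_dimension_mismatches`, `markers`, and the toy data are hypothetical names for illustration.

```r
# Return the names of per-sample matrices whose columns differ from the
# expected gene set. All names here are illustrative, not phemd functions.
find_dimension_mismatches <- function(expression_list, markers) {
  bad <- vapply(expression_list, function(m) {
    ncol(m) != length(markers) ||
      (!is.null(colnames(m)) && !identical(colnames(m), markers))
  }, logical(1))
  names(expression_list)[bad]
}

# Toy example: sample "B" is missing one gene.
markers <- c("g1", "g2", "g3")
toy <- list(
  A = matrix(0, nrow = 4, ncol = 3, dimnames = list(NULL, markers)),
  B = matrix(0, nrow = 2, ncol = 2, dimnames = list(NULL, markers[1:2]))
)
mismatched <- find_dimension_mismatches(toy, markers)
print(mismatched)
```

If a check like this comes back empty, the mismatch may lie elsewhere, for example between the embedding stored in the Seurat object and the expression columns, which this sketch does not examine.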
