Closed: yanwu2014 closed this issue 4 years ago.
Hi Yan,
While the vignette demonstrates the use of phemd with a Monocle2 object, phemd can also be used with a Seurat or PHATE object. In brief, for Seurat, perform your desired dimensionality reduction and clustering (e.g. RunPCA, FindNeighbors, FindClusters) and then apply PhEMD by calling the following functions in sequence: createDataObj -> bindSeuratObj -> clusterIndividualSamples -> generateGDM -> compareSamples -> groupSamples -> plotGroupedSamplesDmap. In general, this Seurat workflow is the same as the Monocle2 (and PHATE) workflow, and many of the functions accept cell_model='seurat' as a parameter.
I will get a vignette of this up and running ASAP, but in the meantime feel free to email me with specific questions.
Best, Will
Hi!
Thanks, I think I was missing the bindSeuratObj function. I'm getting an error when I try to run clusterIndividualSamples, though. Is this meant for Seurat 2.0 or 3.0 objects?
Hi Yan,
The code was updated fairly recently and is now intended for Seurat 3.0 objects. If you are using a Seurat 3.0 object and are still having trouble, I am happy to help troubleshoot your specific use case (feel free to email me!).
Thanks, I'll take a look!
Hi, I am having a similar problem to yanwu2014. I am using phemd version 1.1.1, which I installed from Bioconductor yesterday. I was able to run the whole analysis workflow successfully using the Monocle2 method, but since the rest of my single-cell analysis uses a Seurat object, I would prefer to do it this way for consistency.
Anyway, here is my code:
# use a numeric label for each cell type as the cell identities
Idents(NucSeq.joint) <- as.numeric(as.factor(NucSeq.joint$Cell.Type))
# randomly subsample 20,000 cells
sampled.cells <- sample(colnames(NucSeq.joint), 20000)
NucSeq.sampled <- NucSeq.joint[,sampled.cells]
# make a list of expression data for each sample:
expression_matrix <- GetAssayData(NucSeq.sampled, slot='data')[VariableFeatures(NucSeq.sampled),]
expression_list <- list()
for(sample in unique(NucSeq.sampled$SampleID)){
print(sample)
expression_list[[sample]] <- as.matrix(t(expression_matrix[,NucSeq.sampled$SampleID == sample]))
}
# create phemd obj
phemd_obj <- createDataObj(expression_list, VariableFeatures(NucSeq.sampled), names(expression_list))
phemd_obj <- bindSeuratObj(phemd_obj, NucSeq.sampled, 'SampleID')
phemd_obj <- clusterIndividualSamples(phemd_obj, verbose=TRUE, cell_model='seurat')
And here is the error:
Warning: The following arguments are not used: assay.type
[1] 0.0589 0.3104 0.1151 0.0475 0.4185 0.0440 0.0056
Warning: The following arguments are not used: assay.type
Error in clusterIndividualSamples(phemd_obj, verbose = TRUE, cell_model = "seurat") :
Error: no cells in reference set match the experiment_id NA of sample 1
Maybe I should try the version of phemd from GitHub rather than from Bioconductor?
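The per-sample loop in the code above reshapes a genes × cells matrix into a list of cells × genes matrices, one per sample, which is the layout passed to createDataObj. Below is a minimal base-R sketch of the same reshaping on a toy matrix. It assumes, as in the code above, that rows are genes, columns are cells, and `sample_ids` holds one sample label per cell; `split_by_sample` is a hypothetical helper, not part of phemd or Seurat.

```r
# Hypothetical helper (not part of phemd): split a genes x cells matrix
# into a named list of cells x genes matrices, one per sample label.
split_by_sample <- function(expr, sample_ids) {
  stopifnot(length(sample_ids) == ncol(expr))
  ids <- unique(sample_ids)
  out <- lapply(ids, function(s) {
    # drop = FALSE keeps a matrix (and the cell name) even for a one-cell sample
    t(expr[, sample_ids == s, drop = FALSE])
  })
  names(out) <- ids
  out
}

# Toy data: 3 genes x 5 cells from two samples
expr <- matrix(1:15, nrow = 3,
               dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5)))
sample_ids <- c("A", "A", "B", "B", "B")
expression_list <- split_by_sample(expr, sample_ids)
str(expression_list)
```

Each list element then has one row per cell and one column per gene, with the gene names as column names.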
@yanwu2014 @smorabit I have fixed the package to be compatible with the latest version of Seurat and have pushed the changes to Bioconductor. However, the changes may take a day or two to propagate, so the best way to install the package right now is from GitHub:
library(devtools)
install_github('KrishnaswamyLab/phemd')
The one additional step compared with the Monocle pipeline is to assign a batch ID to each sample, even if all samples come from the same batch (i.e. expression values are batch-corrected or should be treated as such). For example, suppose NucSeq.sampled were a Seurat object containing all cells in your reference map of cell subtypes, initialized via CreateSeuratObject(expn_data_subsampled, project='Batch1'), on which you had then run PCA/UMAP/tSNE etc. Your pipeline using the UMAP cell-state embedding would then be as follows:
phemd_obj <- createDataObj(expression_list, VariableFeatures(NucSeq.sampled), names(expression_list))
phemd_obj <- bindSeuratObj(phemd_obj, NucSeq.sampled)
batchIDs(phemd_obj) <- 'Batch1'
phemd_obj <- clusterIndividualSamples(phemd_obj, cell_model='seurat')
phemd_obj <- generateGDM(phemd_obj, cell_model='seurat', expn_type='umap', ndim=2)
emd_distmat <- compareSamples(phemd_obj)
group_assignments <- groupSamples(emd_distmat, distfun='hclust', ncluster=5)
phemd_dmap <- plotGroupedSamplesDmap(emd_distmat, group_assignments, pt_sz=1.5)
Let me know if you have any additional issues.
I reinstalled phemd from GitHub as you suggested, and I can now run it on my Seurat object. Thanks for the help!
Hi, sorry to reopen this thread, but I hope someone can help me.
I am trying to run this pipeline on my Seurat object, and I get the following error when running clusterIndividualSamples:
[1] 1
Error in RANN::nn2(data = ref_cells, query = cur_cells, k = 1) : Query and data tables must have same dimensions
Could you tell me what this error means? Am I making a mistake somewhere in the code? I have followed all of the steps described above, and everything before this line runs OK.
Thank you so much!
Samuel
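RANN::nn2(data, query, k) requires the data and query matrices to have the same number of columns, so one plausible cause of this error is that the reference cells and the sample cells are described by different feature sets (for example, a gene list passed to createDataObj that differs from the features available for the reference). That reading of phemd's internals is an assumption, not something confirmed in this thread. Below is a minimal base-R check, assuming both the reference and each sample are cells × features matrices with feature names as column names; `check_feature_match`, `ref`, and `samples` are hypothetical names for illustration.

```r
# Hypothetical diagnostic (not part of phemd or RANN): for each sample matrix,
# report whether it has the same feature columns, in the same order, as the reference.
check_feature_match <- function(ref, samples) {
  vapply(samples, function(m) {
    ncol(m) == ncol(ref) && identical(colnames(m), colnames(ref))
  }, logical(1))
}

# Toy reference with 3 features; sample s2 is missing one feature
ref <- matrix(rnorm(6), nrow = 2, dimnames = list(NULL, c("g1", "g2", "g3")))
samples <- list(
  s1 = matrix(rnorm(9), nrow = 3, dimnames = list(NULL, c("g1", "g2", "g3"))),
  s2 = matrix(rnorm(4), nrow = 2, dimnames = list(NULL, c("g1", "g2")))
)
check_feature_match(ref, samples)  # returns c(s1 = TRUE, s2 = FALSE)
```

Any sample flagged FALSE would be a candidate for the dimension mismatch that nn2 reports.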
Hi! I have a PerturbSeq-style gene knockout screen with an scRNA-seq readout, and I was wondering whether it is possible to run phemd with just a Seurat object, or with custom embeddings and clusters. Right now it looks like it requires a Monocle2 object.
Thanks! Yan