Durenlab / LINGER

78 stars 17 forks source link

Seurat dataset for LINGER #16

Open feanaros opened 7 months ago

feanaros commented 7 months ago

Hi, I have a multiomic dataset scRNA+scATAC built on Seurat. My obj Is saved as .RDS file but I can also store It on h5ad file. How can I use LINGER starting from It? Thank you

amssyqy commented 7 months ago

Hi, I have a multiomic dataset scRNA+scATAC built on Seurat. My obj Is saved as .RDS file but I can also store It on h5ad file. How can I use LINGER starting from It? Thank you

Thanks for the question~ We added it to the PBMCs tutorial taking the h5ad file as input in the 'Prepare the input data' section. Please check https://github.com/Durenlab/LINGER/blob/main/docs/PBMC.md. Thanks for your time~

feanaros commented 7 months ago

Thank you very much. Because Im not familiAr with python (I work on Seurat in R), could you please indicate to me how to add my cluster labels (from metadata column) and dimensionality reduction (umap coords)?

amssyqy commented 6 months ago

Thank you very much. Because Im not familiAr with python (I work on Seurat in R), could you please indicate to me how to add my cluster labels (from metadata column) and dimensionality reduction (umap coords)?

No problem! You can try the following code to get the label file.

label=data.frame(colnames(Seurat_obj))
colnames(label)=c('barcode_use')
label$label=Seurat_obj@meta.data$cell_type
write.table(label,'label.txt',sep='\t',row.names = FALSE,quote=FALSE)

Best~

feanaros commented 6 months ago

thank you very much for your help! best

when I import the h5ad matrix, adata object looks like this:

AnnData object with n_obs × n_vars = 9995 × 32285
    obs: 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'nCount_ATAC', 'nFeature_ATAC', 'nCount_motif', 'nFeature_motif', 'TSS.enrichment', 'TSS.percentile', 'nucleosome_signal', 'nucleosome_percentile', 'blacklist_fraction', 'dataset', 'integrated_rna.weight', 'ATAC.weight', 'wsnn_res.0.1', 'wsnn_res.0.2', 'wsnn_res.0.5', 'wsnn_res.1', 'wsnn_res.1.5', 'treatment', 'timepoint', 'treatment_timepoint', 'RNA_snn_res.0.2', 'seurat_clusters', 'RNA_snn_res.0.3', 'RNA_snn_res.0.4', 'RNA_snn_res.0.5', 'seurat_cluster_0.4', 'treatment_timepoint_cluster', 'Subset_SHF', 'pseudotime', 'timepoint_cluster', 'pseudotime_rank'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'

but from you tutorial it should be a simple matrix `n_obs x n_vars` where obs are barcodes and vars are gene_id.
I used the following script to save my object, but I'm not able to extract and save only the matrices.
DefaultAssay(Obj)<- "RNA"
RNA= DietSeurat(Obj, assays = "RNA",     counts = TRUE,
                data = TRUE,
                scale.data = FALSE,
                features = NULL,
                dimreducs = NULL,
                graphs = NULL,
                misc = F)

SaveH5Seurat(RNA, filename = "/home/h5ad_files/Obj_RNA.h5Seurat", overwrite = T)
Convert("/home/h5ad_files/Obj_RNA.h5Seurat", dest = "h5ad")

DefaultAssay(Obj)<- "ATAC"
ATAC= DietSeurat(Obj, assays = "ATAC",     counts = TRUE,
                data = TRUE,
                scale.data = FALSE,
                features = NULL,
                dimreducs = NULL,
                graphs = NULL,
                misc = F)

SaveH5Seurat(ATAC, filename = "/home/h5ad_files/Obj_ATAC.h5Seurat", overwrite = T)
Convert("/home/h5ad_files/Obj_ATAC.h5Seurat", dest = "h5ad")

AM I doing it wrong?

amssyqy commented 6 months ago

thank you very much for your help! best another and I hope -last- question: when I import the h5ad matrix, adata object looks like this:

AnnData object with n_obs × n_vars = 9995 × 32285
    obs: 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'nCount_ATAC', 'nFeature_ATAC', 'nCount_motif', 'nFeature_motif', 'TSS.enrichment', 'TSS.percentile', 'nucleosome_signal', 'nucleosome_percentile', 'blacklist_fraction', 'dataset', 'integrated_rna.weight', 'ATAC.weight', 'wsnn_res.0.1', 'wsnn_res.0.2', 'wsnn_res.0.5', 'wsnn_res.1', 'wsnn_res.1.5', 'treatment', 'timepoint', 'treatment_timepoint', 'RNA_snn_res.0.2', 'seurat_clusters', 'RNA_snn_res.0.3', 'RNA_snn_res.0.4', 'RNA_snn_res.0.5', 'seurat_cluster_0.4', 'treatment_timepoint_cluster', 'Subset_SHF', 'pseudotime', 'timepoint_cluster', 'pseudotime_rank'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'

but from you tutorial it should be a simple matrix `n_obs x n_vars` where obs are barcodes and vars are gene_id.
I used the following script to save my object, but I'm not able to extract and save only the matrices.
DefaultAssay(Obj)<- "RNA"
RNA= DietSeurat(Obj, assays = "RNA",     counts = TRUE,
                data = TRUE,
                scale.data = FALSE,
                features = NULL,
                dimreducs = NULL,
                graphs = NULL,
                misc = F)

SaveH5Seurat(RNA, filename = "/home/h5ad_files/Obj_RNA.h5Seurat", overwrite = T)
Convert("/home/h5ad_files/Obj_RNA.h5Seurat", dest = "h5ad")

DefaultAssay(Obj)<- "ATAC"
ATAC= DietSeurat(Obj, assays = "ATAC",     counts = TRUE,
                data = TRUE,
                scale.data = FALSE,
                features = NULL,
                dimreducs = NULL,
                graphs = NULL,
                misc = F)

SaveH5Seurat(ATAC, filename = "/home/h5ad_files/Obj_ATAC.h5Seurat", overwrite = T)
Convert("/home/h5ad_files/Obj_ATAC.h5Seurat", dest = "h5ad")

AM I doing it wrong?

There is no problem with saving your object in this way. You can try

adata_RNA.obs['barcode']=adata_RNA.obs.index
adata_RNA.var['gene_ids']=adata_RNA.var.index
adata_ATAC.obs['barcode']=adata_ATAC.obs.index
adata_ATAC.var['gene_ids']=adata_ATAC.var.index

Then you can run the following python code and see whether it would work

import scipy.sparse as sp
matrix=sp.vstack([adata_RNA.X.T, adata_ATAC.X.T])
features=pd.DataFrame(adata_RNA.var['gene_ids'].values.tolist()+adata_ATAC.var['gene_ids'].values.tolist(),columns=[1])
K=adata_RNA.shape[1]
N=K+adata_ATAC.shape[1]
types = ['Gene Expression' if i <= K else 'Peaks' for i in range(0, N)]
features[2]=types
barcodes=pd.DataFrame(adata_RNA.obs['barcode'].values,columns=[0])
from LingerGRN.preprocess import *
adata_RNA,adata_ATAC=get_adata(matrix,features,barcodes,label)# adata_RNA and adata_ATAC are scRNA and scATAC
feanaros commented 6 months ago

ok! It also worked for me using: obs= colnames(Obj) vars= rownames(Obj) write.table(obs,'obs.txt',sep='\t',row.names = FALSE,quote=FALSE) write.table(obs,'obs.txt',sep='\t',row.names = FALSE,quote=FALSE) write.table(vars,'vars.txt',sep='\t',row.names = FALSE,quote=FALSE) import anndata as ad import numpy as np

matrix_rna = adata_RNA.X data_RNA = ad.AnnData(X=matrix_rna)

data_RNA.obs['barcodes'] = obs data_RNA.var['gene_id'] = vars

and same for Atac ^_^ Thank again!