HantaoShu closed this issue 2 years ago.
If you already have the PCA computed, you shouldn't call Schema with NMF decomposition enabled. Instead, you just want the `mode='affine'` argument. Please see https://schema-multimodal.readthedocs.io/en/latest/api/index.html
Something like this, maybe? `SchemaQP(min_desired_corr=0.9, mode='affine', params={})`
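One plausible reason the NMF setting fails on PCA input: NMF requires a non-negative input matrix, but PCA scores are computed from mean-centered data, so each component's scores average to zero and any non-constant component contains negative entries. The sketch below uses only NumPy, computes PCA scores via an SVD of centered data, and checks this on made-up toy counts. `pca_scores` is a hypothetical helper written for illustration, not part of Schema or scanpy.

```python
import numpy as np

def pca_scores(X, n_comps):
    """Hypothetical helper: project rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # PCA centers each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_comps].T                   # cells x n_comps score matrix

rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(200, 50)).astype(float)  # toy non-negative "expression" matrix
X = np.log1p(counts)                                      # log-normalized data stays non-negative
scores = pca_scores(X, n_comps=10)

print(X.min() >= 0)                          # True: the input itself is non-negative
print(np.allclose(scores.mean(axis=0), 0))   # True: every component score is centered
print((scores < 0).any(axis=0).all())        # True: every component has negative entries
```

Since `sc.pp.pca` centers the data by default (`zero_center=True`), `adata.obsm['X_pca']` has the same property, which is why an affine transform is a better fit than an NMF decomposition for PCA features.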
Thank you for your response. Could you kindly answer the other two questions, especially how to choose `iroot` for the datasets used in the paper?
Hi Hantao,

Thank you for your questions.

1) To choose the root cell, we identified the cell with the highest expression of a marker gene for the cell type or condition that is biologically expected to lie at the beginning of the trajectory or process (e.g., Dlx3 in TAC progenitors for SHARE-seq).
2) We used `len(rna_idx)` to set the number of genes the model considers, because some genes in a dataset may be too sparsely represented. In our use case, `rna_idx` is a list of indices for just the set of genes we want to include and analyze in the model.
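Both answers can be illustrated with a small NumPy sketch. It assumes a dense cells x genes matrix and, for the second point, that the expression matrix is subset to the `rna_idx` columns before it reaches the model; the actual signature of `construct_dag` is not shown in this thread, so the sketch only computes a root index. `pick_root_cell`, `select_genes`, the gene names other than Dlx3, and the toy values are all hypothetical.

```python
import numpy as np

def pick_root_cell(X, gene_names, marker):
    """Hypothetical helper: row index of the cell with the highest `marker` expression."""
    col = list(gene_names).index(marker)    # column of the marker gene
    return int(np.argmax(X[:, col]))        # first cell attaining the maximum

def select_genes(X, rna_idx):
    """Hypothetical helper: keep only the gene columns listed in rna_idx."""
    return X[:, rna_idx]

gene_names = ["GeneA", "Dlx3", "GeneB", "GeneC"]   # toy labels, not real data
X = np.array([
    [3.0, 0.0, 1.0, 0.0],
    [0.5, 4.2, 0.0, 1.0],
    [0.0, 1.1, 2.5, 0.3],
])
iroot = pick_root_cell(X, gene_names, "Dlx3")
print(iroot)  # 1

rna_idx = [0, 3]                    # e.g. drop two sparsely observed genes
X_sub = select_genes(X, rna_idx)
print(X_sub.shape[1], len(rna_idx), max(rna_idx) + 1)  # 2 2 4
```

After subsetting, the model input has `len(rna_idx)` columns; `max(rna_idx) + 1` would overcount whenever the selected indices are not contiguous from zero.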
Thanks for your response!
Hi Alexander P. Wu, I have some questions about reproducing the results of the ICLR paper.

1) How should `iroot` be chosen for each dataset in the `construct_dag` function?
2) In model.py, lines 20-21: should it be `max(rna_idx) + 1` instead of `len(rna_idx)`? The largest value in `rna_idx` can be larger than the length of `rna_idx`.
3) As described in Appendix A1 of the paper, PCA is used for both ATAC-seq and RNA-seq when generating the joint embedding. However, I got the following error:
Input:

```python
import scanpy as sc
import schema
from scipy.stats import spearmanr

rna = sc.read_h5ad('../process_data/A549/RNA0_filter_0.05_counts.h5ad')
atac = sc.read_h5ad('../process_data/A549/ATAC0_filter_0.001_counts.h5ad')

sc.pp.normalize_per_cell(rna, 100000)
sc.pp.normalize_per_cell(atac, 10000)
sc.pp.log1p(rna)
sc.pp.log1p(atac)
sc.pp.pca(rna, n_comps=100)
sc.pp.pca(atac, n_comps=100)

# keep only the PCs whose Spearman correlation with total counts is below 0.9
rna_pca, atac_pca = [], []
for i in range(100):
    if spearmanr(rna.obsm['X_pca'][:, i], rna.X.toarray().sum(1))[0] < 0.9:
        rna_pca.append(i)
for i in range(100):
    if spearmanr(atac.obsm['X_pca'][:, i], atac.X.toarray().sum(1))[0] < 0.9:
        atac_pca.append(i)

sqp = schema.SchemaQP(
    min_desired_corr=0.9,
    params={'decomposition_model': 'nmf', 'num_top_components': 20},
)
mod_X = sqp.fit_transform(
    rna.obsm['X_pca'][:, rna_pca],
    [atac.obsm['X_pca'][:, atac_pca]],
    ['feature_vector'],
)
```
Output:
ValueError Traceback (most recent call last)