GuangyuWangLab2021 / cellDancer

Predict RNA velocity through deep learning
https://guangyuwanglab2021.github.io/cellDancer_website/
BSD 3-Clause "New" or "Revised" License

Transfer adata format to dataframe #4

Closed by chen-peng-1874 1 year ago

chen-peng-1874 commented 1 year ago

Dear GuangyuWangLab members,

Another question is about data transfer. I have generated an adata object for scVelo analysis that has a "seurat_cluster" column and UMAP embeddings. The example on your cellDancer website passes a gene list, as shown below:

cdutil.adata_to_df_with_embed(adata,
                              us_para=['Mu','Ms'],
                              cell_type_para='celltype',
                              embed_para='X_umap',
                              save_path='cell_type_u_s.csv',
                              gene_list=['Hba-x','Smim1'])

What if I want all the genes listed? Thank you!

Abclisy commented 1 year ago

If 'seurat_cluster' is a column of adata.obs and 'X_umap' is a key in adata.obsm, you can use the code below to transfer all genes.

cdutil.adata_to_df_with_embed(adata,
                              us_para=['Mu','Ms'],
                              cell_type_para='seurat_cluster',
                              embed_para='X_umap',
                              save_path='cell_type_u_s.csv')

If you omit gene_list, adata_to_df_with_embed defaults to exporting all genes.
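
The "omit it to get everything" behaviour is a common optional-argument pattern. Here is a minimal sketch of that pattern, not cellDancer's actual implementation: `resolve_gene_list` is a hypothetical helper, a plain Python list stands in for the gene names in adata, and the `None` default is an assumption made for illustration.

```python
def resolve_gene_list(all_genes, gene_list=None):
    """Return the genes to export (hypothetical helper, not cellDancer API)."""
    if gene_list is None:
        # No selection given: fall back to every available gene.
        return list(all_genes)
    available = set(all_genes)
    missing = [g for g in gene_list if g not in available]
    if missing:
        # Fail early instead of silently dropping unknown gene names.
        raise KeyError(f"genes not found: {missing}")
    return list(gene_list)


all_genes = ['Hba-x', 'Smim1', 'Gata1']          # illustrative gene names
print(resolve_gene_list(all_genes))               # all three genes
print(resolve_gene_list(all_genes, ['Hba-x']))    # only the requested gene
```

Raising on unknown names is a design choice of this sketch; it makes a typo in the gene list visible before a large CSV is written.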

chen-peng-1874 commented 1 year ago

Thank you, Abclisy! I'll have a try.

zhijunyuu commented 1 year ago

Hey cellDancer group,

Thanks for the nice tool! I have the adata as follows:

AnnData object with n_obs × n_vars = 23496 × 18964
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'donor_id', 'prob_max', 'prob_doublet', 'n_vars', 'best_singlet', 'doublet_logLikRatio', 'nCount_SCT', 'nFeature_SCT', 'pANN', 'gender', 'location', 'donors', 'SCT_snn_res.0.5', 'seurat_clusters', 'S.Score', 'G2M.Score', 'Phase', 'old.ident', 'RNA_snn_res.0.4', 'cell_types', 'batch', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size'
    var: 'name', 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    uns: 'cell_types_colors'
    obsm: 'X_harmony', 'X_pca', 'X_umap'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'

I used the adata_to_df_with_embed function to convert the adata into a dataframe with the code below:

cdutil.adata_to_df_with_embed(adata,
                              us_para=['spliced','unspliced'],
                              cell_type_para='cell_types',
                              embed_para='X_umap',
                              save_path='cell_type_u_s.csv')

But I got an error:

ValueError: If using all scalar values, you must pass an index

Do you have any idea what causes this error? Thanks in advance!

Best, Zhijun

chen-peng-1874 commented 1 year ago

It seems that you don't have "Mu" and "Ms" in the layers of your adata.

You can process the data in scvelo, and then use cdutil.

Hope that will help you.
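
One way to check this before calling cdutil is to list which of the expected keys are absent. The sketch below is not part of cellDancer: `missing_adata_keys` is a hypothetical helper, and it assumes 'Mu' and 'Ms' are looked up in adata.layers, the cell type in adata.obs, and the embedding in adata.obsm. A stand-in object with the key names from the printout above is used so the example runs without real data.

```python
from types import SimpleNamespace


def missing_adata_keys(adata, us_para=('Mu', 'Ms'),
                       cell_type_para='celltype', embed_para='X_umap'):
    """List the keys the conversion would need but adata lacks (hypothetical helper)."""
    missing = []
    for layer in us_para:
        if layer not in adata.layers.keys():
            missing.append(f"layers['{layer}']")
    if cell_type_para not in adata.obs.keys():
        missing.append(f"obs['{cell_type_para}']")
    if embed_para not in adata.obsm.keys():
        missing.append(f"obsm['{embed_para}']")
    return missing


# Stand-in for the AnnData object printed in the thread (key names only).
demo = SimpleNamespace(
    layers={'ambiguous': None, 'matrix': None, 'spliced': None, 'unspliced': None},
    obs={'cell_types': None, 'seurat_clusters': None},
    obsm={'X_harmony': None, 'X_pca': None, 'X_umap': None},
)

print(missing_adata_keys(demo, cell_type_para='cell_types'))
# → ["layers['Mu']", "layers['Ms']"]
```

The helper only uses `.keys()`, so it should also accept a real AnnData object, where `obs` is a pandas DataFrame and `layers` and `obsm` are mappings.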


Abclisy commented 1 year ago

Hi zhijunyuu, the code below will add 'Mu' and 'Ms' to the layers of your adata.

# !pip install scvelo --upgrade --quiet
import scvelo as scv
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

scv.pp.filter_and_normalize runs the following:

scv.pp.filter_genes(adata, min_shared_counts=20)
scv.pp.normalize_per_cell(adata)
scv.pp.filter_genes_dispersion(adata, n_top_genes=2000)
scv.pp.log1p(adata)
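
For intuition about what scv.pp.moments adds: 'Ms' and 'Mu' are first-order moments, that is, spliced and unspliced values averaged over each cell's nearest neighbors. The following is a deliberately simplified, pure-Python sketch of that idea. It is not scVelo's implementation (which works on a neighbor graph built in PCA space); `knn_mean`, the toy coordinates, and the toy counts are all made up for illustration.

```python
import math


def knn_mean(values, coords, k):
    """Average each cell's value over its k nearest cells (illustrative only)."""
    smoothed = []
    for ci in coords:
        # Rank all cells by Euclidean distance to the current cell.
        order = sorted(range(len(coords)), key=lambda j: math.dist(ci, coords[j]))
        neighbors = order[:k]  # normally includes the cell itself (distance 0)
        smoothed.append(sum(values[j] for j in neighbors) / len(neighbors))
    return smoothed


# Four toy cells forming two tight pairs in a 2-D embedding.
coords = [(0, 0), (0, 1), (5, 5), (5, 6)]
spliced = [2, 4, 10, 20]  # toy counts for a single gene

print(knn_mean(spliced, coords, k=2))
# → [3.0, 3.0, 15.0, 15.0]
```

Each cell's value is replaced by the mean of its local pair, which is the smoothing effect that makes the 'Ms' and 'Mu' layers less noisy than the raw 'spliced' and 'unspliced' layers.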

For more information, the Data Preparation page might be helpful. We will add more details about preprocessing to our website later. If it still does not work, could you send us a small subset of your data so that we can reproduce and debug the problem? Thank you!