Ensembl ID to Gene ID - Githubissues

AlexsLemonade / scpca-docs

User information about ScPCA processing

https://scpca.readthedocs.io/en/latest/

BSD 3-Clause "New" or "Revised" License


Ensembl ID to Gene ID #304

Closed chrkuo closed 2 months ago

chrkuo commented 5 months ago

After following the FAQ section for converting an SCE to a Seurat object, I started the downstream analysis and realized the genes are Ensembl IDs instead of actual gene names. How do I convert the Ensembl IDs to gene IDs?

allyhawkins commented 5 months ago

Hi @chrkuo, the row names of the objects will be Ensembl IDs, but we do provide the mapped gene symbol in the rowData of the SingleCellExperiment object. If you followed the instructions in the FAQ for converting to a Seurat object, then you should also have stored the original rowData in the seurat_object[["RNA"]]@meta.features slot. That part of the Seurat object will contain a data frame where each row is a gene and the columns contain information about that gene. The gene_symbol column will contain the mapped gene symbol that you are looking for.

For more information on what is stored in the SingleCellExperiment objects, see Components of a SingleCellExperiment object.
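As a rough illustration of that lookup, here is a minimal base R sketch. The `gene_metadata` data frame is a toy stand-in for `seurat_object[["RNA"]]@meta.features`, and `ids_of_interest` is a hypothetical name used only for this example; neither comes from the FAQ.

```r
# Toy stand-in for seurat_object[["RNA"]]@meta.features:
# row names are Ensembl IDs, gene_symbol holds the mapped symbol
gene_metadata <- data.frame(
  gene_symbol = c("TSPAN6", "TNMD", "DPM1"),
  row.names = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419")
)

# Hypothetical set of Ensembl IDs to look up
ids_of_interest <- c("ENSG00000000419", "ENSG00000000003")

# Index by row name so the result follows the order of the query IDs
symbols <- gene_metadata[ids_of_interest, "gene_symbol"]
print(symbols)
```

With a real object, the toy data frame would presumably be replaced by the contents of the meta.features slot described above.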

chrkuo commented 5 months ago

@allyhawkins thank you! I see that now. In terms of downstream analysis, is there a way to completely replace the rowdata so that it's just gene_symbols instead of Ensembl IDs?

chrkuo commented 5 months ago

@allyhawkins

this is what I did and hopefully it's right:

gene_symbols <- seurat_object[["RNA"]]@meta.features$gene_symbol
gene_symbols <- make.unique(gene_symbols)
rownames(seurat_object[["RNA"]]@counts) <- gene_symbols

allyhawkins commented 5 months ago

That is one way that you could do it. Theoretically, the order of the row names should be preserved, but to be sure that you are adding the correct gene symbols to the correct rows, you can also modify the gene metadata and then replace the existing metadata with the modified metadata. I believe doing this should also change the row names of the Seurat object.

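gene_metadata_df <- seurat[["RNA"]]@meta.features |>
  # make sure all symbols are unique
  dplyr::mutate(gene_symbol = make.unique(gene_symbol)) |>
  # move the existing Ensembl ID row names into a column first, since
  # column_to_rownames() errors if the data frame already has row names
  # (the column name "ensembl_id" is an arbitrary choice)
  tibble::rownames_to_column("ensembl_id") |>
  # replace the rownames with gene_symbol column
  tibble::column_to_rownames("gene_symbol")

# replace existing gene metadata with new metadata that contains gene symbols as rows
seurat[["RNA"]]@meta.features <- gene_metadata_df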

To be sure that the row names match, you can always run rownames(seurat) <- rownames(gene_metadata_df). But again, your solution should also work; this one is just more robust to any potential changes in the order of the genes.
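The concern about gene order can be sketched in base R without a Seurat object. This is only an illustration under stated assumptions: `counts` and `gene_metadata` are toy stand-ins with made-up IDs and symbols, and the fallback to the Ensembl ID for a missing symbol is an assumption of this sketch, not something the thread prescribes.

```r
# Toy counts matrix with (made-up) Ensembl-style IDs as row names
counts <- matrix(
  1:8,
  nrow = 4,
  dimnames = list(
    c("ENSG0001", "ENSG0002", "ENSG0003", "ENSG0004"),
    c("cell1", "cell2")
  )
)

# Toy gene metadata, deliberately in a different row order,
# with one missing symbol and one duplicated symbol
gene_metadata <- data.frame(
  gene_symbol = c("GENE1", NA, "GENE1", "GENE2"),
  row.names = c("ENSG0004", "ENSG0002", "ENSG0001", "ENSG0003")
)

# Align metadata rows to the matrix rows by ID rather than by position
idx <- match(rownames(counts), rownames(gene_metadata))
stopifnot(!anyNA(idx)) # every gene in the matrix must have a metadata row
symbols <- gene_metadata$gene_symbol[idx]

# Assumption: keep the original ID where no symbol was mapped
symbols <- ifelse(is.na(symbols), rownames(counts), symbols)

# Make duplicated symbols unique before using them as row names
symbols <- make.unique(symbols)

rownames(counts) <- symbols
print(rownames(counts))
```

Matching by ID makes the renaming independent of row order, which is the same property the metadata-based approach above is meant to provide.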

jaclyn-taroni commented 2 months ago

I'm marking this as resolved.