Closed Syksy closed 3 years ago
I’ll do that
Cheers
F
On 19 May 2021, at 21:07, T. D. Laajala @.***> wrote:
Part of gex in Friedrich don't have gene names but instead ENSG#######
head(grep("ENSG00", rownames(mae_friedrich[["gex"]]), value=TRUE))
[1] "ENSG00000083622" "ENSG00000115934" "ENSG00000121388" "ENSG00000124593" "ENSG00000124835" "ENSG00000132832"
length(grep("ENSG00", rownames(mae_friedrich[["gex"]]), value=TRUE))
[1] 7241
Need to homogenize them to be hugo symbols instead all the way.
-- Federico Calboli @.***
OK, I reran the data thinking I had made a mistake, and I re-pushed the Friedrich gex, but the issue is in curatedPCaData_genes:
curatedPCaData_genes[which(curatedPCaData_genes[,1] == 'ENSG00000083622'), ]
       ensembl_gene_id ensembl_transcript_id hgnc_symbol refseq_mrna
218329 ENSG00000083622       ENST00000456270
       chromosome_name start_position end_position
218329               7      117604791    117647415
                               description
218329 novel transcript, antisense to CFTR
These 6 Ensembl gene IDs do not have an hgnc_symbol in the table.
Cheers
Federico
Thanks. This comes down to differences in gene annotations and how they're structured across databases: ENSG genes and ENST transcripts don't map 1:1 to HUGO gene symbols. Where a symbol is ambiguous or missing, we'll most likely have to either collapse multiple instances or omit genes without a gene symbol.
Friedrich et al. has now been processed from raw data in v0.6.21 using the limma pipeline for Agilent one-color arrays in generate.R. This includes mapping to HUGO symbols and collapsing probes, and rows without a HUGO symbol are removed.
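For anyone following along, here is a minimal base-R sketch of the "map, drop, collapse" idea described above. It is not the code in generate.R: the toy matrix, the toy annotation table, the helper name `map_to_symbols`, and the choice of averaging rows that share a symbol are all assumptions made for illustration. Only the column names `ensembl_gene_id` and `hgnc_symbol` are taken from the table shown earlier in this thread.

```r
# Toy expression matrix: rows are (made-up) Ensembl-style IDs, columns are samples.
gex <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"), c("s1", "s2"))
)

# Toy annotation table; ENSG_C deliberately has no symbol.
genes <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
  hgnc_symbol     = c("GENE1", "GENE1", "", "GENE2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map row IDs to symbols, drop rows without a symbol,
# and collapse rows that share a symbol by taking the per-sample mean.
map_to_symbols <- function(gex, genes) {
  # match() uses the first matching row if an ID appears more than once
  sym <- genes$hgnc_symbol[match(rownames(gex), genes$ensembl_gene_id)]

  # Omit rows whose ID is absent from the table or has an empty symbol
  keep <- !is.na(sym) & sym != ""
  gex <- gex[keep, , drop = FALSE]
  sym <- sym[keep]

  # Sum rows per symbol, then divide by the number of rows per symbol
  sums <- rowsum(gex, group = sym)
  counts <- as.vector(table(sym)[rownames(sums)])
  sums / counts
}

collapsed <- map_to_symbols(gex, genes)
print(collapsed)
```

In this toy case ENSG_C is dropped and ENSG_A and ENSG_B are averaged into a single GENE1 row. Other collapsing rules (median, highest-variance probe, and so on) would slot into the same place as the mean.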