bhklab / genefu

R package providing various functions relevant for gene expression analysis with emphasis on breast cancer.

molecular.subtyping error #29

Open elicabe opened 2 years ago

elicabe commented 2 years ago

Hello! I am trying to analyze a dataset with the genefu package, but I have run into a problem that I could not solve. I have prepared a matrix with the gene expression data (ddata) and a matrix with the annotations (dannot). When I run the function "molecular.subtyping" or "intrinsic.cluster.predict" I always get the same result:

Error in rep(NA, nrow(data)) : invalid 'times' argument
In addition: Warning message:
In geneid.map(geneid1 = gid, data1 = data, geneid2 = centroids.gid, :
  no gene ids in common!

I have checked whether I have duplicated data, samples or genes, and I still get the same error. Could you help me with this?

Thank you very much in advance.

ChristopherEeles commented 2 years ago

Hi @elicabe,

Based on the error you included, it appears that none of the gene identifiers you supplied to the function map to the selected molecular signature. Can you please provide the exact code you are running to get the error, as well as your sessionInfo(), so I can help debug further?

It would also be helpful if you could include the output of head(dannot) so I can see which identifiers you have in your annotation file.

Best,
Christopher Eeles
Software Developer
Haibe-Kains Lab | PM-Research
University Health Network

elicabe commented 2 years ago

Hi ChristopherEeles,

Thank you for your help. Here is the information you asked for:

1- The exact code you are running to get the error:

SubtypePredictions <- molecular.subtyping(sbt.model = "pam50", data = ddata, annot = dannot, do.mapping = TRUE)

2- sessionInfo()

R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base

3- head(dannot)

A tibble: 6 × 5

  probe     EntrezGene.ID NCBI.gene.symbol HUGO.gene.symbol probe.name
1 1007_s_at           780 DDR1             DDR1             1007_s_at
2 1053_at            5982 RFC2             RFC2             1053_at
3 117_at             3310 HSPA6            HSPA6            117_at
4 121_at             7849 PAX8             PAX8             121_at
5 1255_g_at          2978 GUCA1A           GUCA1A           1255_g_at
6 1294_at            7318 UBA7             UBA7             1294_at

Thank you very much in advance.
Best

ChristopherEeles commented 2 years ago

Hi @elicabe,

In order to match the features in your data with the gene signatures included in genefu, one of the following must hold:
(1) The rownames of your data match the rownames of the $centroids in the gene signature, OR
(2) You include an annotation with an EntrezGene.ID column, and one or more of those gene IDs match the EntrezGene.ID column of the signature.

If these conditions aren't met, then there is no way to apply the gene signature in question to your data, since none of the relevant features can be identified.
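
For condition (2), the annotation also has to line up with the expression data. The head(dannot) output above is a tibble, and tibbles do not keep row names, so the shape of the annotation may be worth ruling out (the thread does not confirm it as the cause). The following is a minimal base R sketch on toy data. It assumes samples in rows and probes in columns, which is how the genefu documentation describes the data argument; the expression values and the names annot_df, sample1 and sample2 are invented for illustration.

```r
# Toy stand-in for the annotation shown in head(dannot); in the thread it is a tibble.
dannot <- data.frame(
  probe = c("1007_s_at", "1053_at", "117_at"),
  EntrezGene.ID = c(780, 5982, 3310),
  stringsAsFactors = FALSE
)

# Toy expression matrix: 2 samples (rows) x 3 probes (columns).
# The layout is an assumption; the values are made up.
ddata <- matrix(
  c(7.1, 6.8, 5.2, 5.0, 3.3, 3.9),
  nrow = 2,
  dimnames = list(c("sample1", "sample2"),
                  c("1053_at", "1007_s_at", "117_at"))
)

# Convert to a plain data.frame (a no-op here, but needed for a tibble)
# and name the rows by probe ID so rows can be looked up by name.
annot_df <- as.data.frame(dannot, stringsAsFactors = FALSE)
rownames(annot_df) <- annot_df$probe

# Reorder the annotation so it follows the probe order of the expression matrix.
annot_df <- annot_df[colnames(ddata), , drop = FALSE]

# Sanity check: one annotation row per probe, in the same order.
stopifnot(identical(rownames(annot_df), colnames(ddata)))
```

If the check passes, annot_df can be tried as the annot argument in place of the tibble. Whether this resolves the error depends on how your real ddata is oriented.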

Since you are using the pam50 signature in your code, you can check this by loading the data object and verifying that at least one of the two conditions is met:

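library(genefu)
data(pam50.robust)

# Condition 1: do any feature names in the data match the signature's rownames?
(any(rownames(ddata) %in% rownames(pam50.robust$centroids.map)))
# Condition 2: do any annotated Entrez IDs match the signature's Entrez IDs?
(any(dannot$`EntrezGene.ID` %in% pam50.robust$centroids.map$EntrezGene.ID))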

The same pattern can be used for the other gene signatures in the package. See help(package="genefu") for a list of available functions and signatures.
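
To reuse the check across signatures, the two conditions can be wrapped in a small function that also reports how many identifiers match, which is more informative than a single TRUE/FALSE. This is only a sketch: check_signature_overlap is a hypothetical helper, not part of genefu, and toy_map is a made-up stand-in for a centroids map. It assumes the signature's map has gene names as rownames and an EntrezGene.ID column, as pam50.robust$centroids.map is used above.

```r
# Hypothetical helper (not part of genefu): summarise the overlap between
# your identifiers and a signature's centroids map.
check_signature_overlap <- function(feature_names, entrez_ids, centroids_map) {
  by_name <- intersect(as.character(feature_names), rownames(centroids_map))
  by_entrez <- intersect(as.character(entrez_ids),
                         as.character(centroids_map$EntrezGene.ID))
  list(
    matched_by_name = length(by_name),      # condition (1)
    matched_by_entrez = length(by_entrez),  # condition (2)
    usable = length(by_name) > 0 || length(by_entrez) > 0
  )
}

# Toy centroids map, illustrative only; a real one would come from the package.
toy_map <- data.frame(
  EntrezGene.ID = c(2064, 2099, 4288),
  row.names = c("ERBB2", "ESR1", "MKI67")
)

res <- check_signature_overlap(
  feature_names = c("1007_s_at", "1053_at"),  # probe IDs: no match by name
  entrez_ids = c(780, 5982, 2099),            # one Entrez ID in common
  centroids_map = toy_map
)
str(res)
```

With genefu loaded, the same call would take the feature names of your data, dannot$EntrezGene.ID and pam50.robust$centroids.map. Whether the feature names sit in rownames or colnames depends on how your matrix is oriented.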

Best,
Christopher Eeles
Software Developer
Haibe-Kains Lab
PM-Research | UHN