bhklab / genefu

R package providing various functions relevant for gene expression analysis with emphasis on breast cancer.

ggi() and gene70() commands input files #18

Closed matteo95serra closed 2 years ago

matteo95serra commented 3 years ago

Hello,

I'm working on several bulk RNA-seq datasets and have tried to compute the GENE70, GGI and PAM50 classifications with "genefu", but I was only able to obtain the PAM50 classification. I could not get either GENE70 or GGI to work. If I've understood correctly, the commands to compute the PAM50, GGI and GENE70 scores are (respectively) the following:

where "matrix" is my expression matrix with rownames as sample names and colnames as gene names (in my case NCBI gene symbols), and "annotation" is a dataframe with a column containing the NCBI gene symbols and a column containing the respective EntrezGene.ID.

As I've said, the PAM50 classification works, but the other two commands do not.

In particular, the "ggi" command runs, but I guess it is not able to map any of the genes (all the GGI scores are "NA"). If I set "do.mapping = T", I get an error saying: "Error in data1[, gg.uniq, drop = FALSE] : subscript out of bounds".

For "gene70", if I don't specify "do.mapping = T" I obtain the error: "Error in gene70(data = t(as.matrix(visium_brain@norm_expr)), annot = sig.ggi) : object 'res' not found In addition: Warning message: In gene70(data = t(as.matrix(visium_brain@norm_expr)), annot = sig.ggi) : No overalp between the gene signature EntrezGene.IDsand the colnames of your data... Returning all NAs." If I put "do.mapping = T", I obtain the error: "Error in data1[, gg.uniq, drop = FALSE] : subscript out of bounds".

The "matrix" and the "annotation" that I've used for PAM50 are exactly the same as the ones used for ggi and gene70.

Could someone help me to solve this issue? I guess there could be something wrong in the "annotation" file, but the weird thing is that it works well with the PAM50 command.

Thank you in advance

ChristopherEeles commented 2 years ago

Hi @matteo95serra,

The genefu package was designed for use with Affymetrix microarray data, so adapting it for RNA-sequencing data may not be straightforward. See #22 for more information on this.

I suspect that your errors are due to a mismatch between your feature names and those of the corresponding gene signature object. The centroid genes are labelled with the gene symbols from their Affymetrix probe annotations, which may be outdated. When you set do.mapping=TRUE, the gene labels should be Entrez Gene IDs, and the probe gene symbols will be mapped using those.
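If your columns are labelled with gene symbols, one way to relabel them with Entrez Gene IDs is sketched below. This is only an illustration on toy data in base R: the object names (expr, annot, expr_entrez) and the Gene.Symbol column name are assumptions, not part of genefu, and whether genefu needs exactly this layout should be checked against its documentation.

```r
# Toy expression matrix: samples in rows, gene symbols as column names.
expr <- matrix(
  c(5.1, 4.8, 7.2, 6.9, 3.3, 3.0),
  nrow = 2,
  dimnames = list(c("sample1", "sample2"), c("ESR1", "MKI67", "NOTAGENE"))
)

# Toy annotation table; the column name Gene.Symbol is hypothetical.
annot <- data.frame(
  Gene.Symbol   = c("ESR1", "MKI67"),
  EntrezGene.ID = c(2099, 4288),
  stringsAsFactors = FALSE
)

# Look up the Entrez ID for every column; symbols without a match become NA.
entrez <- annot$EntrezGene.ID[match(colnames(expr), annot$Gene.Symbol)]

# Keep only the mapped columns and relabel them with their Entrez IDs.
keep <- !is.na(entrez)
expr_entrez <- expr[, keep, drop = FALSE]
colnames(expr_entrez) <- as.character(entrez[keep])

# Report how many columns were dropped because they had no Entrez ID.
cat(sum(!keep), "column(s) dropped for lack of an Entrez ID\n")
```

Dropping unmapped columns explicitly, rather than leaving NA labels, makes it easier to see how many features are lost in the conversion.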

You can check that your expression matrix has the correct features for a given signature by loading that signature and matching the feature names. For example, with the ggi function:

data(sig.ggi)
# Assuming Entrez IDs
matching_genes <- intersect(colnames(your_matrix), sig.ggi$centroids.map$EntrezGene.ID)

For the functions to work, you need at least a few features to match.
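To judge whether enough features match, it can help to look at the fraction of signature genes found, not just the intersection. The sketch below uses toy ID vectors in base R; signature_ids is a hypothetical stand-in, not the real GGI gene list, and expr_ids stands in for colnames(your_matrix).

```r
# Hypothetical stand-in for a signature's Entrez IDs (not the real GGI genes).
signature_ids <- c("2099", "4288", "7153", "891")

# Stand-in for colnames(your_matrix), assumed to already be Entrez IDs.
expr_ids <- c("2099", "4288", "1956")

# Signature genes present in the data, and those that are absent.
matching <- intersect(expr_ids, signature_ids)
missing  <- setdiff(signature_ids, expr_ids)

# Fraction of the signature covered by the expression matrix.
coverage <- length(matching) / length(signature_ids)

cat(sprintf("%d of %d signature genes found (%.0f%%)\n",
            length(matching), length(signature_ids), 100 * coverage))
```

An empty matching vector would be consistent with scores that come back as all NA, since no signature gene would be available to compute them.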

To answer data specific questions, I need the code to reproduce some of your data so I can debug the functions and see what is going wrong. You can provide this to me here by replying with the output from:

dput(head(your_matrix))

Best, Christopher Eeles Software Developer BHK Lab | PM-Research | UHN

ChristopherEeles commented 2 years ago

Hi @matteo95serra,

I am closing this issue due to inactivity. If you have further questions feel free to re-open it.

Best, Chris