gabrielodom / pathwayPCA

integrative pathway analysis with modern PCA methodology and gene selection
https://gabrielodom.github.io/pathwayPCA/

CreateOmics not reading all features of assay data #90

Closed cfrasier closed 2 years ago

cfrasier commented 4 years ago

I'm currently trying to create an Omics object using assay (gene expression) and phenotype (survival) data stored in a tidy data frame. The assay data has around 38000 features (genes), yet the Omics object only recognizes 29 of them. I am using the WikiPathways pathway collection, and I have confirmed that many of the unrecognized genes belong to those pathways. I can find no differences in the assay data frame that would explain why some genes are recognized and others are not. Could you share some insight?

The code used to create the Omics object: [image]

The output from the script detailing the creation of the Omics object: [image]

A 5x5 tibble of the assay data frame: [image]

Thanks, Connor Frasier

gabrielodom commented 4 years ago

Hi Connor, Please forgive me. I somehow missed this notification. The diagnostic message states that your allData object only has 29 features. Is it possible that some of your columns are nested? Could you show me the output of colnames(allData) and str(allData)?
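For readers following along, here is a minimal base R sketch of the nested-column check suggested above. The toy data frame is hypothetical and stands in for allData; it is not the poster's data. It only illustrates how a list or matrix column counts as a single top-level column, so ncol() can report far fewer features than the data actually holds.

```r
# Hypothetical toy data frame standing in for allData (not the real data)
allData <- data.frame(sampleID = c("s1", "s2", "s3"))
allData$geneA  <- c(1.2, 0.7, 3.1)                 # ordinary numeric column
allData$nested <- I(list(1:2, 3:4, 5:6))           # list column
allData$mat    <- matrix(1:6, nrow = 3)            # matrix column (2 features, 1 column)

# Number of top-level columns; nested columns each count once
ncol(allData)

# Flag columns that are lists (including data frames) or matrices
is_nested <- vapply(
  allData,
  function(col) is.list(col) || is.matrix(col),
  logical(1)
)
names(allData)[is_nested]
```

If the last line returns no names, nesting is probably not the cause and the column names themselves are the next thing to inspect.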

cfrasier commented 4 years ago

Hey, no worries, a late response is better than no response. So this is a small snapshot of colnames(allData): [image]

And here is a snapshot of str(allData): [image]

I also wrote the output to a file for you to peruse. PathwayPCA.output.txt

gabrielodom commented 4 years ago

Ok, I have no earthly idea then. Do you have time for a Zoom call or something?

cfrasier commented 4 years ago

Yeah, absolutely. You can send the invite to my email: cfrasier@uncc.edu. I'm busy this afternoon, but I will be free all day tomorrow.

gabrielodom commented 3 years ago

Ok, so we found that the diagnostic message of the CreateOmics() function loses its header when more than a few thousand genes are printed via either the "improper gene names" or the "variance of genes = 0" message. I should add a catch that prevents more than 1000 genes from being printed by the message.
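A rough sketch of what such a catch could look like, in base R. The helper name capGeneMessage, its arguments, and the message wording are all hypothetical; this is not the code in pathwayPCA, just one way to cap the number of gene names listed so the header stays visible.

```r
# Hypothetical helper (not part of pathwayPCA): build a diagnostic message
# that lists at most `maxShow` gene names and reports how many were omitted.
capGeneMessage <- function(header, genes, maxShow = 1000L) {
  nGenes <- length(genes)
  shown  <- head(genes, maxShow)                   # keep only the first maxShow names
  body   <- paste(shown, collapse = ", ")
  if (nGenes > maxShow) {
    body <- paste0(body, ", ... (", nGenes - maxShow, " more not shown)")
  }
  paste0(header, " (", nGenes, " genes):\n", body)
}

# Toy usage with made-up gene names and a small cap for readability
badGenes <- paste0("gene", seq_len(38000))
msg <- capGeneMessage("Genes with variance = 0", badGenes, maxShow = 5L)
message(msg)
```

Reporting the total count in the header keeps the diagnostic informative even when most names are suppressed.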