YuLab-SMU / DOSE

:mask: Disease Ontology Semantic and Enrichment analysis
https://yulab-smu.top/biomedical-knowledge-mining-book/

added keyoutput parameter to setReadable and modified EXTID2NAME func… #66

Open TriLab-bioinf opened 2 years ago

TriLab-bioinf commented 2 years ago

I have edited the setReadable and EXTID2NAME functions so that they now accept an additional parameter, "keyoutput", which selects an alternative OrgDb output column holding gene names/symbols instead of enforcing 'SYMBOL' as the only option. This change was necessary because some OrgDb databases, such as yeast's org.Sc.sgd.db, store gene names/symbols under the GENENAME column and lack a SYMBOL column. That causes an error when using the enrichGO and groupGO functions from the clusterProfiler package, or the setReadable function from this package.

huerqiang commented 2 years ago

I don't think setReadable() needs the ability to convert to other gene IDs. "Readable" means easy for humans to read, and gene symbols are the most readable identifiers. For converting gene IDs, we provide the bitr() function in the clusterProfiler package.

TriLab-bioinf commented 2 years ago

Hi huerqiang, the enrichGO and groupGO functions give an error if the "readable" parameter is set to TRUE, because they use DOSE::setReadable() to convert gene IDs to gene symbols. Some OrgDb databases, such as the one for Saccharomyces cerevisiae, do not store gene symbols in the SYMBOL column but in other columns (e.g. "GENENAME"). The fix I made allows calling other columns when the gene symbols are not in the expected SYMBOL column. Since I saw that other people are having trouble using clusterProfiler with the Saccharomyces cerevisiae database for this reason, I thought I would share the fix with the rest of the community. Thanks, Hernan

huerqiang commented 2 years ago

Hi TriLab-bioinf, thanks, it is quite an issue. I suggest that when a gene symbol exists, it should be chosen by default. Could a column similar to a gene symbol (e.g. "GENENAME") be selected automatically when the gene symbol does not exist, so that the user does not have to choose the keyType? Of course, it's also a good idea to provide a parameter (keyoutput) for users to choose from.
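
The selection order suggested above (an explicit keyoutput first, then SYMBOL, then a similar column such as GENENAME) could be sketched roughly as follows. This is only an illustration in base R, not the code from this PR or from DOSE: `choose_name_column` and its `fallbacks` argument are hypothetical names, and the column vectors are abbreviated examples. In practice the available columns would presumably come from `AnnotationDbi::columns()` on the OrgDb object.

```r
# Hypothetical helper (not part of DOSE): decide which OrgDb column
# should supply human-readable gene names.
choose_name_column <- function(available, keyoutput = NULL,
                               fallbacks = c("SYMBOL", "GENENAME")) {
  # An explicit user choice takes priority, but it must exist.
  if (!is.null(keyoutput)) {
    if (!keyoutput %in% available) {
      stop("Column '", keyoutput, "' not found in the OrgDb columns")
    }
    return(keyoutput)
  }
  # Otherwise use the first preferred column that is present.
  hit <- fallbacks[fallbacks %in% available]
  if (length(hit) == 0) {
    stop("None of ", paste(fallbacks, collapse = ", "),
         " found; please set keyoutput")
  }
  hit[1]
}

# Abbreviated, illustrative column lists (assumed, not full columns() output).
human_cols <- c("ENTREZID", "ENSEMBL", "SYMBOL", "GENENAME")
yeast_cols <- c("ORF", "ENSEMBL", "GENENAME", "DESCRIPTION")

choose_name_column(human_cols)                     # "SYMBOL"
choose_name_column(yeast_cols)                     # "GENENAME"
choose_name_column(yeast_cols, keyoutput = "ORF")  # "ORF"
```

With this ordering, databases that have a SYMBOL column behave as before, while a database without one falls back to GENENAME unless the user overrides it.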