MaayanLab / x2k_web

The X2K Web project
https://x2x.cloud
Apache License 2.0

Duplicated genes in TFEA, KEA X2K web output #13

Open ajw2329 opened 5 years ago

ajw2329 commented 5 years ago

Hello,

First of all, thanks for providing such a fantastic tool!

I wasn't quite sure where to write this because it's not strictly a bug, but I thought it worth highlighting the fact that duplicate TF and kinase gene names often show up in X2Kweb output as below:

[Screenshots: ChEA output, KEA output]

Running via the API makes it clear that the TF gene duplication comes from the default use of both CHEA and ENCODE entries. Nonetheless, as this may be confusing for downstream users, maybe it would be better to use name (e.g. SUZ12_CHEA) instead of simpleName in the TFEA output?
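To illustrate the point, here is a minimal sketch of how entries that collide under simpleName stay distinct under name. Only the field names `name` and `simpleName` and the value SUZ12_CHEA come from the description above; the record layout, the other example values, and the helper functions are assumptions for illustration, not the actual X2K API response format.

```python
from collections import defaultdict

# Hypothetical records shaped like TFEA results. Only the field names
# `name` and `simpleName` (and SUZ12_CHEA) come from the issue text;
# the other values are made up for illustration.
tfs = [
    {"name": "SUZ12_CHEA", "simpleName": "SUZ12"},
    {"name": "SUZ12_ENCODE", "simpleName": "SUZ12"},
    {"name": "EZH2_CHEA", "simpleName": "EZH2"},
]


def group_by_simple_name(records):
    """Map each simpleName to the list of full names that share it."""
    groups = defaultdict(list)
    for record in records:
        groups[record["simpleName"]].append(record["name"])
    return dict(groups)


def duplicated_simple_names(records):
    """Keep only the simpleNames that appear more than once."""
    grouped = group_by_simple_name(records)
    return {simple: names for simple, names in grouped.items() if len(names) > 1}


if __name__ == "__main__":
    for simple, names in duplicated_simple_names(tfs).items():
        print(f"{simple} appears {len(names)} times: {', '.join(names)}")
```

Displaying the full `name` (or the grouped list) would make the library of origin visible without changing the underlying results.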

In the KEA output, there are occasionally multiple entries for the same kinase referred to by different names. For instance, GSKB and GSKBETA both have separate entries (as well as GSKA, GSKALPHA, and GSK). I suspect this has a similar underlying cause in that (if I understand correctly) KEA 2018 also aggregates from many sources. Possibly the underlying source could be attached, as with the TFs above? It is also worth mentioning that for the 'non-standard' names (e.g. GSK3BETA) the Harmonizome link provided with the output does not match an entry (http://amp.pharm.mssm.edu/Harmonizome/gene/GSK3BETA).
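As a rough sketch of one possible workaround on the user side, kinase names could be normalized through an alias table before merging entries or building links. The alias table below is a hand-written assumption (it is not taken from KEA), the helper names are hypothetical, and the URL prefix is simply the one quoted above.

```python
# Hand-written, illustrative alias table (an assumption, not from KEA).
# GSK3B and GSK3A are the HGNC symbols for the two GSK3 isoforms.
ALIASES = {
    "GSK3BETA": "GSK3B",
    "GSK3ALPHA": "GSK3A",
}

# URL prefix as quoted in the issue text.
HARMONIZOME_GENE_URL = "http://amp.pharm.mssm.edu/Harmonizome/gene/"


def canonical(symbol):
    """Return the canonical symbol for a name, or the upper-cased name itself."""
    upper = symbol.upper()
    return ALIASES.get(upper, upper)


def merge_aliases(names):
    """Group reported kinase names under their canonical symbol."""
    merged = {}
    for name in names:
        merged.setdefault(canonical(name), []).append(name)
    return merged


def harmonizome_link(symbol):
    """Build a gene link from the canonical symbol rather than the alias."""
    return HARMONIZOME_GENE_URL + canonical(symbol)


if __name__ == "__main__":
    print(merge_aliases(["GSK3B", "GSK3BETA", "CDK1"]))
    print(harmonizome_link("GSK3BETA"))
```

A lookup like this only helps for aliases that are known in advance; attaching the source library to each entry, as suggested above, would address the ambiguity more directly.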

Thanks very much again!

Best, Andrew

AviMaayan commented 5 years ago

Thank you for the feedback. It is very helpful.

Best,

Avi

Avi Ma’ayan, PhD
Professor, Department of Pharmacological Sciences
Director, Mount Sinai Center for Bioinformatics
Icahn School of Medicine at Mount Sinai
New York, NY 10029
(212) 241-1153
Lab: http://www.mssm.edu/labs/maayan

From: ajw2329 notifications@github.com Sent: Saturday, May 25, 2019 6:14:42 PM To: MaayanLab/x2k_web Cc: Subscribed Subject: [MaayanLab/x2k_web] Duplicated genes in TFEA, KEA X2K web output (#13)