IGS / gEAR

The gEAR Portal was created as a data archive and viewer for gene expression data including microarrays, bulk RNA-Seq, single-cell RNA-Seq and more.
https://umgear.org
GNU Affero General Public License v3.0

Process additional ortholog mappings outside of the Alliance of Genomes #516

Open jorvis opened 1 year ago

jorvis commented 1 year ago

We currently only have/use ortholog mappings provided by the Alliance of Genome Resources, which limits us to the selection of model organisms it covers. @carlocolantuoni has mappings for additional species via R/biomaRt/Ensembl that we can use to extend gEAR to other organisms.

Carlo, could you please post files/details here?

carlocolantuoni commented 1 year ago

The function I use for getting orthologous genes across species is getMatch() in the SJD package in R.

In R, this installs the SJD package:

library(devtools)
install_github("CHuanSite/SJD")

Use ?getMatch in R for details on this function. Briefly, you provide an input species, a vector of gene IDs, an argument indicating which kind of IDs these are (only gene symbols and Ensembl IDs are supported), and the new species for which you want orthologues. It returns a table whose first column is exactly the "genes" argument you sent, with the other columns holding more gene information (including symbols and Ensembl gene IDs for both the source and the newly requested species).
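Before a table like this can be loaded into gEAR, it probably has to be reduced to at most one target gene per input gene. The sketch below assumes hypothetical column names (`input_gene`, `target_symbol`) and toy rows; `geneX` and `geneY` are made-up placeholders. The real getMatch() output layout should be checked with ?getMatch.

```r
# Hypothetical stand-in for a getMatch()-style result: column 1 holds the genes
# exactly as submitted, column 2 a target-species symbol. Column names and the
# geneX/geneY rows are illustrative assumptions, not the SJD output format.
matches <- data.frame(
  input_gene    = c("Sox2", "Pax6", "Pax6", "geneX", "geneX", "geneY"),
  target_symbol = c("SOX2", "PAX6", "PAX6", "GENEX1", "GENEX2", NA),
  stringsAsFactors = FALSE
)

# Keep only unambiguous (one-to-one) hits, returned as a named vector
# mapping input gene -> target symbol.
collapse_orthologues <- function(tbl) {
  # Drop unmapped genes and exact duplicate rows.
  hits <- unique(tbl[!is.na(tbl$target_symbol), c("input_gene", "target_symbol")])
  # Count distinct targets per input gene; discard one-to-many mappings.
  n_targets <- table(hits$input_gene)
  keep <- hits$input_gene %in% names(n_targets)[n_targets == 1]
  setNames(hits$target_symbol[keep], hits$input_gene[keep])
}

collapse_orthologues(matches)
```

Dropping one-to-many mappings is only one possible policy. Keeping the first hit, or keeping all hits and letting gEAR aggregate expression, are alternatives worth discussing.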

carlocolantuoni commented 1 year ago

I will follow up on this and provide @jorvis with pairwise species orthologues mapped with the SJD function above or the "orthogene" package:

Add the ability to project across additional species not included in the "alliance" genomes that Joshua has already implemented. There are 3 possible solutions, to be used alone or in combination:
1] my own getMatch() function in the SJD package in R (biomaRt-based)
2] the "orthogene" package in R (and other platforms?)
3] gene symbols are identical across primates (as far as I know), so can we just use Hs gene symbols for any primate?
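The three options above could be combined as a lookup with a fallback: use a pairwise table (from getMatch() or orthogene) when one exists for the species, otherwise pass symbols through unchanged for primates (option 3). This is a minimal sketch under several assumptions. The species identifiers, the `primate_species` list, and the `pairwise_maps` structure are all hypothetical. Option 3 also relies on the unverified assumption that primate symbols match human ones.

```r
# Hypothetical species identifiers for which option 3 (reuse human symbols) is
# assumed to be acceptable; which primates belong here is an open question.
primate_species <- c("macaca_mulatta", "macaca_fascicularis")

# genes:         character vector of source-species gene symbols
# species:       hypothetical species identifier string
# pairwise_maps: hypothetical named list; each element is a named character
#                vector mapping source symbol -> human symbol
project_to_human <- function(genes, species, pairwise_maps) {
  if (species %in% names(pairwise_maps)) {
    # Options 1/2: pairwise orthologue table exported from getMatch() or orthogene.
    mapped <- unname(pairwise_maps[[species]][genes])
  } else if (species %in% primate_species) {
    # Option 3: assume the primate symbols already equal human symbols.
    mapped <- genes
  } else {
    # No mapping available: leave every gene unmapped.
    mapped <- rep(NA_character_, length(genes))
  }
  setNames(mapped, genes)
}

# Toy example with made-up symbols for a species outside the Alliance set.
maps <- list(gallus_gallus = c(geneA = "GENEA"))
project_to_human(c("geneA", "geneB"), "gallus_gallus", maps)
```

Genes missing from a pairwise table come back as NA rather than being guessed, so unmapped genes remain visible downstream.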

Joshua needs:

Other notes: for some NHPs (the Micali Macaque data, and cynomolgus/Macaca fascicularis, but not marmoset) I have used Hs symbols to dodge this problem. We might need to fix this once we have the correct species orthologues in. Or is it possible to use human gene symbols across primates?

@jorvis, is this the list of genomes that were in the "alliance" annotation? Homo sapiens, Rattus norvegicus, Mus musculus, Danio rerio, C. elegans