KittyMurphy closed this issue 2 years ago
Hey @KittyMurphy,
These differences come from the different databases that orthogene can pull from. The default when using convert_orthologs by itself is "gprofiler" (via the gprofiler2 package), which pulls from the g:Profiler website. The default method for all orthogene functions within EWCE, however, is "homologene", which is not only faster but also has better mappings between mouse and human (though it covers fewer species).
When you run using method="homologene", it returns 69 genes (2 more than one2one). This is because one2one uses an old static version of the NCBI HomoloGene database, whereas orthogene uses a periodically updated version of the same database.
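To see which genes account for a difference like the 69 vs 67 described here, the retained gene sets can be compared directly. This is a minimal base R sketch: the two vectors are hypothetical placeholders (not genes from the attached list), standing in for the symbols retained by each approach.

```r
# Hypothetical retained-gene vectors; in practice these would be the
# symbols kept by orthogene (method = "homologene") and by One2One.
genes_homologene <- c("GeneA", "GeneB", "GeneC", "GeneD")
genes_one2one    <- c("GeneA", "GeneB", "GeneE")

# Genes retained by both approaches.
shared <- intersect(genes_homologene, genes_one2one)

# Genes retained by only one of the two approaches.
only_homologene <- setdiff(genes_homologene, genes_one2one)
only_one2one    <- setdiff(genes_one2one, genes_homologene)

print(list(shared = shared,
           only_homologene = only_homologene,
           only_one2one = only_one2one))
```

Genes that appear in only one set are the ones worth checking against the current HomoloGene release.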
method="homologene"
or_genes_mouse <- orthogene::convert_orthologs(or_genes_human$V1, input_species = "HUMAN", output_species = "mouse", method = "homologene")
Preparing gene_df.
character format detected.
Converting to data.frame
Extracting genes from input_gene.
119 genes extracted.
Converting HUMAN ==> mouse orthologs using: homologene
Retrieving all organisms available in homologene.
Mapping species name: HUMAN
Common name mapping found for human
1 organism identified from search: 9606
Retrieving all organisms available in homologene.
Mapping species name: mouse
Common name mapping found for mouse
1 organism identified from search: 10090
Checking for genes without orthologs in mouse.
Extracting genes from input_gene.
92 genes extracted.
Extracting genes from ortholog_gene.
92 genes extracted.
Checking for genes without 1:1 orthologs.
Dropping 10 genes that have multiple input_gene per ortholog_gene (many:1).
Dropping 3 genes that have multiple ortholog_gene per input_gene (1:many).
Filtering gene_df with gene_map
Setting ortholog_gene to rownames.
=========== REPORT SUMMARY ===========
Total genes dropped after convert_orthologs :
50 / 119 (42%)
Total genes remaining after convert_orthologs :
69 / 119 (58%)
method="gprofiler"
or_genes_mouse <- orthogene::convert_orthologs(or_genes_human$V1, input_species = "HUMAN", output_species = "mouse", method = "gprofiler")
Preparing gene_df.
character format detected.
Converting to data.frame
Extracting genes from input_gene.
119 genes extracted.
Converting HUMAN ==> mouse orthologs using: gprofiler
Retrieving all organisms available in gprofiler.
Using stored `gprofiler_orgs`.
Mapping species name: HUMAN
Common name mapping found for human
1 organism identified from search: hsapiens
Retrieving all organisms available in gprofiler.
Using stored `gprofiler_orgs`.
Mapping species name: mouse
Common name mapping found for mouse
1 organism identified from search: mmusculus
Checking for genes without orthologs in mouse.
Extracting genes from input_gene.
239 genes extracted.
Extracting genes from ortholog_gene.
239 genes extracted.
Dropping 24 NAs of all kinds from ortholog_gene.
Checking for genes without 1:1 orthologs.
Dropping 120 genes that have multiple input_gene per ortholog_gene (many:1).
Dropping 23 genes that have multiple ortholog_gene per input_gene (1:many).
Filtering gene_df with gene_map
Setting ortholog_gene to rownames.
=========== REPORT SUMMARY ===========
Total genes dropped after convert_orthologs :
72 / 119 (61%)
Total genes remaining after convert_orthologs :
47 / 119 (39%)
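The many:1 and 1:many drops reported in the logs above can be illustrated with a small base R sketch. It is not orthogene's internal implementation, only an approximation of the filtering idea, and the mapping table uses made-up symbols.

```r
# Toy input -> ortholog mapping table (hypothetical symbols).
gene_map <- data.frame(
  input_gene    = c("A1", "B1", "B2", "C1", "C1", "D1", "E1"),
  ortholog_gene = c("a1", "b",  "b",  "c1", "c2", NA,   "e1"),
  stringsAsFactors = FALSE
)

# Step 1: drop rows with no ortholog (NA).
gene_map <- gene_map[!is.na(gene_map$ortholog_gene), ]

# Step 2: flag many:1 rows (several input genes share one ortholog gene).
dup_orthologs <- gene_map$ortholog_gene[duplicated(gene_map$ortholog_gene)]
many_to_one <- gene_map$ortholog_gene %in% dup_orthologs

# Step 3: flag 1:many rows (one input gene maps to several ortholog genes).
dup_inputs <- gene_map$input_gene[duplicated(gene_map$input_gene)]
one_to_many <- gene_map$input_gene %in% dup_inputs

# Keep only strict 1:1 pairs.
one2one <- gene_map[!(many_to_one | one_to_many), ]
print(one2one)
```

In this toy table only A1 and E1 survive: D1 has no ortholog, B1 and B2 collapse onto the same ortholog, and C1 maps to two orthologs. A database that lists more paralogs per gene will therefore drop more genes at the 1:1 step, which is consistent with the larger many:1 count in the gprofiler log.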
Great, thank you, Brian!
Interesting! If One2One says they are orthologs then I'd lean towards trusting that. Orthogene has numerous methods, right? Are some more conservative?
From: Kitty Murphy. Sent: 04 May 2022 15:11. To: NathanSkene/EWCE. Subject: [NathanSkene/EWCE] 1:1 ortholog mapping using orthogene and One2One yields different results (Issue #61)
I have used one2one and orthogene to get mouse orthologs from a list of human genes (n=119). Using orthogene retains 47/119 genes, whereas one2one retains 67/119.
or_genes_mouse <- row.names(orthogene::convert_orthologs(or_genes_human,input_species = "HUMAN",output_species = "mouse"))
Using stored `gprofiler_orgs`.
Using stored `gprofiler_orgs`.
=========== REPORT SUMMARY ===========
Total genes dropped after convert_orthologs : 72 / 119 (61%)
Total genes remaining after convert_orthologs : 47 / 119 (39%)
length(or_genes_human[or_genes_human %in% One2One::ortholog_data_Mouse_Human$orthologs_one2one$human.symbol])
Data
I have attached the human gene list (n=119).
utils::sessionInfo()
@NathanSkene please see my previous explanation of the source of differences: https://github.com/NathanSkene/EWCE/issues/61#issuecomment-1137620249
When you run using method="homologene", you see it returns 69 genes (2 more than one2one). This is because one2one actually uses an old static version of the NCBI HomoloGene database, whereas orthogene uses a periodically updated version of the same database.
1. Bug description
I have used one2one and orthogene to get mouse orthologs from a list of human genes (n=119). Using orthogene retains 47/119 genes, whereas one2one retains 67/119.
2. Reproducible example
Code
Data
I have attached the human gene list (n=119).
3. Session info
OR_genes_human.csv