alliance-genome / agr_archive_initial_prototype

Source code for the Alliance of Genome Resources web portal
http://prod.alliancegenome.org/
MIT License

Figure out what Orthology data to load #35

Closed zfinannee closed 8 years ago

zfinannee commented 8 years ago

The Orthology Working Group recommended that data be pulled directly from PANTHER. The Portal Use Case Working Group (PUC WG) has some questions about what data exactly to pull, and so we would like to meet with them.

This needs to be decided before Travis and Pedro can implement this part of the Use Case.

Using the label "question" for now, although this is a question for the curators rather than the devs.

chris-grove commented 8 years ago

@zfinannee I spoke to WormBase members of the orthology working group this morning and was basically told that we should just push ahead with PANTHER subfamilies, at least as a start for the portal. I will contact PANTHER to see if we can get a file with all PANTHER subfamilies and their gene members.

chris-grove commented 8 years ago

I actually found this FTP directory which I think can give us everything we need, once we have unambiguous mappings to UniProt protein IDs for each gene: ftp://ftp.pantherdb.org//sequence_classifications/current_release/PANTHER_Sequence_Classification_files/

chris-grove commented 8 years ago

I've created a folder in the Portal Data Google drive folder for Panther family data for each AGR species:

https://drive.google.com/drive/u/1/folders/0BxPsNle2cGvPZndJSGtCYzZLbGs

khowe commented 8 years ago

Hi all. There is a disconnect here. The Orthology working group expected that the portal would incorporate the pairwise orthologs from Panther. These are available here:

ftp://ftp.pantherdb.org/ortholog/current_release/RefGenomeOrthologs.tar.gz

Note the discussion in other forums about the necessity to exclude xenologs.

However, it seems that the use-case working group have decided/proposed that only the groups be loaded into the initial portal. If you go ahead and do this, these should not be called "Orthology groups". They should be called "Homology groups" or even "Protein families". Two genes in the same (Panther) family/group are not necessarily orthologs.
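To make the distinction concrete, here is a minimal sketch of reading pairwise calls and dropping xenologs. It assumes the unpacked RefGenomeOrthologs file is tab-separated with five columns (gene, ortholog, relation type, common ancestor, PANTHER ID) and that the relation codes `X` and `LDX` mark xenologs; both are assumptions to verify against the PANTHER README. The function name, the `OrthologPair` type, and every identifier in the sample rows are made up for illustration.

```python
from typing import Iterable, Iterator, NamedTuple

# Assumption: these relation codes mark xenologs in the PANTHER file.
XENOLOG_CODES = frozenset({"X", "LDX"})


class OrthologPair(NamedTuple):
    gene_a: str      # e.g. "HUMAN|HGNC=...|UniProtKB=..."
    gene_b: str
    relation: str    # assumed codes: LDO, O, P, X, LDX
    ancestor: str
    panther_id: str


def read_ortholog_pairs(
    lines: Iterable[str], excluded: frozenset = XENOLOG_CODES
) -> Iterator[OrthologPair]:
    """Yield one pair per well-formed line, skipping excluded relation codes."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        pair = OrthologPair(*fields[:5])
        if pair.relation not in excluded:
            yield pair


# Hypothetical sample rows; none of these identifiers are real.
HUMAN_GENE = "HUMAN|HGNC=0001|UniProtKB=P00001"
SAMPLE_LINES = [
    "\t".join([HUMAN_GENE, "MOUSE|MGI=0001|UniProtKB=P00002",
               "LDO", "Euarchontoglires", "PTHR00001"]),
    "\t".join([HUMAN_GENE, "CAEEL|WormBase=WBGene0001|UniProtKB=P00003",
               "O", "Bilateria", "PTHR00001"]),
    "\t".join([HUMAN_GENE, "ECOLI|EcoGene=EG0001|UniProtKB=P00004",
               "X", "Unclassified", "PTHR00001"]),
]

kept_pairs = list(read_ortholog_pairs(SAMPLE_LINES))
for pair in kept_pairs:
    print(pair.gene_a, pair.relation, pair.gene_b)
```

Note that all three sample rows share one PANTHER ID, yet only two survive as ortholog calls, which is the reason a family or group should not be labelled an "orthology group".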

chris-grove commented 8 years ago

@khowe et al. The portal use case working group was specifically interested in searching for, retrieving and displaying "orthology groups", which it sounds like should instead be called "homology groups" or, more specifically, "Panther sub-families" to avoid confusion or misnomers. The only place I can imagine displaying direct, pairwise orthology calls is on a gene page, for which we haven't written the official use case yet.

gabinkley commented 8 years ago

Pedro is indexing the Panther data in https://drive.google.com/drive/u/0/folders/0BxPsNle2cGvPZndJSGtCYzZLbGs and is calling it 'Homology Groups' instead of 'Orthology Groups'.

If this is NOT OK, then speak up. Once again, this is to show functionality, even if the data is not 100% correct.

chris-grove commented 8 years ago

@gabinkley @pedrohr Yep, sounds good! Thanks!

chris-grove commented 8 years ago

@gabinkley @pedrohr We decided on yesterday's Use Case working group call that we would abandon the PANTHER families/subfamilies as groups of genes to display. Instead, we will work on developing our next use case, which will be a gene-centric page in AGR that focuses on a single gene and displays data for it, for all of its orthologs in other species, and for its paralogs in the same species. As @khowe mentioned above, these pairwise orthology calls can be taken from PANTHER here:

ftp://ftp.pantherdb.org/ortholog/current_release/RefGenomeOrthologs.tar.gz

So, the bottom line is that the developers shouldn't spend more effort trying to import PANTHER subfamilies for the first use case and prototype. We will outline what info we think should be displayed on an AGR gene-centric page in our next use case.
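As a rough sketch of what a gene-centric lookup over pairwise calls could look like: the code below indexes each gene to its homologs, treating a partner from a different species as an ortholog and a partner from the same species as a paralog. That species-based split is a simplification chosen for illustration, not PANTHER's own classification, and it assumes xenologs were already filtered out. The `SPECIES|identifier` gene ID shape, the function names, and all sample identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def species_of(gene_id: str) -> str:
    """Return the species prefix of an ID shaped like 'SPECIES|identifier'."""
    return gene_id.split("|", 1)[0]


def build_gene_index(
    pairs: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, Set[str]]]:
    """Map each gene to its orthologs (other species) and paralogs (same species)."""
    index: Dict[str, Dict[str, Set[str]]] = defaultdict(
        lambda: {"orthologs": set(), "paralogs": set()}
    )
    for gene_a, gene_b in pairs:
        same_species = species_of(gene_a) == species_of(gene_b)
        kind = "paralogs" if same_species else "orthologs"
        # Pairwise calls are symmetric, so record both directions.
        index[gene_a][kind].add(gene_b)
        index[gene_b][kind].add(gene_a)
    return index


# Hypothetical pairs; none of these identifiers are real.
SAMPLE_PAIRS = [
    ("HUMAN|geneA", "MOUSE|geneA"),
    ("HUMAN|geneA", "DANRE|geneA"),
    ("HUMAN|geneA", "HUMAN|geneB"),
]

gene_index = build_gene_index(SAMPLE_PAIRS)
gene_page = gene_index["HUMAN|geneA"]
print(sorted(gene_page["orthologs"]))
print(sorted(gene_page["paralogs"]))
```

A gene page could then look up a single gene and render the two sets separately, without any notion of family-level groups.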

gabinkley commented 8 years ago

Closing. This decision will be implemented.