bio4j / bio4j

Bio4j abstract model and general entry point to the project
http://www.bio4j.com
GNU Affero General Public License v3.0

Review database cross-references #120

Open eparejatobes opened 8 years ago

eparejatobes commented 8 years ago

Right now there are 155 database cross-references in UniProt:

I want to discuss here

eparejatobes commented 8 years ago

The idea is fairly simple:

  1. If a cross-reference is deemed useful enough to be imported, then the database it points to is itself a useful resource, so it should be modeled and included at some point.
  2. Normally these cross-references point to something that would be a vertex, so we just create a graph for that database and add vertices for the linked entities, storing only their IDs.
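
A minimal sketch of point 2, in plain Python rather than the actual Bio4j API: each referenced database gets its own graph, and every linked entity becomes a vertex that carries nothing but its ID. The names `Vertex`, `Graph` and `import_cross_references` are hypothetical, and the accessions and IDs in the usage lines are illustrative placeholders, not verified UniProt annotations.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    """An ID-only vertex; richer properties could be added once the DB is modeled."""
    graph: str  # name of the graph (database) this vertex belongs to
    id: str     # identifier of the entity inside that database


@dataclass
class Graph:
    """Hypothetical stand-in for one per-database graph."""
    name: str
    vertices: dict[str, Vertex] = field(default_factory=dict)

    def vertex(self, entity_id: str) -> Vertex:
        # Reuse the existing vertex, or create an ID-only stub for it.
        return self.vertices.setdefault(entity_id, Vertex(self.name, entity_id))


def import_cross_references(xrefs):
    """Build one graph per referenced database from (protein, db, target) triples."""
    graphs: dict[str, Graph] = {}
    edges: set[tuple[Vertex, Vertex]] = set()
    for protein_id, db, target_id in xrefs:
        protein = graphs.setdefault("UniProt", Graph("UniProt")).vertex(protein_id)
        target = graphs.setdefault(db, Graph(db)).vertex(target_id)
        edges.add((protein, target))  # cross-reference edge: protein -> entity
    return graphs, edges


# Illustrative input only: placeholder accessions and target IDs.
graphs, edges = import_cross_references([
    ("P12345", "GO", "GO:0005739"),
    ("P12345", "InterPro", "IPR000796"),
    ("Q00001", "GO", "GO:0005739"),
])
print(sorted(graphs))
```

Because the stub vertices hold only an ID, the full model of a database can be filled in later without touching the cross-reference edges that already point at it.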

rtobes commented 8 years ago

This is the selection of the most important cross-references:

rtobes commented 8 years ago

Ranked by importance:

The most important:

GO

Gene Ontology · UniProtKB (42,738,622) Category: Ontologies

InterPro

Integrated resource of protein families, domains and functional sites · UniProtKB (51,058,113) Category: Family and domain databases

ENZYME

Enzyme nomenclature database · Category: Enzyme and pathway databases

Sequence databases:

EMBL

EMBL nucleotide sequence database · UniProtKB (64,324,983) Category: Sequence databases

GenBank

GenBank nucleotide sequence database · Category: Sequence databases

RefSeq

NCBI Reference Sequences · UniProtKB (33,494,473) Category: Sequence databases

Gene and genome annotation databases:

Ensembl

Ensembl eukaryotic genome annotation project · UniProtKB (1,231,040) Category: Genome annotation databases

EnsemblBacteria

Ensembl bacterial and archaeal genome annotation project · UniProtKB (29,644,899) Category: Genome annotation databases

EnsemblFungi

Ensembl fungal genome annotation project · UniProtKB (5,116,290) Category: Genome annotation databases

EnsemblMetazoa

Ensembl metazoan genome annotation project · UniProtKB (1,043,430) Category: Genome annotation databases

EnsemblPlants

Ensembl plant genome annotation project · UniProtKB (1,446,406) Category: Genome annotation databases

EnsemblProtists

Ensembl protists genome annotation project · UniProtKB (1,538,599) Category: Genome annotation databases

PATRIC

Pathosystems Resource Integration Center (PATRIC) · UniProtKB (5,907,286) Category: Genome annotation databases

BioCyc

BioCyc Collection of Pathway/Genome Databases · UniProtKB (4,685,844) Category: Enzyme and pathway databases

Networks, interactions, pathways, metabolism:

STRING

STRING: functional protein association networks · UniProtKB (7,617,500) Category: Protein-protein interaction databases

UniPathway

UniPathway: a resource for the exploration and annotation of metabolic pathways · UniProtKB (3,221,066) Category: Enzyme and pathway databases

IntAct

Protein interaction database and analysis system · UniProtKB (60,278) Category: Protein-protein interaction databases

DIP

Database of interacting proteins · UniProtKB (20,104) Category: Protein-protein interaction databases

KEGG

KEGG: Kyoto Encyclopedia of Genes and Genomes · UniProtKB (12,259,742) Category: Genome annotation databases

rtobes commented 8 years ago

I know that BioSystems is not one of UniProt's cross-references, but I think we could add this database to Bio4j in the future:

In principle it is open, and it has an FTP site for downloading its data:

I have worked with this database and it is useful: it is kept up to date, covers all organisms, and includes both metabolic and signalling pathways.

The connection with UniProt could be: