JULIELab / gepi

GePI (GEne - Protein Interactions) is a web portal for quick and convenient access to gene - protein interaction mentions automatically extracted from the biomedical literature, i.e. PubMed and PubMed Central (Open Access Subset).
GNU General Public License v3.0

Id mapping (e.g. UniProt2Gene) - howto? #18

Open khituras opened 7 years ago

khituras commented 7 years ago

How exactly do we want to map UniProt IDs to NCBI Gene IDs? One possibility would be the file described at ftp://ftp.pir.georgetown.edu/databases/idmapping/idmapping.tb.readme. We have this file on our server's hard disk and use it for creating the GeNo resources. This would result in a kind of "static" mapping, since we would have to update our resources whenever a new mapping is released. Would that be an issue? Hosting the mapping ourselves would be much quicker than querying an external web service, I suppose, especially for long lists of IDs.
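A minimal sketch of how such a tab-separated mapping file could be loaded into a lookup table. The column positions, the `;` separator for multiple gene IDs, and the function name `load_uniprot_to_gene` are assumptions for illustration and should be checked against the readme; the sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# Assumed 0-based column positions in the tab-separated mapping file;
# verify them against idmapping.tb.readme before relying on them.
UNIPROT_ACC_COL = 0
ENTREZ_GENE_COL = 2


def load_uniprot_to_gene(lines, acc_col=UNIPROT_ACC_COL, gene_col=ENTREZ_GENE_COL):
    """Build a dict from UniProt accession to a set of NCBI Gene IDs."""
    mapping = defaultdict(set)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= max(acc_col, gene_col):
            continue  # skip short or malformed rows
        acc = row[acc_col].strip()
        # One accession may map to several gene IDs (assumed ';'-separated).
        gene_ids = {g.strip() for g in row[gene_col].split(";") if g.strip()}
        if acc and gene_ids:
            mapping[acc].update(gene_ids)
    return dict(mapping)


# Invented sample rows: accession, entry name, gene ID(s), one further column.
sample = io.StringIO(
    "ACC0001\tDEMO1_HUMAN\t1001\tREF_1\n"
    "ACC0002\tDEMO2_HUMAN\t\tREF_2\n"
    "ACC0003\tDEMO3_HUMAN\t2001; 2002\tREF_3\n"
)
uniprot_to_gene = load_uniprot_to_gene(sample)
print(uniprot_to_gene)
```

For the real file, an open file handle can be passed in place of the `StringIO` object. Accessions without a gene ID are simply left out of the table.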

SchSascha commented 7 years ago

Of course, providing the mapping ourselves would be quicker. The downside, also of course, is that we would have to update it on a regular basis (what would the interval be? We would need an automated update script as well). Altogether, I would vote for having it on our servers, if only for stability reasons (it would be fun if GePI broke just because a mapping service elsewhere was down).

Besides, we always need a mapping from a given ID (currently UniProt or Entrez) to the aggregated ID, right? The question is whether an intermediate step for UniProt via UniProt -> Entrez -> aggregated ID is necessary, or whether each aggregated ID knows which Entrez AND UniProt IDs it belongs to. Probably only the Entrez IDs are known, right?
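To make the two options concrete, here is a rough sketch of both: resolving a UniProt ID through Entrez at query time, and precomputing a direct UniProt-to-aggregate index. All IDs, dictionary names, and function names are hypothetical and only illustrate the lookup logic, not how GePI actually stores its data.

```python
# Hypothetical toy data: every ID here is invented for illustration.
uniprot_to_entrez = {"UP1": {"G1"}, "UP2": {"G2", "G3"}}
entrez_to_aggregate = {"G1": "AGG_A", "G2": "AGG_A", "G3": "AGG_B"}


def aggregates_via_entrez(uniprot_id):
    """Option 1: resolve UniProt -> Entrez -> aggregate at query time."""
    genes = uniprot_to_entrez.get(uniprot_id, set())
    return {entrez_to_aggregate[g] for g in genes if g in entrez_to_aggregate}


def build_direct_index():
    """Option 2: precompute UniProt -> aggregate once, so a query is one lookup."""
    return {up: aggregates_via_entrez(up) for up in uniprot_to_entrez}


direct_index = build_direct_index()
print(aggregates_via_entrez("UP2"))
print(direct_index["UP1"])
```

Option 2 trades a rebuild step after each mapping update for cheaper queries; option 1 always reflects the current tables but does two lookups per ID.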

On 02.03.2017 17:49, Erik Fäßler wrote:

Assigned #18 (https://github.com/khituras/gepi/issues/18) to @SchSascha (https://github.com/SchSascha).


-- Dr. Sascha Schäuble | JULIE Lab, FSU Jena, Germany http://www.julielab.de/Staff/Dr_+Sascha+Sch%C3%A4uble.html ☎ +49 3641 9 44324

khituras commented 7 years ago

I also thought that nodes in the database should just know their IDs. Aggregates have no Entrez ID themselves but rely on their elements, which do have IDs. I think we would just add the mapped UniProt IDs to the database. This way, we can run an update at any time and the changes take effect immediately.
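As a rough in-memory sketch of that idea (the class and function names are made up, and the real database model may look quite different): element nodes carry their own Entrez and UniProt IDs, aggregates derive their IDs from their elements, and re-applying a mapping changes what the aggregates report right away.

```python
from dataclasses import dataclass, field


@dataclass
class GeneNode:
    """Element node: knows its own Entrez ID and the UniProt IDs mapped to it."""
    entrez_id: str
    uniprot_ids: set = field(default_factory=set)


@dataclass
class AggregateNode:
    """Aggregate node: stores no IDs itself, only references to its elements."""
    name: str
    elements: list = field(default_factory=list)

    def entrez_ids(self):
        return {e.entrez_id for e in self.elements}

    def uniprot_ids(self):
        return set().union(*(e.uniprot_ids for e in self.elements))


def apply_mapping(nodes, gene_to_uniprot):
    """Overwrite the UniProt IDs on each element node from a fresh mapping."""
    for node in nodes:
        node.uniprot_ids = set(gene_to_uniprot.get(node.entrez_id, ()))


# Invented IDs for illustration only.
g1, g2 = GeneNode("G1"), GeneNode("G2")
agg = AggregateNode("AGG_A", [g1, g2])

apply_mapping([g1, g2], {"G1": {"UP1"}, "G2": {"UP2"}})
before = agg.uniprot_ids()

# A later mapping release changes G2; the aggregate reflects it immediately.
apply_mapping([g1, g2], {"G1": {"UP1"}, "G2": {"UP3", "UP4"}})
after = agg.uniprot_ids()
print(before, after)
```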

Interval: I don't know, we can define it ourselves. I actually don't know how often the mapping file I referenced is updated.

SchSascha commented 7 years ago

I agree with this proposal.

Interval: I don't know either, honestly. Pragmatically, we should not worry about it right now, but keep in mind that such functionality might be useful in the future, e.g. updating the mapping resources alongside the MEDLINE/PMC index whenever that index is updated.
