genome-nexus / genome-nexus-importer

Import data into MongoDB for use by https://github.com/genome-nexus/genome-nexus/
MIT License

Make a file with unique protein sequences and associated IDs #60

Open inodb opened 2 years ago

inodb commented 2 years ago

Maybe we can create a file that has one row per unique protein sequence, together with the IDs associated with that sequence? Each column could be a database, for example, so you get something like:

| unique protein sequence | ensembl_grch37_vxx_protein | ensembl_grch37_vxx_transcript | ensembl_grch38_vxx_protein | ensembl_grch38_vxx_transcript | uniprot |
| --- | --- | --- | --- | --- | --- |
| RRRRR | ENSPxxx | ENSTyyyyy | ENSPxxx | ENSTyyyyy | Pzzzz |

We can then reuse this file for the uniprot, oncokb and hotspot transcript assignments. It also makes it easy to add other protein resources later: each new resource is just another column.
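A rough sketch of how such a file could be generated, assuming each resource has already been loaded into a plain `{identifier: sequence}` mapping. The function names (`build_unique_sequence_table`, `to_tsv`), the `unique_protein_sequence` column name and the comma-joining of multiple IDs that share a sequence are all assumptions for illustration, not existing importer code:

```python
import csv
import io
from collections import defaultdict


def build_unique_sequence_table(sources):
    """Group identifiers from several resources by protein sequence.

    sources: dict mapping a column name (e.g. "uniprot") to a dict of
             {identifier: protein sequence}. Hypothetical input shape.
    Returns (columns, rows), with one row dict per unique sequence.
    """
    columns = list(sources)
    # sequence -> column name -> identifiers that have this sequence
    ids_by_sequence = defaultdict(lambda: defaultdict(list))
    for column, id_to_sequence in sources.items():
        for identifier, sequence in id_to_sequence.items():
            ids_by_sequence[sequence][column].append(identifier)

    rows = []
    for sequence in sorted(ids_by_sequence):
        row = {"unique_protein_sequence": sequence}
        for column in columns:
            # Assumption: several IDs with the same sequence are comma-joined;
            # a resource without this sequence gets an empty cell.
            ids = ids_by_sequence[sequence].get(column, [])
            row[column] = ",".join(sorted(ids))
        rows.append(row)
    return columns, rows


def to_tsv(columns, rows):
    """Serialize the table as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["unique_protein_sequence"] + columns,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder IDs and sequences, in the style of the example row above.
    example_sources = {
        "ensembl_grch37_vxx_protein": {"ENSPxxx": "RRRRR"},
        "uniprot": {"Pzzzz": "RRRRR"},
    }
    print(to_tsv(*build_unique_sequence_table(example_sources)))
```

Adding another resource would then only mean passing one more entry in `sources`; the existing rows keep their shape and simply gain a column.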