GMOD / jbrowse-components

Source code for JBrowse 2, a modern React-based genome browser
https://jbrowse.org/jb2
Apache License 2.0

protein data demo web service #158

Closed (rbuels closed 5 years ago)

rbuels commented 5 years ago

A small Express.js-based web service that we will use to demo the protein viewer. It takes a single human protein name as input (e.g. KRAS), picks the canonical isoform (if there are isoforms), and returns:

cmdcolin commented 5 years ago

I'm trying to munge some of the Ensembl variation BioMart data because it has all the COSMIC info. It then just needs to be joined to the actual COSMIC VCF, which has some sample info, e.g. the frequency of being mutated.
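A minimal sketch of that join, in Python rather than the Express.js stack mentioned above. It assumes the BioMart export is tab-separated with a `Variant name` column holding the COSMIC ID, that the VCF `ID` column holds the same ID, and that the VCF `INFO` field carries a `CNT=` sample count. The function names, the `sample_count` key, and the sample data are hypothetical and invented for illustration only.

```python
import csv
import io


def parse_vcf_counts(vcf_text):
    """Map each VCF ID to the integer CNT value found in its INFO column."""
    counts = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a full VCF record
        variant_id, info = fields[2], fields[7]
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key == "CNT" and value.isdigit():  # CNT is an assumed INFO key
                counts[variant_id] = int(value)
    return counts


def join_biomart_to_vcf(biomart_tsv, vcf_text):
    """Attach a sample_count (or None) to every BioMart row, keyed on Variant name."""
    counts = parse_vcf_counts(vcf_text)
    reader = csv.DictReader(io.StringIO(biomart_tsv), delimiter="\t")
    joined = []
    for row in reader:
        record = dict(row)
        record["sample_count"] = counts.get(record["Variant name"])
        joined.append(record)
    return joined


# Invented placeholder data, not real COSMIC records.
BIOMART = (
    "Variant name\tVariant source\tVariant start in translation (aa)\n"
    "COSM_A\tCOSMIC\t12\n"
    "COSM_B\tCOSMIC\t13\n"
)
VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "12\t100\tCOSM_A\tC\tT\t.\t.\tGENE=X;CNT=7\n"
)

if __name__ == "__main__":
    for record in join_biomart_to_vcf(BIOMART, VCF):
        print(record["Variant name"], record["sample_count"])
```

Rows with no matching VCF record keep a `sample_count` of `None`, so nothing is silently dropped during the join.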

Example of some Ensembl BioMart variation data:

Variant name    Variant source  Chromosome/scaffold name    Chromosome/scaffold position start (bp) Chromosome/scaffold position end (bp)   Clinical significance   Variant start in cDNA (bp)  Variant end in cDNA (bp)    Variant start in translation (aa)   Variant end in translation (aa) Variant start in CDS (bp)   Variant end in CDS (bp) Variant consequence Gene stable ID  Transcript stable ID    PolyPhen prediction PolyPhen score  SIFT prediction SIFT score  Variant Set Name    P value Associated gene with phenotype  Synonym name    Synonym source  Title   Authors Year    PubMed ID   Variant Set Description Associated variant risk allele  Associated variant names    Source name Study external reference    Study description   Variant alleles 1000 Genomes global minor allele frequency (all individuals)    1000 Genomes global minor allele count (all individuals)    Minor allele (ALL)  Protein allele  Consequence specific allele Biotype Transcript strand   Distance to transcript  Strand  Phenotype description
COSM5956651 COSMIC  22  15528165    15528165        7   7   3   3   7   7   coding_sequence_variant ENSG00000130538 ENST00000252835                 All phenotype/disease-associated variants                                   Variants that have been associated with a phenotype or a disease            COSMIC          COSMIC_MUTATION                 C/COSMIC_MUTATION   protein_coding  1   6   1   Haematopoietic and lymphoid tissue tumour
COSM5956651 COSMIC  22  15528165    15528165        7   7   3   3   7   7   coding_sequence_variant ENSG00000130538 ENST00000252835                 COSMIC phenotype variants                                   Phenotype annotations of somatic mutations found in human cancers from the COSMIC project           COSMIC          COSMIC_MUTATION                 C/COSMIC_MUTATION   protein_coding  1   6   1   Haematopoietic and lymphoid tissue tumour
COSM3842126 COSMIC  22  15528194    15528194        3   3   1   1   3   3   start_retained_variant  ENSG00000130538 ENST00000643195                 All phenotype/disease-associated variants                                   Variants that have been associated with a phenotype or a disease            COSMIC          COSMIC_MUTATION                 G/COSMIC_MUTATION   protein_coding  1   2   1   Breast tumour
COSM3842126 COSMIC  22  15528194    15528194        36  36  12  12  36  36  coding_sequence_variant ENSG00000130538 ENST00000252835                 All phenotype/disease-associated variants                                   Variants that have been associated with a phenotype or a disease            COSMIC          COSMIC_MUTATION                 G/COSMIC_MUTATION   protein_coding  1   35  1   Breast tumour
COSM3842126 COSMIC  22  15528194    15528194        3   3   1   1   3   3   start_retained_variant  ENSG00000130538 ENST00000643195                 COSMIC phenotype variants                                   Phenotype annotations of somatic mutations found in human cancers from the COSMIC project           COSMIC          COSMIC_MUTATION                 G/COSMIC_MUTATION   protein_coding  1   2   1   Breast tumour
COSM3842126 COSMIC  22  15528194    15528194        36  36  12  12  36  36  coding_sequence_variant ENSG00000130538 ENST00000252835                 COSMIC phenotype variants                                   Phenotype annotations of somatic mutations found in human cancers from the COSMIC project           COSMIC          COSMIC_MUTATION                 G/COSMIC_MUTATION   protein_coding  1   35  1   Breast tumour
COSM5834074 COSMIC  22  15528194    15528195        3   4   1   2   3   4   start_retained_variant  ENSG00000130538 ENST00000643195                 All phenotype/disease-associated variants                                   Variants that have been associated with a phenotype or a disease            COSMIC          COSMIC_MUTATION                 GA/COSMIC_MUTATION  protein_coding  1   2   1   Breast tumour
COSM5834074 COSMIC  22  15528194    15528195        36  37  12  13  36  37  coding_sequence_variant ENSG00000130538 ENST00000252835                 All phenotype/disease-associated variants                                   Variants that have been associated with a phenotype or a disease            COSMIC          COSMIC_MUTATION                 GA/COSMIC_MUTATION  protein_coding  1   35  1   Breast tumour
COSM5834074 COSMIC  22  15528194    15528195        3   4   1   2   3   4   start_retained_variant  ENSG00000130538 ENST00000643195                 COSMIC phenotype variants                                   Phenotype annotations of somatic mutations found in human cancers from the COSMIC project           COSMIC          COSMIC_MUTATION                 GA/COSMIC_MUTATION  protein_coding  1   2   1   Breast tumour
COSM5834074 COSMIC  22  15528194    15528195        36  37  12  13  36  37  coding_sequence_variant ENSG00000130538 ENST00000252835                 COSMIC phenotype variants                                   Phenotype annotations of somatic mutations found in human cancers from the COSMIC project           COSMIC          COSMIC_MUTATION                 GA/COSMIC_MUTATION  protein_coding  1   35  1   Breast tumour
COSM6883968 COSMIC  22  15528213    15528213        22  22  8   8   22  22  coding_sequence_variant ENSG00000130538 ENST00000643195                 All phenotype/disease-associated variants                                   Variants that have been associated with a phenotype or a disease            COSMIC          COSMIC_MUTATION                 T/COSMIC_MUTATION   protein_coding  1   21  1   Skin tumour
COSM6883968 COSMIC  22  15528213    15528213        55  55  19  19  55  55  coding_sequence_variant ENSG00000130538 ENST00000252835                 All phenotype/disease-associated variants                                   Variants that have been associated with a phenotype or a disease            COSMIC          COSMIC_MUTATION                 T/COSMIC_MUTATION   protein_coding  1   54  1   Skin tumour
COSM6883968 COSMIC  22  15528213    15528213        22  22  8   8   22  22  coding_sequence_variant ENSG00000130538 ENST00000643195                 COSMIC phenotype variants                                   Phenotype annotations of somatic mutations found in human cancers from the COSMIC project           COSMIC          COSMIC_MUTATION                 T/COSMIC_MUTATION   protein_coding  1   21  1   Skin tumour
COSM6883968 COSMIC  22  15528213    15528213        55  55  19  19  55  55  coding_sequence_variant ENSG00000130538 ENST00000252835                 COSMIC phenotype variants                                   Phenotype annotations of somatic mutations found in human cancers from the COSMIC project           COSMIC          COSMIC_MUTATION                 T/COSMIC_MUTATION   protein_coding  1   54  1   Skin tumour
COSM181480  COSMIC  22  15528254    15528254        63  63  21  21  63  63  coding_sequence_variant ENSG00000130538 ENST00000643195                 All phenotype/disease-associated variants                                   Variants that have been associated with a phenotype or a disease            COSMIC          COSMIC_MUTATION                 C/COSMIC_MUTATION   protein_coding  1   62  1   Endometrium tumour
COSM181480  COSMIC  22  15528254    15528254        96  96  32  32  96  96  coding_sequence_variant ENSG00000130538 ENST00000252835                 All phenotype/disease-associated variants                                   Variants that have been associated with a phenotype or a disease            COSMIC          COSMIC_MUTATION                 C/COSMIC_MUTATION   protein_coding  1   95  1   Endometrium tumour

cmdcolin commented 5 years ago

I have variants and domains downloaded for a given protein here: https://github.com/cmdcolin/protein_data_service

I was thinking that I could also make it return the sequence.

If there is any other feedback on how the data should be formatted, let me know.

cmdcolin commented 5 years ago

One further improvement would be collating multiple rows of the output into a nested JSON structure.
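One possible shape for that collation, as a rough Python sketch: group the flat rows by `Variant name`, nest the per-transcript fields under each transcript ID, and collect the distinct variant sets and phenotypes. The column names come from the BioMart header in the example above. The output keys (`transcripts`, `variant_sets`, `phenotypes`, `aa_start`) and the `collate` function are hypothetical, not what protein_data_service actually emits.

```python
import json


def collate(rows):
    """Collapse flat BioMart rows into one nested record per variant."""
    variants = {}
    for row in rows:
        variant = variants.setdefault(row["Variant name"], {
            "name": row["Variant name"],
            "transcripts": {},
            "variant_sets": [],
            "phenotypes": [],
        })
        # Per-transcript fields are stored once per transcript ID.
        variant["transcripts"].setdefault(row["Transcript stable ID"], {
            "consequence": row["Variant consequence"],
            "aa_start": int(row["Variant start in translation (aa)"]),
        })
        # Values repeated across rows are kept once, in first-seen order.
        for key, column in (("variant_sets", "Variant Set Name"),
                            ("phenotypes", "Phenotype description")):
            if row[column] not in variant[key]:
                variant[key].append(row[column])
    return list(variants.values())


def _row(transcript, consequence, aa_start, variant_set):
    """Build one flat row for COSM3842126 using values from the example above."""
    return {
        "Variant name": "COSM3842126",
        "Transcript stable ID": transcript,
        "Variant consequence": consequence,
        "Variant start in translation (aa)": aa_start,
        "Variant Set Name": variant_set,
        "Phenotype description": "Breast tumour",
    }


ROWS = [
    _row("ENST00000643195", "start_retained_variant", "1",
         "All phenotype/disease-associated variants"),
    _row("ENST00000252835", "coding_sequence_variant", "12",
         "All phenotype/disease-associated variants"),
    _row("ENST00000643195", "start_retained_variant", "1",
         "COSMIC phenotype variants"),
    _row("ENST00000252835", "coding_sequence_variant", "12",
         "COSMIC phenotype variants"),
]

if __name__ == "__main__":
    print(json.dumps(collate(ROWS), indent=2))
```

With this grouping, the four flat rows for COSM3842126 become a single record with two transcripts and two variant sets, which avoids repeating the phenotype on every row.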

cmdcolin commented 5 years ago

I think this is working. If there is a bug, maybe we can open an issue in https://github.com/cmdcolin/protein_data_service