advweb-grp1 / advanced-web-final-year-project

Final year advanced web development unit project
MIT License

API #15

Open advweb-grp1 opened 1 year ago

advweb-grp1 commented 1 year ago

OMIM: API that searches diseases and splits each one into two categories, genotype and phenotype, and also gives the clinical synopsis.
MGI: Rat heart mutation data.
HPO: API that returns heart mutation descriptions. JSON.
NCBI SNP: Queries a database?

advweb-grp1 commented 1 year ago

Questions for all. Data format? JSON or XML? Examples of endpoints to call? Setup? Do we need to register for a key?

LiamSingh64 commented 1 year ago

Ayman found basically the only API we can use to get the gene description. It's called the Proteins API and can return data as JSON, XML or text. It does have a limit of 200 requests/second/user, but we probably won't exceed this.

This is an example URL using a gene in our dataset "TNN": https://www.ebi.ac.uk/proteins/api/proteins?offset=0&size=100&gene=Tnn

We are basically only interested in the "comments" object, extracting the "text" and "value".
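A minimal sketch of pulling those "text" / "value" strings out of a JSON response. It uses a regular expression only so that it runs without extra libraries; a real JSON parser would be more robust. The class and method names (CommentTextExtractor, extractTextValues) are made up for illustration, and the sample string is a shortened response assumed to follow the shape described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Proteins API.
final class CommentTextExtractor {

    // Matches: "text":[{"value":"<captured>"
    // Assumption: the comment text is the first "value" inside a "text" array.
    private static final Pattern TEXT_VALUE = Pattern.compile(
        "\"text\"\\s*:\\s*\\[\\s*\\{\\s*\"value\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Returns every comment text found, in document order (JSON escapes are left as-is). */
    static List<String> extractTextValues(String json) {
        List<String> values = new ArrayList<>();
        Matcher m = TEXT_VALUE.matcher(json);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened sample, assumed to follow the response shape
        String sample = "{\"comments\":[{\"type\":\"FUNCTION\",\"text\":"
            + "[{\"value\":\"Involved in muscle contraction\"}]}]}";
        System.out.println(extractTextValues(sample));
    }
}
```

The pattern deliberately requires the "text" key in front of "value", so a "value" under fullName is not picked up by mistake.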


D4ni3l8 commented 1 year ago

My API does not return the data in other formats. Other than that, we can display some information about the clinical synopsis.

AymanReh commented 1 year ago

Protein API: the Proteins API returns descriptions of gene mutations. An example URL is https://www.ebi.ac.uk/proteins/api/proteins?offset=0&size=100&gene=MYH7

Useful fields in the response:

Full name of mutation:

<protein>
<submittedName>
<fullName evidence="12">Myosin heavy chain 7B</fullName>
</submittedName>
</protein>

To get the full name of each gene mutation, the fullName element is the one to read.

Example gene mutation descriptions: MYH7:

<comment type="subcellular location">
<subcellularLocation>
<location evidence="1">Membrane</location>
<topology evidence="1">Single-pass type I membrane protein</topology>
</subcellularLocation>
</comment>

TNN:

<comment type="function">
<text evidence="1">Troponin T is the tropomyosin-binding subunit of troponin, the thin filament regulatory complex which confers calcium-sensitivity to striated muscle actomyosin ATPase activity.</text>
</comment>
<reference evidence="4 6" key="1">
<citation type="journal article" date="2009" name="PLoS Biol." volume="7" first="E1000112" last="E1000112">
<title>Lineage-specific biology revealed by a finished genome assembly of the mouse.</title>

From researching the Proteins API, the 2 main elements that are useful to us are "title" and "comment", since these 2 contain descriptions of the gene.

Return data can be JSON, XML or text. The type is chosen with the Accept header: application/json, application/xml or text/x-fasta. The Java to specify which type of data is returned would be: httpConnection.setRequestProperty("Accept", "application/json");
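As a small sketch of keeping those three Accept values in one place, an enum can map each format to its MIME type. The names here (ResponseFormats, Format, applyAccept) are made up for illustration; only the three MIME strings and the setRequestProperty call come from the text above.

```java
import java.net.URLConnection;

// Hypothetical helper for choosing the response format.
final class ResponseFormats {

    /** The three return types mentioned above, with their Accept header values. */
    enum Format {
        JSON("application/json"),
        XML("application/xml"),
        FASTA("text/x-fasta");

        final String mimeType;

        Format(String mimeType) {
            this.mimeType = mimeType;
        }
    }

    /** Sets the Accept header on a connection that has not been opened yet. */
    static void applyAccept(URLConnection connection, Format format) {
        connection.setRequestProperty("Accept", format.mimeType);
    }

    public static void main(String[] args) {
        for (Format f : Format.values()) {
            System.out.println(f + " -> " + f.mimeType);
        }
    }
}
```

With this in place the request code would call applyAccept(httpConnection, ResponseFormats.Format.JSON) instead of repeating the string.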

Below is some sample code for implementing the api in java:

import java.net.URL;
import java.net.URLConnection;
import java.net.HttpURLConnection;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.Reader;

public class APIRequest {

  public static void main(String[] args) throws Exception {
    // Search by gene name, matching the example URLs used elsewhere in this thread
    String requestURL = "https://www.ebi.ac.uk/proteins/api/proteins?offset=0&size=100&gene=Tnn";
    URL url = new URL(requestURL);

    URLConnection connection = url.openConnection();
    HttpURLConnection httpConnection = (HttpURLConnection)connection;

    httpConnection.setRequestProperty("Accept", "application/json");

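    // Check the status before asking for the body: getInputStream() itself
    // throws an IOException on 4xx/5xx, so the check would never be reached.
    int responseCode = httpConnection.getResponseCode();

    if(responseCode != 200) {
      throw new RuntimeException("Response code was not 200. Detected response was "+responseCode);
    }

    InputStream response = httpConnection.getInputStream();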

    String output;
    // try-with-resources closes the reader even if reading fails
    try (Reader reader = new BufferedReader(new InputStreamReader(response, "UTF-8"))) {
      StringBuilder builder = new StringBuilder();
      char[] buffer = new char[8192];
      int read;
      // read() returns -1 at the end of the stream, which ends the loop
      while ((read = reader.read(buffer, 0, buffer.length)) > 0) {
        builder.append(buffer, 0, read);
      }
      output = builder.toString();
    }

    System.out.println(output);
  }
}

No API key is required to use the Proteins API.

AymanReh commented 1 year ago

OMIM: OMIM seems to be a repository of gene mutations used by physicians and by advanced students in science. Searching for gene mutations such as MYH7 returns data such as https://www.omim.org/entry/617472?search=tnn&highlight=tnn

However, using the API requires an API key before we are allowed access to it.

API Key
The API key is a key that is unique to every developer wanting to access the API. It is allocated by OMIM and should not be shared. This has to be included with every request and is validated before the request is processed. There are three ways in which it can be included with a request.

Added as an HTTP Header as follows:

ApiKey: nfNEOscLNWWXdSmUoMLPPA

Added as a cookie as follows:

Cookie: ApiKey=nfNEOscLNWWXdSmUoMLPPA

Added as a parameter to the url request as follows:

https://api.omim.org/....?...&apiKey=nfNEOscLNWWXdSmUoMLPPA

Note that the API key parameter name is case-sensitive.
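The three options above could be sketched in Java roughly as follows. This only builds the header values and the URL and sends nothing. The class and method names are made up for illustration, "YOUR_API_KEY" is a placeholder, and the URL in main is a hypothetical stand-in because the real request path is elided above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Collections;
import java.util.Map;

// Hypothetical helper showing the three ways to attach the key.
final class OmimApiKey {

    /** Option 1: HTTP header "ApiKey: <key>". */
    static Map<String, String> asHeader(String apiKey) {
        return Collections.singletonMap("ApiKey", apiKey);
    }

    /** Option 2: cookie header "Cookie: ApiKey=<key>". */
    static Map<String, String> asCookie(String apiKey) {
        return Collections.singletonMap("Cookie", "ApiKey=" + apiKey);
    }

    /** Option 3: query parameter; the name "apiKey" is case-sensitive. */
    static String asQueryParameter(String requestUrl, String apiKey) {
        String separator = requestUrl.contains("?") ? "&" : "?";
        try {
            return requestUrl + separator + "apiKey=" + URLEncoder.encode(apiKey, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String key = "YOUR_API_KEY"; // placeholder, never commit a real key
        System.out.println(asHeader(key));
        System.out.println(asCookie(key));
        // Hypothetical URL, only to show where the parameter ends up
        System.out.println(asQueryParameter("https://api.example.org/search?q=MYH7", key));
    }
}
```

Since the key should not be shared, the header or cookie form is probably the safer choice for us, because URLs tend to end up in logs and screenshots.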

The only data available without the API key is https://www.omim.org/static/omim/data/mim2gene.txt, and it only states the gene name and NCBI number, which is of no use to us.

I cannot search for endpoints since it only allows access to users with an API key.

AymanReh commented 1 year ago

I requested API access from OMIM on 17/04/2023. Now I just have to wait for a response.

AymanReh commented 1 year ago

After doing further research on some of the APIs we have looked at so far: the Proteins API has a reviewed keyword you can put into the URL which only shows confirmed mutation data (reviewed=true). After some testing, 7 out of the 9 mutations in the spec show up and they all use the same fields, fullName and text, but MYL2 does not have the same text field and TNNCI does not show up at all. The data returned is in JSON or XML format, which can be changed in the URL by setting format to json or xml. It does not require an API key. Home page for the Proteins API: https://www.ebi.ac.uk/proteins/api/doc/#!/proteins/search
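A small sketch of building such a query URL in Java. The parameter names (offset, size, gene, reviewed, format) are taken from the URLs used in this thread; the class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that assembles the query URLs used in this thread.
final class ProteinsQuery {

    static final String BASE = "https://www.ebi.ac.uk/proteins/api/proteins";

    /** Builds a single-result query for one gene, optionally restricted to reviewed entries. */
    static String buildUrl(String gene, boolean reviewedOnly, String format) {
        StringBuilder url = new StringBuilder(BASE)
            .append("?offset=0&size=1")
            .append("&gene=").append(encode(gene));
        if (reviewedOnly) {
            url.append("&reviewed=true");
        }
        url.append("&format=").append(encode(format));
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("MYH7", true, "json"));
    }
}
```

Looping this over the 9 genes in the spec would give one URL per gene to feed into the request code posted earlier.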

Useful endpoints include:

MYH7: https://www.ebi.ac.uk/proteins/api/proteins?offset=0&size=1&gene=MYH7&reviewed=true&format=json

name: {"recommendedName":{"fullName":{"value":"Myosin-7B"}}

description: "comments":[{"type":"FUNCTION","text":[{"value":"Involved in muscle contraction","evidences":[{"code":"ECO:0000250"}]}]}

Another example, for a different gene mutation (TNN):

TNN: https://www.ebi.ac.uk/proteins/api/proteins?offset=0&size=1&gene=TNN&reviewed=true&format=json

name: {"recommendedName":{"fullName":{"value":"Troponin T, slow skeletal muscle"}}

description: "comments":[{"type":"FUNCTION","text":[{"value":"Troponin T is the tropomyosin-binding subunit of troponin, the thin filament regulatory complex which confers calcium-sensitivity to striated muscle actomyosin ATPase activity"}]}

MYL2: MYL2 shows up in the Proteins API, but the description field reads as: "comments":[{"type":"SUBUNIT","text":[{"value":"Myosin is a hexamer of 2 heavy chains and 4 light chains","evidences":[{"code":"ECO:0000305"}]}]} It's very similar to the descriptions of the previous mutations, but the type is SUBUNIT, whereas the type for the other mutations is FUNCTION. If there is a way to work around the type then we will be able to use MYL2.

TNNCI: Doesn't show up at all in the Proteins API.
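One possible way around the type difference, sketched in Java: prefer a FUNCTION comment when there is one and otherwise fall back to the first comment that has any text, which is what MYL2 would need. The Comment class and pickDescription method are made up for illustration and assume the comments have already been parsed out of the JSON.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for choosing which comment to show as the description.
final class DescriptionPicker {

    /** One entry of the "comments" array, reduced to the two fields used here. */
    static final class Comment {
        final String type;
        final String text;

        Comment(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    /** Returns the FUNCTION text if present, else the first non-empty text, else null. */
    static String pickDescription(List<Comment> comments) {
        String fallback = null;
        for (Comment c : comments) {
            if (c.text == null || c.text.isEmpty()) {
                continue; // nothing usable in this comment
            }
            if ("FUNCTION".equals(c.type)) {
                return c.text;
            }
            if (fallback == null) {
                fallback = c.text;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        List<Comment> myl2 = Arrays.asList(
            new Comment("SUBUNIT", "Myosin is a hexamer of 2 heavy chains and 4 light chains"));
        System.out.println(pickDescription(myl2));
    }
}
```

A gene that returns no comments at all gives null, so the page would still need a "no description available" case.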

AymanReh commented 1 year ago

Looked at the NCBI NLM NIH (National Library of Medicine) API. It returns data in XML and does not require an API key. Documentation to search a database: https://www.ncbi.nlm.nih.gov/books/NBK25500/#chapter1.Searching_a_Database

When searching for a gene mutation such as MYH7, I queried: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=clinvar&term=MYH7

The data returned shows IDs of gene mutations. I then queried one of the IDs: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=clinvar&term=2497837

After looking at this API I do not think it will be useful to us, since none of the endpoints seem useful for our purposes.

advweb-grp1 commented 1 year ago

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=4606&retmode=xml