Open advweb-grp1 opened 1 year ago
Questions for all. Data format? JSON or XML? Examples of endpoints to call? Setup? Do we need to register for a key?
Ayman found basically the only API we can use to get the gene description. It's called the Proteins API and can return data as JSON, XML or text. It does have a limit of 200 requests/second/user, but we probably won't exceed this.
This is an example URL using a gene in our dataset "TNN": https://www.ebi.ac.uk/proteins/api/proteins?offset=0&size=100&gene=Tnn
We are basically only interested in the "comments" object, extracting the "text" and "value".
My API does not return the data in other formats. Other than that, we can display some information about the clinical synopsis.
Proteins API: the Proteins API returns descriptions of gene mutations. An example URL is https://www.ebi.ac.uk/proteins/api/proteins?offset=0&size=100&gene=MYH7
Useful endpoints:
Full name of mutation:
<protein>
<submittedName>
<fullName evidence="12">Myosin heavy chain 7B</fullName>
</submittedName>
</protein>
To get the full name of each gene mutation the fullName endpoint is useful.
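As a rough illustration of reading fullName from the XML response, here is a minimal sketch using the DOM parser that ships with Java. The class and method names (FullNameReader, firstFullName) are made up for this example, and it assumes the response has already been downloaded into a String.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the Proteins API itself.
final class FullNameReader {

    /** Returns the text of the first <fullName> element, or null if there is none. */
    static String firstFullName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList names = doc.getElementsByTagName("fullName");
            return names.getLength() == 0 ? null : names.item(0).getTextContent();
        } catch (Exception e) {
            // Malformed XML or parser configuration problems end up here.
            throw new IllegalStateException("Could not parse XML response", e);
        }
    }

    public static void main(String[] args) {
        // The fragment shown above, pasted in as a string.
        String xml = "<protein><submittedName>"
                + "<fullName evidence=\"12\">Myosin heavy chain 7B</fullName>"
                + "</submittedName></protein>";
        System.out.println(firstFullName(xml)); // prints "Myosin heavy chain 7B"
    }
}
```

A full response contains many entries, so in practice we would probably loop over all fullName elements rather than take only the first.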
Example gene mutation descriptions: MYH7:
<comment type="subcellular location">
<subcellularLocation>
<location evidence="1">Membrane</location>
<topology evidence="1">Single-pass type I membrane protein</topology>
</subcellularLocation>
</comment>
TNN:
<comment type="function">
<text evidence="1">Troponin T is the tropomyosin-binding subunit of troponin, the thin filament regulatory complex which confers calcium-sensitivity to striated muscle actomyosin ATPase activity.</text>
</comment>
<reference evidence="4 6" key="1">
<citation type="journal article" date="2009" name="PLoS Biol." volume="7" first="E1000112" last="E1000112">
<title>Lineage-specific biology revealed by a finished genome assembly of the mouse.</title>
From researching the Proteins API, the 2 main endpoints that are useful to us are the "title" endpoint and the "comment" endpoint, since these 2 endpoints contain descriptions of the gene.
Return data can be JSON, XML or text, selected with the Accept request header: application/json, application/xml or text/x-fasta respectively. The Java to specify which type of data is returned would be: httpConnection.setRequestProperty("Accept", "application/json");
Below is some sample code for calling the API in Java:
import java.net.URL;
import java.net.URLConnection;
import java.net.HttpURLConnection;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.Reader;
public class APIRequest {
    public static void main(String[] args) throws Exception {
        String requestURL = "https://www.ebi.ac.uk/proteins/api/proteins?offset=0&size=100&protein=Tnn";
        URL url = new URL(requestURL);
        URLConnection connection = url.openConnection();
        HttpURLConnection httpConnection = (HttpURLConnection) connection;
        // Ask for JSON; use application/xml or text/x-fasta for the other formats.
        httpConnection.setRequestProperty("Accept", "application/json");
        // Check the status before reading the body: getInputStream() throws on error responses.
        int responseCode = httpConnection.getResponseCode();
        if (responseCode != 200) {
            throw new RuntimeException("Response code was not 200. Detected response was " + responseCode);
        }
        String output;
        // try-with-resources closes the reader (and the underlying stream) automatically.
        try (Reader reader = new BufferedReader(new InputStreamReader(httpConnection.getInputStream(), "UTF-8"))) {
            StringBuilder builder = new StringBuilder();
            char[] buffer = new char[8192];
            int read;
            while ((read = reader.read(buffer, 0, buffer.length)) > 0) {
                builder.append(buffer, 0, read);
            }
            output = builder.toString();
        }
        System.out.println(output);
    }
}
No keys are required to use the Proteins API.
OMIM:
OMIM seems to be a repository of gene mutations for use by physicians and advanced students in science. Searching for a gene such as TNN returns an entry such as:
https://www.omim.org/entry/617472?search=tnn&highlight=tnn
However, using the API requires an API key.
API Key
The API key is a key that is unique to every developer wanting to access the API. It is allocated by OMIM and should not be shared. This has to be included with every request and is validated before the request is processed. There are three ways in which it can be included with a request.
Added as an HTTP Header as follows:
ApiKey: nfNEOscLNWWXdSmUoMLPPA
Added as a cookie as follows:
Cookie: ApiKey=nfNEOscLNWWXdSmUoMLPPA
Added as a parameter to the url request as follows:
https://api.omim.org/....?...&apiKey=nfNEOscLNWWXdSmUoMLPPA
Note that the API key parameter name is case-sensitive.
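The three options above could look roughly like this in Java. This is only a sketch: the class and method names (OmimAuth, addKeyHeader, addKeyCookie, addKeyParam) are made up here, the request URL is left to the caller because we don't know the real OMIM endpoints yet, and none of it has been tried against OMIM since we don't have a key.

```java
import java.net.HttpURLConnection;

// Hypothetical helper showing the three ways of passing the OMIM API key.
final class OmimAuth {

    /** Option 1: send the key as an "ApiKey" HTTP header. */
    static void addKeyHeader(HttpURLConnection connection, String apiKey) {
        connection.setRequestProperty("ApiKey", apiKey);
    }

    /** Option 2: send the key as an "ApiKey" cookie. */
    static void addKeyCookie(HttpURLConnection connection, String apiKey) {
        connection.setRequestProperty("Cookie", "ApiKey=" + apiKey);
    }

    /** Option 3: append the key as the case-sensitive "apiKey" URL parameter. */
    static String addKeyParam(String requestUrl, String apiKey) {
        // Use '&' if the URL already has a query string, otherwise start one with '?'.
        String separator = requestUrl.contains("?") ? "&" : "?";
        return requestUrl + separator + "apiKey=" + apiKey;
    }
}
```

The header option is probably the safest of the three, since it keeps the key out of URLs that might end up in logs or browser history.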
The only data available without the API key is https://www.omim.org/static/omim/data/mim2gene.txt, and it only states the gene name and NCBI number, which is of no use to us.
I cannot search for endpoints, since access is only allowed for users with an API key.
I requested API access from OMIM on 17/04/2023. Now I just have to wait for a response.
After doing further research on some of the APIs we have looked at so far: the Proteins API has a reviewed keyword you can put into the URL (reviewed=true), which only shows confirmed mutation data. After some testing, 7 of the 9 mutations in the spec show up and all use the same endpoints, fullName and text, but MYL2 does not have the same text endpoint and TNNCI does not show up at all.
The data returned is in JSON or XML format, which can be changed in the URL by setting it to json or xml. It does not require an API key.
Home page for proteins api:
https://www.ebi.ac.uk/proteins/api/doc/#!/proteins/search
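To avoid hand-editing these URLs for each of the 9 genes, a small builder like the one below might help. It is a sketch only: the names ProteinsQuery and searchUrl are made up, the parameters are just the ones used in the URLs in this thread (offset, size, gene, reviewed, format), and URLEncoder.encode with a Charset argument needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds Proteins API search URLs like the ones in this thread.
final class ProteinsQuery {
    static final String BASE = "https://www.ebi.ac.uk/proteins/api/proteins";

    /**
     * @param gene         gene name, e.g. "MYH7"
     * @param size         maximum number of entries to return
     * @param reviewedOnly when true, adds reviewed=true
     * @param format       "json" or "xml", or null to leave the parameter out
     */
    static String searchUrl(String gene, int size, boolean reviewedOnly, String format) {
        StringBuilder url = new StringBuilder(BASE)
                .append("?offset=0")
                .append("&size=").append(size)
                .append("&gene=").append(URLEncoder.encode(gene, StandardCharsets.UTF_8));
        if (reviewedOnly) {
            url.append("&reviewed=true");
        }
        if (format != null) {
            url.append("&format=").append(format);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // One URL per gene we want to look up.
        for (String gene : new String[] {"MYH7", "TNN", "MYL2"}) {
            System.out.println(searchUrl(gene, 1, true, "json"));
        }
    }
}
```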
Useful endpoints include:
MYH7:
https://www.ebi.ac.uk/proteins/api/proteins?offset=0&size=1&gene=MYH7&reviewed=true&format=json
name:
{"recommendedName":{"fullName":{"value":"Myosin-7B"}}
description:
"comments":[{"type":"FUNCTION","text":[{"value":"Involved in muscle contraction","evidences":[{"code":"ECO:0000250"}]}]}
Another example but for another gene mutation (TNN)
TNN:
https://www.ebi.ac.uk/proteins/api/proteins?offset=0&size=1&gene=TNN&reviewed=true&format=json
name:
{"recommendedName":{"fullName":{"value":"Troponin T, slow skeletal muscle"}}
Description:
"comments":[{"type":"FUNCTION","text":[{"value":"Troponin T is the tropomyosin-binding subunit of troponin, the thin filament regulatory complex which confers calcium-sensitivity to striated muscle actomyosin ATPase activity"}]}
MYL2: MYL2 shows up in the Proteins API, but the endpoint for the description reads as:
"comments":[{"type":"SUBUNIT","text":[{"value":"Myosin is a hexamer of 2 heavy chains and 4 light chains","evidences":[{"code":"ECO:0000305"}]}]}
It's very similar to the descriptions of the previous mutations, but the type is SUBUNIT, whereas the type in the other mutations is FUNCTION. If there is a way to get around the type, then we will be able to use MYL2.
TNNCI: Doesn't show up at all in the Proteins API.
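One possible way around the type difference for MYL2 is to prefer the FUNCTION comment but fall back to whatever comment comes first when there is none. The sketch below does this with a regular expression, which only works on compact JSON shaped like the samples above (type immediately followed by text); a proper JSON parser would be more robust. CommentPicker and its methods are made-up names for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes compact JSON like the samples in this thread.
final class CommentPicker {
    // Matches "type":"X","text":[{"value":"Y" and captures X and Y.
    // Y is returned as-is, so JSON escapes such as \" are not decoded.
    private static final Pattern COMMENT = Pattern.compile(
            "\"type\":\"([A-Z_ ]+)\",\"text\":\\[\\{\"value\":\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Returns the first text value for each comment type, in the order they appear. */
    static Map<String, String> commentsByType(String json) {
        Map<String, String> byType = new LinkedHashMap<>();
        Matcher matcher = COMMENT.matcher(json);
        while (matcher.find()) {
            byType.putIfAbsent(matcher.group(1), matcher.group(2));
        }
        return byType;
    }

    /** Prefers the FUNCTION comment; otherwise uses the first comment of any type, or null. */
    static String description(String json) {
        Map<String, String> byType = commentsByType(json);
        if (byType.containsKey("FUNCTION")) {
            return byType.get("FUNCTION");
        }
        return byType.isEmpty() ? null : byType.values().iterator().next();
    }

    public static void main(String[] args) {
        // The MYL2 fragment from above, which only has a SUBUNIT comment.
        String myl2 = "\"comments\":[{\"type\":\"SUBUNIT\",\"text\":[{\"value\":"
                + "\"Myosin is a hexamer of 2 heavy chains and 4 light chains\","
                + "\"evidences\":[{\"code\":\"ECO:0000305\"}]}]}";
        System.out.println(description(myl2));
    }
}
```

This would not help with TNNCI, since that gene returns no entry at all.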
Looked at the NCBI (NLM, National Library of Medicine, NIH) API. It returns data in XML and does not require an API key.
Documentation to search a database:
https://www.ncbi.nlm.nih.gov/books/NBK25500/#chapter1.Searching_a_Database
When searching for gene mutation such as MYH7 I get the results below:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=clinvar&term=MYH7
The data above shows the IDs of gene mutations.
After querying the IDs, the results are shown below:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=clinvar&term=2497837
After looking at this API, I do not think it will be useful to us, since none of the endpoints seem useful for our purposes.
OMIM: API that searches diseases and sorts them into two categories, genotype and phenotype, as well as the clinical synopsis.
MGI: Rat heart mutation data.
HPO: API that returns heart mutation descriptions. JSON.
NCBI SNP: Queries a database?