Open GaretJax opened 6 years ago
Sorry for the late reply, I had turned off notifications for new posts and was at a conference last week.
That is a really good question. The safest way would obviously be to take the maximum one, but I think one also has to consider the amount of data in the different databases. I would assume that the more data a database has, the more reliable its predicted frequency. I would suggest using the maximum one for now. It is the safest solution, and the information from the other databases is still there, so nothing is lost.
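To make the "use the maximum one" suggestion concrete, here is a minimal sketch. The function name, the dictionary layout and the numbers are illustrative assumptions, not the tool's actual code or real frequencies.

```python
from typing import Dict, Optional


def max_maf(mafs_by_source: Dict[str, Optional[float]]) -> Optional[float]:
    """Return the highest MAF reported by any source, or None if no source has one."""
    known = [value for value in mafs_by_source.values() if value is not None]
    return max(known) if known else None


# Hypothetical values for one population as reported by three databases.
example = {"1000Genomes": 0.012, "gnomAD": 0.0151, "NHLBI": None}
worst_case = max_maf(example)  # the value a conservative check would use
```

The per-database values stay in the dictionary, so the detailed information is still available for display.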
No problem. Beat told me so. ;-)
I am working on custom checks on lab/primer basis; would an option to select which databases to use (or the order of preference) be something which might solve this issue?
I like the idea. That way everyone can choose their own preferred database, as long as it is not too complicated to program and does not overwhelm the user with too much choice.
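One way the "order of preference" option could work is sketched below. The names and values are hypothetical and only illustrate the idea discussed here.

```python
from typing import Dict, Optional, Sequence, Tuple


def pick_maf(
    mafs_by_source: Dict[str, Optional[float]],
    preference: Sequence[str],
) -> Optional[Tuple[str, float]]:
    """Return (source, MAF) from the first preferred source that has a value."""
    for source in preference:
        value = mafs_by_source.get(source)
        if value is not None:
            return source, value
    return None


# Hypothetical values; gnomAD is preferred but has no value here.
mafs = {"gnomAD": None, "1000Genomes": 0.012, "NHLBI": 0.02}
chosen = pick_maf(mafs, preference=["gnomAD", "1000Genomes", "NHLBI"])
```

A single ordered list keeps the user-facing choice small: one setting per lab, with a fallback when the preferred database has no data.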
Are you now using only 1000Genomes as a MAF source? The example above is a new primer. The lower picture shows no MAFs, but when I followed the link to gnomAD (upper picture) there is a MAF (only a very low one, but still).
No, we're still using everything that Ensembl sends us. Sadly their database is not up to date with regard to this primer.
The strange thing is that here http://grch37.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=17:78185858-78186858;v=rs775329488;vdb=variation;vf=141108533 it is showing the population frequencies, but over their API that data is not returned:
[
  {
    "allele_string": "C/T",
    "assembly_name": "GRCh37",
    "colocated_variants": [
      {
        "allele_string": "C/T",
        "end": 78186358,
        "id": "rs775329488",
        "seq_region_name": 17,
        "start": 78186358,
        "strand": 1
      }
    ],
    "end": 78186358,
    "id": "rs775329488",
    "input": "rs775329488",
    "most_severe_consequence": "non_coding_transcript_exon_variant",
    "seq_region_name": "17",
    "start": 78186358,
    "strand": 1,
    "transcript_consequences": [
      {
        "biotype": "protein_coding",
        "consequence_terms": [
          "intron_variant"
        ],
        "gene_id": "ENSG00000181523",
        "gene_symbol": "SGSH",
        "gene_symbol_source": "HGNC",
        "hgnc_id": 10818,
        "impact": "MODIFIER",
        "strand": -1,
        "transcript_id": "ENST00000326317",
        "variant_allele": "T"
      },
      ...
    ]
  }
]
I'll have to investigate whether they changed their API somehow, but so far it looks like that information is still available for other variants.
It looks like the MAF is very low for that primer; that might be the reason the data is not included in the API response...
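A small check along these lines can flag records that come back without any frequency data. This is only a sketch: the field names in `FREQUENCY_KEYS` are an assumption about where frequencies would appear in a colocated variant, and the record is a trimmed copy of the response quoted above.

```python
import json
from typing import Any, Dict, List, Sequence

# Assumed field names; adjust to whatever the API actually returns.
FREQUENCY_KEYS = ("frequencies", "minor_allele_freq")


def variants_without_frequencies(
    record: Dict[str, Any],
    frequency_keys: Sequence[str] = FREQUENCY_KEYS,
) -> List[str]:
    """List the ids of colocated variants that carry none of the frequency keys."""
    missing = []
    for variant in record.get("colocated_variants", []):
        if not any(key in variant for key in frequency_keys):
            missing.append(variant.get("id", "<unknown>"))
    return missing


# Trimmed copy of the response quoted in this thread.
record = json.loads("""
{
  "id": "rs775329488",
  "colocated_variants": [
    {"allele_string": "C/T", "end": 78186358, "id": "rs775329488",
     "seq_region_name": 17, "start": 78186358, "strand": 1}
  ]
}
""")
missing_ids = variants_without_frequencies(record)
```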
I know the MAF is very low. It just came up while comparing the tool's results with our current workaround (searching for the primer in Alamut), where the SNP was displayed in gnomAD with a very low frequency. It is strange that the frequency appears on the Ensembl site but not in the API response. I hope this is not because the archive sites are unsupported...
The first part of this (support and display exact information for all available databases) has been implemented. The second part comes next (letting the user choose which databases he/she would like to query).
I am still seeing some problems with the MAFs (example above). Red arrow: the MAF of this variant is shown as n/a in genetic tools and as 0.0003578 in gnomAD. Is this only a display issue, with only a certain number of decimal places shown? Black arrow: gnomAD seems to have a problem with the rs numbers; most of the links result in an error.
That's what is returned by the Ensembl API (no frequencies).
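To keep "no value returned" and "very small value" visibly distinct in the display, the rendering could look roughly like this. It is a sketch with a hypothetical helper name, not the tool's actual formatting code.

```python
from typing import Optional


def format_maf(maf: Optional[float], sig_digits: int = 4) -> str:
    """Render a MAF with significant digits; show 'n/a' only when no value exists."""
    if maf is None:
        return "n/a"
    # Significant digits (not fixed decimals) so small frequencies are not rounded to 0.
    return f"{maf:.{sig_digits}g}"


shown_missing = format_maf(None)
shown_small = format_maf(0.0003578)
```

With this approach, n/a can only mean that the upstream source returned nothing, which matches what is happening here.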
I still use GRCh37 by default on all primers. I'll work on supporting both assemblies soon (user selectable, GRCh38 by default); it should be really easy. Maybe for GRCh38 we will have the frequencies.
For gnomAD, it's a known issue, the current implementation is overly simplistic and external references will be reworked.
I just find it confusing, as gnomAD, NHLBI, and 1000Genomes are separately listed and no MAF is listed for them.
Hi Ann-Kathrin, I just released an improvement over the previous way we were linking to gnomAD. There are still issues, but I think it's mostly because the variants are really not known to gnomAD.
Hi Jonathan, I see the link is now working, but there are still no MAFs displayed. What do you mean by "the variants are really not known to gnomAD"? I can see the variant in the gnomAD browser (see picture above).
There are two issues here:
I'll reach out to Ensembl asking if this is intended or something they can fix on their side, but more than that we will not be able to do.
I see. But could this be a problem only with Ensembl GRCh37? Because in the GRCh38 version the variant is known in Ensembl. Unfortunately I cannot reach the GRCh37 version right now.
It's a problem in both GRCh37 and GRCh38, but interestingly, only for the API. The web view shows the expected results.
I wrote to them, let's see what they answer.
Hi Jonathan! What did Ensembl say?
Some variants have different MAFs for the same population coming from different databases (for example https://genetic.tools/labs/test/primers/98/forward/21890486...21890505/).
What's the correct way to handle these? Just consider the maximum one?
/cc @AnCaTjin