selinad closed this issue 8 years ago
@selinad NNSplice seems to be run 'on the fly' (pardon the pun) from a browser on their website (http://www.fruitfly.org/seq_tools/splice.html). ALAMUT and other resources (e.g. MutationTaster) have integrated NNSplice into their output. I can't see an API anywhere that we can grab this data from. I'll keep looking...
Thanks @wrightmw - I came to the same conclusion in looking at it. Love your pun! I think that does appear to be the only way. However, it does seem they have an API since, as you mention, ALAMUT and other resources have it integrated - that's why I thought it might be simplest for a dev (whoever is assigned to this) to write to them. I poked around for a bit yesterday...
@selinad MaxEntScan is an optional plug-in that we can download via the VEP: http://rest.ensembl.org/documentation/info/vep_id_post
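For reference, a POST to that endpoint can be prepared with the standard library alone. This is a minimal sketch that builds the request without sending it. Passing `MaxEntScan=1` as a query-string switch is an assumption to check against the endpoint documentation linked above, and `build_vep_request` is a hypothetical helper name. Whether the REST route gives us the splicing data we need was still an open question in this thread.

```python
import json
import urllib.request
from urllib.parse import urlencode

VEP_ID_POST = "https://rest.ensembl.org/vep/human/id"


def build_vep_request(variant_ids, plugins=("MaxEntScan",)):
    """Prepare (but do not send) a POST to the Ensembl VEP REST endpoint.

    Optional plugin flags are passed as query-string switches set to 1.
    ASSUMPTION: the endpoint accepts each flag named in `plugins`.
    """
    query = urlencode({name: 1 for name in plugins})
    url = f"{VEP_ID_POST}?{query}" if query else VEP_ID_POST
    body = json.dumps({"ids": list(variant_ids)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Example identifier only; sending would be urllib.request.urlopen(req).
req = build_vep_request(["rs56116432"])
```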
Nice - great sleuthing!
@selinad I agree that a dev needs to write that email about NNSplice... I think I'd need a Rosetta Stone to get the language right
@selinad MaxEntScan and NNSplice were the minimum they wanted but since VEP also offers GeneSplicer, I wonder if we shouldn't just download that at the same time?
Dang, couldn't find an emoji for Rosetta Stone
It looks to be of "Medium" importance on this site: https://ncbiconfluence.ncbi.nlm.nih.gov/display/CLIN/Proposed+Variant+Curation+Data+Fields
So, I vote to include it if it's easy....
@kgliu0101 Thanks for showing me your work on the splicing algorithms section of the Computational Tab. I have a few comments. Many of the variant curators are used to seeing the splicing data displayed in commercial software, such as Alamut. See example below: We obviously do not have the time or staff hours to produce such a beauteous output, however I think we could follow their format somewhat since that is what they are used to seeing.
Here is the DOI for the paper on searching ClinVar for missense variants in the same codon: doi:10.1002/0471142905.hg0816s89
http://www.ncbi.nlm.nih.gov/clinvar/?term=E501*%5Bvariant+name%5D+and+BRAF where 'E' is the reference amino acid, '501' is the amino acid position and 'BRAF' is the HGNC gene symbol
http://www.ncbi.nlm.nih.gov/clinvar/?term=R1495*+%5Bvariant+name%5D+and+BRCA1 where 'R' is the reference amino acid, '1495' is the amino acid position and 'BRCA1' is the HGNC gene symbol
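The two URLs above follow one pattern, so they could be generated from the reference amino acid, position and gene symbol. A minimal sketch, assuming a single-letter reference amino acid; `clinvar_codon_search_url` is a hypothetical helper name. The `*` is kept literal as the wildcard, and spaces become `+`, exactly as in the BRAF example.

```python
from urllib.parse import quote_plus

CLINVAR_SEARCH = "http://www.ncbi.nlm.nih.gov/clinvar/?term="


def clinvar_codon_search_url(ref_aa: str, aa_position: int, gene_symbol: str) -> str:
    """Build a ClinVar search for other variants at the same codon.

    Term pattern: <ref AA><position>*[variant name] and <HGNC symbol>.
    '*' stays literal; brackets are percent-encoded and spaces become '+'.
    """
    term = f"{ref_aa}{aa_position}*[variant name] and {gene_symbol}"
    return CLINVAR_SEARCH + quote_plus(term, safe="*")


print(clinvar_codon_search_url("E", 501, "BRAF"))
# -> http://www.ncbi.nlm.nih.gov/clinvar/?term=E501*%5Bvariant+name%5D+and+BRAF
```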
Please change tab name to "Computational / Predictive"
If possible for R7alpha1, indicate primary transcript and molecular consequence (when it exists) at top of this page (like you do for NC_s on Basic Info tab)
Note: Links to ClinVar variants in same codon descriptions moved to #776
This is what I have so far:
The "ClinGen Predictors" table is static since I don't know which API to use at the moment.
The "Other Predictors" table will be worked on in the morning.
The "Conservation Analysis" table is functional with real data coming from the "dbnsfp" object in the myvariant.info response.
@jimmyzhen Looking great so far :-)
Very nice, @jimmyzhen !!!
Nice catches on capitalization, etc. @wrightmw
@wrightmw did I see you had decided on some different titles for headers? You have "Other Meta-Predictors" in the latest mockup - I'm not certain which of them are meta-predictors. Were you able to check with Raj on this (or perhaps you know already)?
@selinad 'Meta'-predictors has a specific meaning that does not apply to all the predictors in dbNSFP, which is why the 'Predictors' has been used in preference.
@wrightmw - your mockup file says meta-predictors, so wanted to double-check that you weren't intending to divide them up. Sharon had wanted the meta-predictors labelled and said to check with Raj to clarify which ones are - we can do this in later release unless you have that info now and it's easy. I do think she wants to indicate Bustamante's are meta-predictors (I think true for Revel and CFTR & MYH7 data (?), but I defer to you).
@jimmyzhen - you know how we show the Highest MAF in that section above the tables on the Population tab? Would it be easy to display the primary transcript (from Basic Info tab, when it exists) in a similar way on the Computational tab? If not now, in the future is fine.
In the pptx file Pop-Comp-evaluations_7-11-16 (Asana), I put the Primary Transcripts under the "Molecular Consequence: Missense" header, but I'm thinking it should either go at top of pages (as mentioned above) OR we can leave off for this version since it's on Basic info already. Apologies for the confusion. I'll update the above file - please note that it is slightly different from the Computational wireframe, so good to cross reference both when building and ask for any clarification - the wireframe is becoming unwieldy for editing - on the other hand, happy to edit if it's confusing for devs to have these 2 files.
@selinad, this is what I have at the moment:
Would you prefer me to try to get other tables onto the tab first and then revisit the primary transcript table when time allows? Or vice versa?
Thanks, @jimmyzhen ! Looking good. I count 11 predictors rather than 13 - are there no values for the other 2 or was there difficulty pulling in?
Definitely try to get other tables into the tab first (thanks for checking) - they can find info on Basic Info tab, so that is lower priority.
@wrightmw - should we be calling them "Functional Predictors" in the title (since we say "Conservation Analysis" for the other), or is that not quite accurate for all of them? Also, I think we should link to the REVEL site, if you could provide that info - are we planning to do that if no data is returned?
@selinad @wrightmw I couldn't find either `cadd` or `fitcons` in the `dbnsfp` object, unless we should be looking into the `cadd` object in the myvariant.info response directly. Even in the `cadd` object, it is still not clear to me which is the prediction.
@jimmyzhen I will be in office soon, I will explain in person
oooh. @wrightmw should be able to answer - I think CADD comes from myvariant.info (they import it separately from dbNSFP). This is an important one. I think @wrightmw best to answer, but I can take a look if needed.
We will not be trying to show this value re splicing - one less thing to worry about for the moment:
If the score/prediction data for `cadd` and `fitcons` should be coming from the `cadd` object provided by myvariant.info, then I can look into it.
@wrightmw can help. Here is what I see for myvariant.info: http://docs.myvariant.info/en/latest/doc/data.html
If you are only bringing in dbNSFP from myvariant.info, you would miss at least CADD.
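To illustrate the point about reading beyond `dbnsfp`: CADD sits in its own top-level `cadd` object in the myvariant.info response, so the lookup has to walk both. A minimal sketch; the exact key paths (`dbnsfp.revel.score`, `cadd.phred`) are assumptions to confirm against the myvariant.info data documentation linked above, the helper names are hypothetical, and the sample response is made up. Missing fields fall back to the "--" placeholder.

```python
from collections.abc import Mapping, Sequence
from typing import Any

MISSING = "--"  # placeholder shown when a field is absent


def lookup(doc: Mapping[str, Any], path: Sequence[str]) -> Any:
    """Walk nested dict keys; return MISSING if any key along the path is absent."""
    node: Any = doc
    for key in path:
        if not isinstance(node, Mapping) or key not in node:
            return MISSING
        node = node[key]
    return node


# ASSUMED field paths: dbNSFP-derived values under "dbnsfp",
# CADD under its own top-level "cadd" object.
FIELDS = {
    "REVEL": ("dbnsfp", "revel", "score"),
    "CADD (phred)": ("cadd", "phred"),
}


def extract(doc: Mapping[str, Any]) -> dict:
    return {label: lookup(doc, path) for label, path in FIELDS.items()}


# Made-up response fragment for illustration only.
sample = {"dbnsfp": {"revel": {"score": 0.5}}, "cadd": {}}
print(extract(sample))  # {'REVEL': 0.5, 'CADD (phred)': '--'}
```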
@selinad, @wrightmw, for this test release, in the static tables (of computational tab) that we don't have any data to work with, would it be okay to display a little "in progress" badge like the following:
Because we still want to show the tables on the computational tab even though we don't have all data to work with at the moment, I thought the "badge" may be slightly better than "--" visually.
Your thoughts?
Hi @jimmyzhen I really like the badges. I'd suggest yellow, but it would be too much. This looks very nice IMHO.
We are not able to get MaxEntScan data?
The doc shows MaxEntScan and GeneSplicer from VEP.
If no data is found, then we use "--" (or is it "-"), correct?
@selinad The latest iteration of the computational tab supplied by Jimmy is pretty much what we are going to be able to show in the test release. MaxEntScan and GeneSplicer are only available from the VEP via Perl plug-ins. We realistically don't have time to do this for the test release. Our best route for these data will be if myvariant.info can support all the splicing algorithms we need, but again this won't be for a number of weeks.
Yikes. OK - did not realize that. I think we need to link them to resources.
NNSPLICE: http://www.fruitfly.org/seq_tools/splice.html MaxEntScan: http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html GeneSplicer: http://www.cbcb.umd.edu/software/GeneSplicer/gene_spl.shtml HumanSplicingFinder: http://www.umd.be/HSF3/HSF.html
MaxEntScan: http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html NNSplice: http://www.fruitfly.org/seq_tools/splice.html GeneSplicer: http://www.cbcb.umd.edu/software/GeneSplicer/gene_spl.shtml
lol - was just doing same!
Do we want Human Splicing Finder?
If all of these are coming, they will need to be added to table. Might be worth a quick look at Human Splicing Finder to see if we will be able to access via API or otherwise (before promising) - ok to just leave it off for now...
The three splicing programs that are routinely used are MaxEntScan, NNSPLICE and HumanSplicingFinder. The PTEN group looks at these three and then if two out of three agree then this is regarded as significant. Additionally, ALAMUT provides these three (plus GeneSplicer). Further, I noticed when Steven put up a slide the other day it included MaxEntScan, NNSPLICE and HumanSplicingFinder. I also looked around at publications and presentations on the internet and it was these three that were most used. Therefore, I think that we need to make sure we get all the splice predictors (MaxEntScan, NNSPLICE and HumanSplicingFinder) which are routinely used by variant curators into the VCI. Unfortunately it will take some work to incorporate them.
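The PTEN group's two-out-of-three rule can be written down as a small function. A minimal sketch under one interpretation: "agree" is read as at least two tools predicting a splicing effect, and each tool's result is assumed to be already reduced to True (effect), False (no effect) or None (no result). How each tool's score becomes that call is not specified here, and `splicing_consensus` is a hypothetical name.

```python
SPLICE_TOOLS = ("MaxEntScan", "NNSPLICE", "HumanSplicingFinder")


def splicing_consensus(calls: dict) -> bool:
    """Return True when at least two of the three tools predict a splice effect.

    `calls` maps tool name -> True (effect predicted), False (no effect),
    or None (no result). Missing or None results never count as a vote.
    """
    votes = sum(1 for tool in SPLICE_TOOLS if calls.get(tool) is True)
    return votes >= 2
```

With only two tools available (as in the test release), the rule can still fire, but only if both agree.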
Super sleuthing, @wrightmw. It's of "medium" importance on the Confluence site from when we worked through the list with the pilot group (before you joined), but it was on Steven's email today and the PTEN group does use it, as you noted. I guess the more the merrier, to a point. If ALAMUT can get it, there must be some way to retrieve the data... I was only worried about saying something is coming that we haven't investigated...
@wrightmw, @selinad, @kilodalton,
The following is what I have so far for the computational tab:
I have pushed all changes to the repo after merging with the latest `dev`.
Among the changes, I also changed the Average Sample Read Depth row to a single column so that the string does not wrap on the Population tab.
I will spin up an instance later tonight for you to review, and will continue to deal with the remaining details.
@jimmyzhen It is looking fantastic!!! Super - I can test an instance later tonight. Nice work!
@jimmyzhen I agree, great work. I look forward to testing it. @selinad I was just adding all four splicing predictors for completeness within this ticket. I think we should stick with the two we have in there, for now, and work on the other splicing factors behind the scenes. I think we can deliver all four eventually, but it may take some time.
Ah - got it. Sounds perfect! thx @wrightmw
@selinad, @wrightmw, my apologies for not being able to spin up an instance sooner last night :(
In any case, here is the instance for your review: https://718-jz-computation-tab-ui-bdd04fb-jzhen.demo.clinicalgenome.org
UPDATE
I spun up a newer instance, in which I merged with the latest `dev` and added some UI tweaks:
https://718-jz-computation-tab-ui-f3da09f-jzhen.demo.clinicalgenome.org
Please disregard the old instance provided previously (above).
Selina, Matt,
Would you mind giving me the example URLs for those external links under "Repetitive Region" section? Are they supposed to be the same as those seen on the "Basic Info" tab? Thanks!
UPDATE: See my comment at https://github.com/ClinGen/clincoded/issues/775#issuecomment-232312155
Hi @jimmyzhen. Gosh, thank you for all your hard work in the wee hours.
This would be a 60 bp region using the chromosomal coordinates (30 bp on either side of the variant) whenever possible. If you need to use the rsID for Ensembl, that would work, as they center around it. If @wrightmw agrees, we could just use the same links as in the static bar. @wrightmw and I will discuss this this AM for you and @kgliu0101.
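The coordinate window described above is simple arithmetic, and the UCSC Genome Browser accepts it as a `position=chrN:start-end` parameter. A minimal sketch; the helper names and the `hg38` default are assumptions (the assembly should match the coordinates being used). Note that `pos - 30` to `pos + 30` inclusive is 61 bases: the variant plus 30 on each side. The same window would feed the Variation Viewer link, whose URL parameters are not shown here.

```python
def flanking_region(chrom: str, pos: int, flank: int = 30) -> str:
    """Return 'chrN:start-end' covering `flank` bp on either side of `pos` (1-based, inclusive)."""
    start = max(1, pos - flank)  # do not run off the start of the chromosome
    return f"chr{chrom}:{start}-{pos + flank}"


def ucsc_link(chrom: str, pos: int, assembly: str = "hg38", flank: int = 30) -> str:
    """UCSC Genome Browser view centred on the variant (assembly is an assumption)."""
    return (
        "https://genome.ucsc.edu/cgi-bin/hgTracks?db="
        f"{assembly}&position={flanking_region(chrom, pos, flank)}"
    )
```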
Hi @jimmyzhen - just starting to test - thank you again for making it happen.
These are my initial comments - will poke around on more variants now. Thanks again - super work!
@jimmyzhen re. name changes for predictors. Sharon told Selina that she'd like to see all the meta-predictors labeled as such, so we need to add 'meta-predictor' to a few of the predictor names. Please change: REVEL meta-predictor to REVEL (meta-predictor); MetaLR to MetaLR (meta-predictor); MetaSVM to MetaSVM (meta-predictor); CADD to CADD (meta-predictor).
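The requested renames amount to a small display-label lookup. A minimal sketch; the four names are copied from the comment above, while `META_PREDICTOR_LABELS` and `display_label` are hypothetical names. Any predictor not in the map keeps its existing label.

```python
META_PREDICTOR_LABELS = {
    "REVEL meta-predictor": "REVEL (meta-predictor)",
    "MetaLR": "MetaLR (meta-predictor)",
    "MetaSVM": "MetaSVM (meta-predictor)",
    "CADD": "CADD (meta-predictor)",
}


def display_label(name: str) -> str:
    """Return the table label, adding the meta-predictor tag where requested."""
    return META_PREDICTOR_LABELS.get(name, name)
```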
@selinad No, the plan is only to show CFTR for variants on the CFTR gene
Added ticket #796 to add logic to address this
OK thx @wrightmw I was testing CV 5556, which is a non-CFTR variant - it's showing there, so @jimmyzhen that logic will need to be added.
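The gene check described above could be a simple filter over the predictor rows. A minimal sketch; the row label `"CFTR"` and the rule table are assumptions about how the rows are named, and `visible_rows` is a hypothetical helper. Rows not listed in the table are shown for every gene.

```python
# Hypothetical rule table: predictor row label -> the only gene it applies to.
GENE_SPECIFIC_ROWS = {"CFTR": "CFTR"}


def visible_rows(row_names, gene_symbol):
    """Drop gene-specific rows when the variant is on a different gene."""
    return [
        name
        for name in row_names
        if GENE_SPECIFIC_ROWS.get(name, gene_symbol) == gene_symbol
    ]
```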
Sorry, what I meant on original bullet was "should we be showing...." - I agree that we shouldn't.
@wrightmw thanks for adding labels for meta-predictors.
The most current wireframe for the Computational tab has been uploaded to Asana (Computational-tab_6-15-2016.2.pptx). This ticket will provide further specification of this tab page.
Computational Tools
There are 3 sets of computational information we need to pull in (below).
Protein Predictors
This information will be pulled in from myvariant.info. There are 17 fields with scores (listed on the wireframe); we need the source, the value, and any call the source makes about it (e.g. pathogenic or deleterious).
Conservation Analysis
This information will come from myvariant.info as well. There are 5 fields that should be pulled in (listed on the wireframe). I believe we need to display the same fields as for the Protein Predictors, but we can confirm.
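Each row in these tables carries the same three pieces: source, value and call. A minimal sketch of that row shape, assuming "--" as the placeholder for anything the source did not return (the convention asked about earlier in the thread). `PredictorRow` and the predictor names and values in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

PLACEHOLDER = "--"


@dataclass
class PredictorRow:
    """One table row: the source, its score, and the call it makes."""
    source: str
    value: Optional[float] = None
    call: Optional[str] = None  # e.g. "pathogenic" or "deleterious"

    def cells(self) -> list:
        # Show the placeholder wherever the source returned nothing.
        return [
            self.source,
            PLACEHOLDER if self.value is None else str(self.value),
            PLACEHOLDER if self.call is None else self.call,
        ]


# Made-up rows for illustration only.
rows = [PredictorRow("PredictorA", 0.01, "deleterious"), PredictorRow("PredictorB")]
```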
Splicing Predictors
@wrightmw and I need to figure out how to get this data - for starters, we need it to come from MaxEntScan and NNSplice.
Other Variants in Codon
For this, we need to search ClinVar for the genomic location of the variant plus 2 nt on either side of the variant - @wrightmw do you know how to do this search? (I have also sent Steven a message.) The last column is for the ID of the variant from the source (e.g. ClinVar VariationID, CA ID, etc.). We also need to allow curators a way to add a variant to this table and be the source for it if necessary - this has been added to the wireframe and will involve storing a couple of curated fields. Note: Please see the paper by Steven Harrison in Asana (Using_ClinVar_Current_Protocols.pdf).
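A window of 2 nt on either side of the variant always covers the whole codon containing it, whichever of the three codon positions the variant sits at. It can also catch bases from the neighbouring codon, so results still need checking. A minimal sketch of the window and a local filter over already-retrieved records; the ClinVar query syntax itself is not shown, and `Hit`, its fields and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A variant found near the query position (fields are illustrative)."""
    source: str     # e.g. "ClinVar" or a curator-added entry
    source_id: str  # e.g. a ClinVar VariationID or CA ID
    chrom: str
    pos: int


def codon_window(pos: int, flank: int = 2) -> tuple:
    """Inclusive genomic window of `flank` nt on either side of the variant."""
    return pos - flank, pos + flank


def same_codon_candidates(hits, chrom: str, pos: int, flank: int = 2):
    """Keep hits on the same chromosome whose position falls inside the window."""
    lo, hi = codon_window(pos, flank)
    return [h for h in hits if h.chrom == chrom and lo <= h.pos <= hi]
```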
Repetitive Regions
For now, we are just going to link to the UCSC and Variation Viewer browsers using the chromosomal location of the variant and a range that encompasses 30 nt on either side of the variant. We will also link to ExAC at the chromosomal position for the variant (with the change specified).
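The ExAC link includes the change itself, using ExAC's chrom-pos-ref-alt variant identifier. A minimal sketch; `exac_variant_link` is a hypothetical helper, and since ExAC is built on GRCh37 the position must be a GRCh37 coordinate. A leading `chr` is stripped because ExAC identifiers use bare chromosome names.

```python
def exac_variant_link(chrom: str, pos: int, ref: str, alt: str) -> str:
    """ExAC variant page for a GRCh37 position with the specified change."""
    chrom = chrom[3:] if chrom.startswith("chr") else chrom  # ExAC uses bare names
    return f"http://exac.broadinstitute.org/variant/{chrom}-{pos}-{ref}-{alt}"
```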
@wrightmw please review and fill in anything that I've missed or needs editing.