Computational tab UI - Githubissues

selinad commented 8 years ago

The most current wireframe for the Computational tab has been uploaded to Asana (Computational-tab_6-15-2016.2.pptx). This ticket will provide further specification of this tab page.

Computational Tools

There are 3 sets of computational information we need to pull in (below).

Protein Predictors: This information will be pulled in from myvariant.info. There are 17 fields with scores (listed on wireframe). We need the source, the value and any call they make about it (e.g. pathogenic or deleterious).

Conservation Analysis: This information will come from myvariant.info as well. There are 5 fields that should be pulled in (listed on wireframe). I believe we need to display the same fields as for the Protein Predictors, but we can confirm.

Splicing Predictors: @wrightmw and I need to figure out how to get this data. For starters, we need it to come from MaxEntScan and NNSplice.

MaxEntScan: can download splice site datasets; perl wrappers available
NNSplice: can't see best way to access this data - may need to write them.

Other Variants in Codon

For this, we need to search ClinVar for the genomic location of the variant + 2 nt on either side of the variant - @wrightmw do you know how to do this search? (I have also sent Steven a message) - the last column is for the ID of the variant from the source (e.g. ClinVar VariationID, CA ID, etc.) We also need to allow them a way to add a variant to this table and be the source for it if necessary - this added to wireframe. Will involve storing a couple of curated fields. Note: Please see paper by Steven Harrison in Asana (Using_ClinVar_Current_Protocols.pdf)

Repetitive Regions

For now, we are just going to link to the UCSC and Variation Viewer browsers using the chromosomal location of the variant and a range that encompasses 30 nt on either side of the variant. We will also link to ExAC at the chromosomal position for the variant (with the change specified).

@wrightmw please review and fill in anything that I've missed or needs editing.

wrightmw commented 8 years ago

@selinad NNSplice seems to be run 'on the fly' (pardon the pun) from a browser on their website (http://www.fruitfly.org/seq_tools/splice.html). ALAMUT and other resources (e.g. MutationTaster) have integrated NNSplice into their output. I can't see an API anywhere that we can grab this data from. I'll keep looking...

selinad commented 8 years ago

Thanks @wrightmw - I came to the same conclusion in looking at it. Love your pun 😄 - I think that does appear to be the only way. However, it does seem they have an API since, as you mention, ALAMUT and other resources have it integrated - that's why I thought it might be simplest for a dev to write them (whoever is assigned to this). I poked around for a bit yesterday...

wrightmw commented 8 years ago

@selinad MaxEntScan is an optional plug in that we can download via the VEP: http://rest.ensembl.org/documentation/info/vep_id_post

selinad commented 8 years ago

Nice - great sleuthing! 👍

wrightmw commented 8 years ago

@selinad I agree that a dev needs to write that email about NNSplice... I think I'd need a Rosetta Stone to get the language right

wrightmw commented 8 years ago

@selinad MaxEntScan and NNSplice were the minimum they wanted but since VEP also offers GeneSplicer, I wonder if we shouldn't just download that at the same time?

selinad commented 8 years ago

Dang, couldn't find an emoji for rosetta stone 😛

It looks to be of "Medium" importance on this site: https://ncbiconfluence.ncbi.nlm.nih.gov/display/CLIN/Proposed+Variant+Curation+Data+Fields

So, I vote to include it if it's easy....

wrightmw commented 8 years ago

@kgliu0101 Thanks for showing me your work on the splicing algorithms section of the Computational Tab. I have a few comments. Many of the variant curators are used to seeing the splicing data displayed in commercial software, such as Alamut. See example below:

[screenshot: screen shot 2016-06-30 at 10 34 02 am]

We obviously do not have the time or staff hours to produce such a beauteous output; however, I think we could follow their format somewhat since that is what they are used to seeing.

Instead of a 5' or 3' column, I would just add these as labels.
I would indicate the range in the same way they have e.g. [0-100]
Please add GeneSplicer to the tables, since we currently intend to support output from MaxEntScan, NNSPLICE and GeneSplicer

selinad commented 8 years ago

Here is doi for paper on searching ClinVar for missense variants in same codon: doi: 10.1002/0471142905.hg0816s89

wrightmw commented 8 years ago

http://www.ncbi.nlm.nih.gov/clinvar/?term=E501*%5Bvariant+name%5D+and+BRAF where ‘E’ is the reference amino acid, '501' is the amino acid position and ‘BRAF’ is the HGNC gene symbol

wrightmw commented 8 years ago

http://www.ncbi.nlm.nih.gov/clinvar/?term=R1495*+%5Bvariant+name%5D+and+BRCA1 where ‘R’ is the reference amino acid, '1495' is the amino acid position and ‘BRCA1’ is the HGNC gene symbol
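A minimal sketch of how the two search URLs above could be built from their parts. The function name and constant are hypothetical; only the URL pattern (reference amino acid, position, `*`, `[variant name]`, HGNC symbol) comes from the examples in this thread.

```python
from urllib.parse import quote_plus

# Base URL taken from the examples above.
CLINVAR_SEARCH = "http://www.ncbi.nlm.nih.gov/clinvar/"


def same_codon_search_url(ref_aa: str, position: int, gene_symbol: str) -> str:
    """Build a ClinVar search URL for variants at the same codon.

    ref_aa is the reference amino acid (e.g. 'E'), position is the amino
    acid position (e.g. 501) and gene_symbol is the HGNC symbol (e.g. 'BRAF').
    """
    term = f"{ref_aa}{position}*[variant name] and {gene_symbol}"
    # Keep '*' literal; spaces become '+' and brackets are percent-encoded.
    return f"{CLINVAR_SEARCH}?term={quote_plus(term, safe='*')}"


print(same_codon_search_url("E", 501, "BRAF"))
```

For the BRAF example this reproduces the first URL above character for character.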

selinad commented 8 years ago

Please change tab name to "Computational / Predictive"

selinad commented 8 years ago

If possible for R7alpha1, indicate primary transcript and molecular consequence (when it exists) at top of this page (like you do for NC_s on Basic Info tab)

kilodalton commented 8 years ago

Note: Links to ClinVar variants in same codon descriptions moved to #776

jimmyzhen commented 8 years ago

This is what I have so far:

The "ClinGen Predictors" table is static since I don't know which API to use at the moment.

The "Other Predictors" table will be worked on in the morning.

The "Conservation Analysis" table is functional with real data coming from the "dbnsfp" object in the myvariant.info response.

wrightmw commented 8 years ago

@jimmyzhen Looking great so far :-)

I hate to be a pedant on the Conservation score predictor names, but they are written in a specific way: phyloP7way, phyloP20way, phastCons7way, phastCons20way, GERP++, SiPhy

selinad commented 8 years ago

Very nice, @jimmyzhen !!!

Nice catches on capitalization, etc. @wrightmw 👍

@wrightmw did I see you had decided on some different titles for headers? You have "Other Meta-Predictors" in the latest mockup - I'm not certain which ones are Meta-Predictors. Were you able to check with Raj on this (or perhaps you know already)?

wrightmw commented 8 years ago

@selinad 'Meta'-predictors has a specific meaning that does not apply to all the predictors in dbNSFP, which is why 'Predictors' has been used in preference.

selinad commented 8 years ago

@wrightmw - your mockup file says meta-predictors, so wanted to double-check that you weren't intending to divide them up. Sharon had wanted the meta-predictors labelled and said to check with Raj to clarify which ones are - we can do this in later release unless you have that info now and it's easy. I do think she wants to indicate Bustamante's are meta-predictors (I think true for Revel and CFTR & MYH7 data (?), but I defer to you).

selinad commented 8 years ago

@jimmyzhen - you know how we show the Highest MAF in that section above the tables on the Population tab? Would it be easy to display the primary transcript (from Basic Info tab, when it exists) in a similar way on the Computational tab? If not now, in the future is fine.

In the pptx file Pop-Comp-evaluations_7-11-16 (Asana), I put the Primary Transcripts under the "Molecular Consequence: Missense" header, but I'm thinking it should either go at top of pages (as mentioned above) OR we can leave off for this version since it's on Basic info already. Apologies for the confusion. I'll update the above file - please note that it is slightly different from the Computational wireframe, so good to cross reference both when building and ask for any clarification - the wireframe is becoming unwieldy for editing - on the other hand, happy to edit if it's confusing for devs to have these 2 files.

jimmyzhen commented 8 years ago

@selinad, this is what I have at the moment:

Would you prefer me to try to get other tables onto the tab first and then revisit the primary transcript table when time allows? Or vice versa?

selinad commented 8 years ago

Thanks, @jimmyzhen ! Looking good. I count 11 predictors rather than 13 - are there no values for the other 2 or was there difficulty pulling in?

Definitely try to get other tables into the tab first (thanks for checking) - they can find info on Basic Info tab, so that is lower priority.

@wrightmw - should we be calling them "Functional Predictors" in the title (since we say "Conservation Analysis" for the other), or is that not quite accurate for all? Also, I think we should link to the REVEL site, if you could provide that info - are we planning to do that if no data returned?

jimmyzhen commented 8 years ago

@selinad @wrightmw I couldn't find either cadd or fitcons in the dbnsfp object, unless we should be looking into the cadd object in the myvariant.info response directly. Even in the cadd object, it is still not clear to me which is the prediction.

wrightmw commented 8 years ago

@jimmyzhen I will be in office soon, I will explain in person

selinad commented 8 years ago

oooh. @wrightmw should be able to answer - I think CADD comes from myvariant.info (they import it separately from dbNSFP). This is an important one. I think @wrightmw best to answer, but I can take a look if needed.

selinad commented 8 years ago

We will not be trying to show this value re splicing - one less thing to worry about for the moment:

jimmyzhen commented 8 years ago

If the score/prediction data for `cadd` and `fitcons` should be coming from the `cadd` object provided by myvariant.info, then I can look into it.

selinad commented 8 years ago

@wrightmw can help. Here is what I see for myvariant.info: http://docs.myvariant.info/en/latest/doc/data.html

If you are only bringing in dbNSFP from myvariant.info, you would miss at least CADD.
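A rough sketch of reading predictor values from more than one top-level object of a myvariant.info-style response, falling back to "--" when a value is absent. The dotted field paths and helper names here are illustrative assumptions, not confirmed field names; the real paths should be checked against the myvariant.info data documentation linked above.

```python
from typing import Any, List, Tuple

MISSING = "--"  # placeholder shown when no data is returned

# (display label, dotted path) pairs. These paths are assumptions for
# illustration only; they are not verified against the live response.
PREDICTOR_FIELDS = [
    ("SIFT", "dbnsfp.sift.score"),
    ("CADD", "cadd.phred"),
]


def lookup(record: dict, dotted_path: str) -> Any:
    """Walk a nested dict along 'a.b.c'; return None if any key is absent."""
    node: Any = record
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def predictor_rows(record: dict) -> List[Tuple[str, Any]]:
    """Return (label, value) rows, using MISSING where nothing was found."""
    rows = []
    for label, path in PREDICTOR_FIELDS:
        value = lookup(record, path)
        rows.append((label, MISSING if value is None else value))
    return rows


# Example: a response that carries dbnsfp data but no cadd object.
example = {"dbnsfp": {"sift": {"score": 0.01}}}
print(predictor_rows(example))
```

Keeping the label-to-path mapping in one table means a predictor that lives outside `dbnsfp` only needs a new entry, not new lookup code.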

jimmyzhen commented 8 years ago

@selinad, @wrightmw, for this test release, in the static tables (of computational tab) that we don't have any data to work with, would it be okay to display a little "in progress" badge like the following:

Because we still want to show the tables on the computational tab even though we don't have all data to work with at the moment, I thought the "badge" may be slightly better than "--" visually.

Your thoughts?

selinad commented 8 years ago

Hi @jimmyzhen I really like the badges. I'd suggest yellow, but it would be too much. This looks very nice IMHO.

We are not able to get MaxEntScan data?

selinad commented 8 years ago

The doc shows MaxEntScan and GeneSplicer from VEP.

If no data is found, then we use "--" (or is it "-"), correct?

wrightmw commented 8 years ago

@selinad The latest iteration of the computational tab supplied by Jimmy is pretty much what we are going to be able to show in the test release. MaxEntScan and GeneSplicer are only available from the VEP via Perl plug-ins. We realistically don't have time to do this for the test release. Our best route for these data will be if myvariant.info can support all the splicing algorithms we need, but again this won't be for a number of weeks.

selinad commented 8 years ago

Yikes. OK - did not realize that. I think we need to link them to resources.

wrightmw commented 8 years ago

NNSPLICE: http://www.fruitfly.org/seq_tools/splice.html
MaxEntScan: http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html
GeneSplicer: http://www.cbcb.umd.edu/software/GeneSplicer/gene_spl.shtml
HumanSplicingFinder: http://www.umd.be/HSF3/HSF.html

selinad commented 8 years ago

MaxEntScan: http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html
NNSplice: http://www.fruitfly.org/seq_tools/splice.html
GeneSplicer: http://www.cbcb.umd.edu/software/GeneSplicer/gene_spl.shtml

lol - was just doing same!

Do we want Human Splicing Finder?

If all of these are coming, they will need to be added to table. Might be worth a quick look at Human Splicing Finder to see if we will be able to access via API or otherwise (before promising) - ok to just leave it off for now...

wrightmw commented 8 years ago

The three splicing programs that are routinely used are MaxEntScan, NNSPLICE and HumanSplicingFinder. The PTEN group looks at these three and then if two out of three agree then this is regarded as significant. Additionally, ALAMUT provides these three (plus GeneSplicer). Further, I noticed when Steven put up a slide the other day it included MaxEntScan, NNSPLICE and HumanSplicingFinder. I also looked around at publications and presentations on the internet and it was these three that were most used. Therefore, I think that we need to make sure we get all the splice predictors (MaxEntScan, NNSPLICE and HumanSplicingFinder) which are routinely used by variant curators into the VCI. Unfortunately it will take some work to incorporate them.
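The two-out-of-three rule described above could be sketched as below. This assumes "agree" means that at least two of the three predictors call an effect on splicing, and that a predictor with no result is represented as None; both are assumptions, and the function name is hypothetical.

```python
from typing import Dict, Optional


def two_of_three_agree(calls: Dict[str, Optional[bool]]) -> bool:
    """Return True when at least two predictors call a splicing effect.

    calls maps predictor name -> True (effect predicted), False (no effect)
    or None (no result available for that predictor).
    """
    supporting = sum(1 for call in calls.values() if call is True)
    return supporting >= 2


example = {
    "MaxEntScan": True,
    "NNSPLICE": True,
    "HumanSplicingFinder": None,  # no data returned
}
print(two_of_three_agree(example))
```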

selinad commented 8 years ago

Super sleuthing, @wrightmw. It's of "medium" importance on the Confluence site from when we worked through the list with the pilot group (before you joined), but it was on Steven's email today and the PTEN group does use it, as you noted. I guess the more the merrier, to a point. If ALAMUT can get it, there must be some way to retrieve the data...was only worried about saying something is coming that we haven't investigated...

jimmyzhen commented 8 years ago

@wrightmw, @selinad, @kilodalton,

The following is what I have so far for the computational tab:

I have pushed all changes to the repo after merging with the latest dev.

Among the changes, I also changed Average Sample Read Depth row to single column so that the string does not wrap on the population tab.

I will spin up an instance later tonight for you to review, and will continue to deal with the remaining details.

selinad commented 8 years ago

@jimmyzhen It is looking fantastic!!! Super - I can test an instance later tonight. Nice work!

wrightmw commented 8 years ago

@jimmyzhen I agree, great work. I look forward to testing it 👍 @selinad I was just adding all four splicing predictors for completeness within this ticket. I think we should stick with the two we have in there, for now, and work on the other splicing factors behind the scenes. I think we can deliver all four eventually, but it may take some time.

selinad commented 8 years ago

Ah - got it. Sounds perfect! thx @wrightmw

jimmyzhen commented 8 years ago

@selinad, @wrightmw, my apologies for not being able to spin up an instance sooner last night :(

In any case, here is the instance for your review: https://718-jz-computation-tab-ui-bdd04fb-jzhen.demo.clinicalgenome.org

jimmyzhen commented 8 years ago

UPDATE I spun up a newer instance, in which I merged with the latest dev and added some UI tweaks: https://718-jz-computation-tab-ui-f3da09f-jzhen.demo.clinicalgenome.org

Please disregard the old instance provided previously (above).

jimmyzhen commented 8 years ago

Selina, Matt,

Would you mind giving me the example URLs for those external links under "Repetitive Region" section? Are they supposed to be the same as those seen on the "Basic Info" tab? Thanks!

UPDATE: See my comment at https://github.com/ClinGen/clincoded/issues/775#issuecomment-232312155

selinad commented 8 years ago

Hi @jimmyzhen. Gosh, thank you for all your hard wee hours work.

This would be 60bp region using the chromosomal coordinates (30bp on either side of the variant) whenever possible. If you need to use the rsID for Ensembl, that would work as they center around it. If @wrightmw agrees, we could just use the same links as in the static bar. @wrightmw and I will discuss this this AM for you and @kgliu0101
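A small sketch of the +/- 30bp window described above, with one possible UCSC link built from it. The function names, the default assembly and the example coordinate are assumptions for illustration; the `db` and `position` query parameters are the usual UCSC Genome Browser ones, but the exact link format the tab should use still needs confirming.

```python
def flanking_region(chrom: str, pos: int, flank: int = 30) -> str:
    """Return 'chrN:start-end' covering `flank` bases either side of pos."""
    start = max(1, pos - flank)  # coordinates are 1-based, so clamp at 1
    end = pos + flank
    return f"chr{chrom}:{start}-{end}"


def ucsc_region_link(chrom: str, pos: int, assembly: str = "hg19") -> str:
    """Build a UCSC Genome Browser link for the region around a variant."""
    region = flanking_region(chrom, pos)
    return f"https://genome.ucsc.edu/cgi-bin/hgTracks?db={assembly}&position={region}"


# Hypothetical coordinate, used only to show the shape of the output.
print(flanking_region("7", 140453136))
print(ucsc_region_link("7", 140453136))
```

Note that 30 bases on each side of a single-nucleotide variant spans 61 positions inclusive, so the window is 60bp plus the variant itself.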

selinad commented 8 years ago

Hi @jimmyzhen - just starting to test - thank you again for making it happen.

For "REVEL meta-predictor": link should probably go to their About page (https://sites.google.com/site/revelgenomics/about); also, maybe it should be "REVEL (meta-predictor)" with only the link on REVEL
@wrightmw - should we show CFTR on non-CFTR variants? Is there a name for the predictor they use for CFTR (I'm guessing no)?
Should there be myvariant.info info for ClinVar VariationID 5556 (rsID: rs104893668)?
I was thinking your placeholder labels for coming features would show on this instance, but may be confused. Here's what I see:
for "Other Variants in Codon" section - could you please put the "See data in ClinVar" link just below the number found OR maybe in parentheses right after would be best. e.g. "Number of variants at codon: 1 (See data in ClinVar)"
If the only variant in ClinVar is the current variant, I think we should bring back 0 if easy (otherwise, we can do that in future version). If we can do this, then maybe we change the text to "Number of alternate variants in same codon." Again, this can wait.
perhaps for repetitive region we can add a note, something like "View region +/- 30bp of variant" and then put links under that....

These are my initial comments - will poke around on more variants now. Thanks again - super work!
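For the "Other Variants in Codon" suggestion above (report 0 when the only ClinVar hit is the current variant), a minimal sketch might look like this. It assumes the search returns a list of ClinVar VariationIDs as strings; the function names are hypothetical.

```python
from typing import Iterable


def alternate_variant_count(variation_ids: Iterable[str], current_id: str) -> int:
    """Count distinct variants in the codon other than the current one."""
    return len({vid for vid in variation_ids if vid != current_id})


def codon_summary(variation_ids: Iterable[str], current_id: str) -> str:
    """Format the count with the ClinVar link text placed right after it."""
    count = alternate_variant_count(variation_ids, current_id)
    return f"Number of alternate variants in same codon: {count} (See data in ClinVar)"


# Only the current variant was found, so the alternate count is 0.
print(codon_summary(["5556"], "5556"))
```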

wrightmw commented 8 years ago

@jimmyzhen re. name changes for predictors. Sharon told Selina that she'd like to see all the meta-predictors labeled as such, so we need to add 'meta-predictor' to a few of the predictor names. Please change:
REVEL meta-predictor to REVEL (meta-predictor)
MetaLR to MetaLR (meta-predictor)
MetaSVM to MetaSVM (meta-predictor)
CADD to CADD (meta-predictor)
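One way to keep this labeling in a single place, sketched under the assumption that display names are derived from a base predictor name. The set holds the four predictors named in this comment; the function name is hypothetical.

```python
# Predictors to be labeled as meta-predictors, per the request above.
META_PREDICTORS = {"REVEL", "MetaLR", "MetaSVM", "CADD"}


def display_name(predictor: str) -> str:
    """Append '(meta-predictor)' to the names that should carry the label."""
    if predictor in META_PREDICTORS:
        return f"{predictor} (meta-predictor)"
    return predictor


for name in ["REVEL", "MetaLR", "SIFT"]:
    print(display_name(name))
```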

wrightmw commented 8 years ago

@selinad No, the plan is only to show CFTR for variants on the CFTR gene

Added ticket #796 to add logic to address this

selinad commented 8 years ago

OK thx @wrightmw I was testing CV 5556, which is a non-CFTR variant - it's showing there, so @jimmyzhen that logic will need to be added.

Sorry, what I meant on original bullet was "should we be showing...." - I agree that we shouldn't.

selinad commented 8 years ago

@wrightmw thanks for adding labels for meta-predictors.

ClinGen / clincoded

Computational tab UI #718
