ga4gh / ga4gh-server

Reference implementation of the APIs defined in ga4gh-schemas. RETIRED 2018-01-24
http://ga4gh.org
Apache License 2.0

Variant annotation - update AnalysisResult #833

Open sarahhunt opened 8 years ago

sarahhunt commented 8 years ago

AnalysisResult now holds the analysisId rather than the full Analysis record.

https://github.com/ga4gh/schemas/pull/525/files

david4096 commented 8 years ago

@sarahhunt Can you point me to expected values based on the test data? Is the analysisId in AnalysisResult the same as the analysisId given to the Analysis field of a VariantAnnotationSet? Thanks!

sarahhunt commented 8 years ago

@david4096 - no, the VariantAnnotationSet and AnalysisResult use different Analysis records.

At the VariantAnnotationSet level, the Analysis describes an annotation package such as SnpEff or VEP. This Analysis information is essential: the compliance test checkVariantAnnotationAnalysis checks for it, and the test data is in the file header.

The AnalysisResults in TranscriptEffect hold the output of potentially several different prediction tools, for example the results of two different protein impact predictors:

"analysisResults": [
  { "analysisId": "ID_SIFT.5.2.2", "score": "0.43", "result": "tolerated" },
  { "analysisId": "ID_Polyphen.2.2.2_r405", "score": 0.012, "result": "benign" }
]
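As a rough illustration of what holding only the analysisId implies for a client, the sketch below resolves each result back to its Analysis record through a lookup table. The class layouts, the `resolve_analyses` helper, and the Analysis names are assumptions made for this example; they are not taken from the schemas or the reference server.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Analysis:
    # Hypothetical, cut-down stand-in for the schema's Analysis record.
    id: str
    name: str


@dataclass
class AnalysisResult:
    # Holds only the ID of the Analysis, not the full record.
    analysis_id: str
    result: Optional[str] = None
    score: Optional[float] = None


def resolve_analyses(
    results: List[AnalysisResult],
    analyses_by_id: Dict[str, Analysis],
) -> List[Tuple[Optional[Analysis], AnalysisResult]]:
    """Pair each result with its Analysis record, or None if the ID is unknown."""
    return [(analyses_by_id.get(r.analysis_id), r) for r in results]


# Illustrative data mirroring the two predictors in the example above;
# the "name" values are assumed for the sketch.
analyses_by_id = {
    "ID_SIFT.5.2.2": Analysis(id="ID_SIFT.5.2.2", name="SIFT"),
    "ID_Polyphen.2.2.2_r405": Analysis(id="ID_Polyphen.2.2.2_r405", name="PolyPhen"),
}
results = [
    AnalysisResult("ID_SIFT.5.2.2", result="tolerated", score=0.43),
    AnalysisResult("ID_Polyphen.2.2.2_r405", result="benign", score=0.012),
]
pairs = resolve_analyses(results, analyses_by_id)
```

Each Analysis record is stored once and shared by every result that refers to it, which is the practical effect of replacing the embedded record with an ID.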

This field is optional and, because it is often not relevant, frequently left unpopulated. It's like a more structured form of the info fields we have in the API. I could generate some test data, but there is currently no standard format for storing such data in VCF.

@jeromekelleher - How are the info fields handled by the reference implementation? Are they populated with data to demonstrate their use? I'm now wondering whether it is appropriate to populate this structure.

jeromekelleher commented 8 years ago

@sarahhunt, we fill out the info fields with the corresponding information from the VCF. I think we should probably leave this field empty for now if there isn't a standard way of encoding the information in VCF.

sarahhunt commented 8 years ago

Thanks @jeromekelleher - that makes sense.

david4096 commented 8 years ago

I'll table #906 until we arrive at a way to get these data in.

david4096 commented 8 years ago

@sarahhunt To generate the test data, would you run SIFT over the variants in a VCF and then introduce those values under a new key in the info for the variant in the VCF? Thanks for your help!
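A minimal sketch of what adding such a key could look like, assuming a hypothetical INFO key named SIFT_SCORE. No standard key exists for this data, so the key name, the header line, and the `add_info_value` helper are all illustrative rather than an agreed encoding.

```python
# Hypothetical header line declaring the new INFO key (name and description assumed).
SIFT_INFO_HEADER = (
    '##INFO=<ID=SIFT_SCORE,Number=A,Type=Float,'
    'Description="SIFT score per alternate allele (illustrative key)">'
)


def add_info_value(vcf_line: str, key: str, value: str) -> str:
    """Return a VCF data line with key=value appended to its INFO column."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # INFO is the eighth tab-separated column of a VCF data line
    entry = f"{key}={value}"
    # "." marks an empty INFO column, so replace it rather than append to it.
    fields[7] = entry if info in ("", ".") else f"{info};{entry}"
    return "\t".join(fields)


line = "1\t100\trs1\tA\tG\t50\tPASS\tDP=10"
annotated = add_info_value(line, "SIFT_SCORE", "0.43")
```

The reference implementation would then see the value as an ordinary info field; mapping it into AnalysisResult would still need an agreed convention.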

sarahhunt commented 8 years ago

@david4096 - sure, I can have a look at this, but it will be in a couple of weeks.