ga4gh / ga4gh-schemas

Models and APIs for Genomic data. RETIRED 2018-01-24
http://ga4gh.org
Apache License 2.0

Reintroduce Beacon #773

Open david4096 opened 7 years ago

david4096 commented 7 years ago

We have known for a while that a simple client can be used against a GA4GH service to create a beacon service (https://github.com/kozbo/Beacon-on-GA4GH-API). However, there are some confusing mismatches between the two, for example in the way references are handled.
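To make the "beacon on top of a variant search" idea concrete, here is a minimal sketch in Python. It does not use the real GA4GH client or the code in the linked repository: `Variant`, `in_memory_search`, and `make_beacon` are hypothetical names, and the in-memory search only stands in for whatever variant search call a real service would expose.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Variant:
    # Hypothetical, simplified record; NOT the GA4GH Variant message.
    reference_name: str
    start: int  # 0-based position
    reference_bases: str
    alternate_bases: Tuple[str, ...]


# A search function takes (reference_name, start, end) and yields variants
# whose start falls in the half-open interval [start, end).
SearchFn = Callable[[str, int, int], Iterable[Variant]]


def in_memory_search(variants: List[Variant]) -> SearchFn:
    """Stand-in for a variant search against a real service (assumption)."""
    def search(reference_name: str, start: int, end: int) -> List[Variant]:
        return [
            v for v in variants
            if v.reference_name == reference_name and start <= v.start < end
        ]
    return search


def make_beacon(search_variants: SearchFn) -> Callable[[str, int, str], bool]:
    """Wrap a variant search so it only answers 'is this allele present?'."""
    def beacon(reference_name: str, start: int, alternate_base: str) -> bool:
        for v in search_variants(reference_name, start, start + 1):
            if v.start == start and alternate_base in v.alternate_bases:
                return True
        return False
    return beacon
```

The beacon only ever returns a boolean, so none of the underlying variant records leave the service. Note that nothing in this sketch says which reference build the coordinates refer to, which is the kind of mismatch in question.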

We would like to reintroduce the beacon into the mainline API to ensure the API includes all the fields needed to work easily with the beacon. I have written up the proto that provides a starting place. In principle, this could be merged as is; however, we believe the community will benefit from removing impedance mismatches between the two offerings.

kozbo commented 7 years ago

@david4096 I think it would be helpful to those not intimately familiar with both specs to list out the possible impedance mismatches. Please give us a list.

mbaudis commented 7 years ago

This can be coordinated w/ @mcupak, @KyleGao, myself etc. Happy to work with you on this!

david4096 commented 7 years ago

Ok! Thanks both. I have not taken the time to work extensively with the platform, but I understand at least two preliminary confusions and offer one place for continued development.

[ deleted: Beacon and Genomics API use the same counting scheme; see below ]

References

Having not implemented a beacon myself, I don't know exactly how folks are managing to answer the reference_set part of the search request, but to me this seems like it could be an error-prone or poorly specified way of interacting with the data.

Since there are few canonical sequences, the beacon specification benefits from a loose specification of the reference_set_name, but we ought to provide a way to specify the exact reference more precisely. This could be an optional search request method that would allow a server that hosts both a References service and a Beacon service to provide full fidelity. I believe the beacon response could be ambiguous between multiple builds within a series (hg38.1, hg38.2?). Someone with more practical experience implementing beacons will hopefully correct me.
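One possible reading of this proposal, sketched in Python under stated assumptions: the query keeps the loose reference_set_name and gains an optional exact identifier. The names `BeaconQuery`, `ReferenceSet`, `reference_set_id`, and `resolve_reference_set` are hypothetical and do not come from either specification, and treating the loose name as a prefix match is also an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReferenceSet:
    # Hypothetical, simplified stand-in for a hosted reference set.
    id: str
    name: str


@dataclass(frozen=True)
class BeaconQuery:
    reference_set_name: str                 # loose name, e.g. "hg38"
    reference_set_id: Optional[str] = None  # hypothetical exact identifier


class AmbiguousReferenceError(Exception):
    """Raised when a loose name matches more than one hosted reference set."""


def resolve_reference_set(query: BeaconQuery,
                          hosted: List[ReferenceSet]) -> ReferenceSet:
    # An exact identifier, when supplied, always wins.
    if query.reference_set_id is not None:
        for rs in hosted:
            if rs.id == query.reference_set_id:
                return rs
        raise LookupError(f"unknown reference set id {query.reference_set_id!r}")

    # Otherwise fall back to the loose name (prefix match is an assumption).
    matches = [rs for rs in hosted if rs.name.startswith(query.reference_set_name)]
    if not matches:
        raise LookupError(f"no reference set named {query.reference_set_name!r}")
    if len(matches) > 1:
        names = ", ".join(rs.name for rs in matches)
        raise AmbiguousReferenceError(
            f"{query.reference_set_name!r} matches several builds: {names}"
        )
    return matches[0]
```

With two hosted builds named hg38.1 and hg38.2, a query for "hg38" alone is rejected as ambiguous, while the same query carrying an exact identifier resolves to a single build. A server without a References service could simply ignore the optional identifier.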

Ultimately, we aspire to provide a natural path for genomes across the kingdoms to be interchanged, and the references API captures this aspiration by providing a species field when describing a reference. Although we haven't tried to tackle the hard problems of querying on circular genomes, we are working to better generalize the reference model here.

Types

Lastly, we ought to have a unified representation of the variant types that need to be interchanged. @mbaudis has offered a suggestion for this representation in the Genomics API here. Further work will specify the query functionality, and the representations ought to be in harmony between VCF, Genomics API, and Beacon.

mcupak commented 7 years ago

I'm all for removing mismatches in the APIs, but this might be jumping the gun a bit. Can we think about this more and involve the Beacon group in the discussion?

Beacon goes beyond the scope of the main GA4GH APIs. We have our own API specification and tooling, but we try to stay close to the main API. You can create a beacon on top of a reference server, and we try to make it easy for you (example, example), but it's important that you can use a different backend, and the vast majority of beacons do.

If you take a copy of the Beacon API and merge it here, you're just duplicating everything (and the related projects, such as the reference server, the client etc., would also duplicate what we have downstream). Aside from the development and maintenance overhead, I'm also concerned it would be confusing to users (seeing 2 specifications for the Beacon API in virtually the same format in 2 GA4GH repositories). Personally, I'd rather see the 2 projects recommending each other instead of having this kind of duplication.

As for the differences described above, we haven't gone deep into specifying stuff like assemblies, and there is space for ambiguity on our side. We do count from 0 though, like the main API.
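For readers coming from VCF, where POS is 1-based, the shared 0-based convention means a conversion step before querying either API. A minimal Python sketch, assuming a 0-based start with an exclusive end; the function name is hypothetical, and the sketch deliberately ignores VCF's special telomere positions:

```python
from typing import Tuple


def vcf_pos_to_zero_based_interval(pos: int, ref: str) -> Tuple[int, int]:
    """Convert a 1-based VCF POS and REF allele to a 0-based [start, end) pair."""
    if pos < 1:
        # VCF's special telomere positions are out of scope for this sketch.
        raise ValueError("expected a 1-based position >= 1")
    start = pos - 1          # shift from 1-based to 0-based
    end = start + len(ref)   # exclusive end covers the whole REF allele
    return start, end
```

For example, a single-base variant at VCF position 100 maps to start 99 and end 100.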

@mfiume

kozbo commented 7 years ago

Thanks @mcupak. Sorry if it seemed like I was trying to avoid involving the Beacon group in the discussion. I was hoping that this issue would open the discussion on how we can make our two GA4GH APIs seem more like one. I think it just did, albeit clumsily. I apologize if it seemed like an end-run.

This discussion may not be best handled here. I think we need to get together and discuss where our goals are in alignment and how we can harmonize. I am sure that we are a little out of touch with the goals of Beacon, and we are certainly not aware of all the history behind the decisions you have made to this point. The broader API (the other GA4GH API to you :-)) should offer other advantages for Beacon.

Where do we start?