bigdatagenomics / mango

A scalable genome browser. Apache 2 licensed.

multi-allelic split alleles in GA4GH #281

Open jpdna opened 7 years ago

jpdna commented 7 years ago

In ADAM/Mango a genotype call is defined as a list of GenotypeAllele here: https://github.com/bigdatagenomics/bdg-formats/blob/master/src/main/resources/avro/bdg.avdl#L966

GenotypeAllele is defined as one of: https://github.com/bigdatagenomics/bdg-formats/blob/master/src/main/resources/avro/bdg.avdl#L753

In cases where a multi-allelic variant was split (as it is when loading into ADAM), an allele within a genotype can be OTHER_ALT, as described here: https://github.com/bigdatagenomics/bdg-formats/blob/master/src/main/resources/avro/bdg.avdl#L766

In the GA4GH schema, a genotype call is defined here: https://ga4gh-schemas.readthedocs.io/en/latest/schemas/variants.proto.html#protobuf.Call and can represent multi-allelic sites.

When we report variant calls based on ADAM/Mango data in GA4GH API format, I am unsure how to represent OTHER_ALT.

For now I plan to use "." for OTHER_ALT, which is documented to mean missing, but if anyone has comments on my interpretation or a better way to do this, let me know.
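
To make the plan concrete, here is a rough sketch of the mapping in Python. It is only an illustration, not Mango's actual code: the `GenotypeAllele` enum below is a stand-in that assumes the bdg-formats values are REF, ALT, OTHER_ALT and NO_CALL, the VCF-style codes "0" and "1" for REF and ALT are my assumption, and `to_gt_codes` is a hypothetical helper name.

```python
from enum import Enum


class GenotypeAllele(Enum):
    # Stand-in for the bdg-formats enum; the value names are assumed here.
    REF = "REF"
    ALT = "ALT"
    OTHER_ALT = "OTHER_ALT"
    NO_CALL = "NO_CALL"


# Assumed VCF-style codes. OTHER_ALT is reported as "." (missing), the same
# code used for NO_CALL, because a split record cannot say which alternate
# allele it was.
_GT_CODE = {
    GenotypeAllele.REF: "0",
    GenotypeAllele.ALT: "1",
    GenotypeAllele.OTHER_ALT: ".",
    GenotypeAllele.NO_CALL: ".",
}


def to_gt_codes(alleles):
    """Map a list of GenotypeAllele values to per-allele genotype codes."""
    return [_GT_CODE[allele] for allele in alleles]
```

Under these assumptions, a sample that was heterozygous for two different alternate alleles before the split would come out as `["1", "."]` for each split record, so the information that the second allele was a real (but different) alternate is lost in the output.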

@david4096 may be interested

fnothaft commented 7 years ago

That's the same approach we use upstream in ADAM, so +1 from me!

david4096 commented 7 years ago

Would you post an issue describing this at https://github.com/ga4gh/ga4gh-schemas/issues?

I don't see a problem with adopting the same approach.