FINNGEN / autoreporting

MIT License

Deprecate allele fetching from dbSNP REST api #166

Closed. Lipastomies closed this 3 years ago.

Lipastomies commented 3 years ago

The dbSNP REST API has long been the source of many problems:

I replaced it with a separate 'AlleleDB' class that adds alleles for a list of 'Location' objects (a namedtuple with chrom: str and pos: int). It returns a list of 'VariantData' instances: namedtuples holding chrom:pos:ref:alt (possibly with several alt alleles), the rsid, and whether the variant is biallelic. These are then joined to the GWAS Catalog data.
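A minimal sketch of the lookup-then-join flow described above, assuming a plain dict as a stand-in for whatever AlleleDB actually reads from. `InMemoryAlleleDB`, `get_alleles`, `join_to_gwas_catalog`, the VariantData field names, and the catalog row keys (`chrom`, `pos`, `trait`) are all hypothetical, not the repository's API; the rsids and positions are made-up toy data.

```python
from typing import Dict, List, NamedTuple, Tuple


class Location(NamedTuple):
    chrom: str
    pos: int


class VariantData(NamedTuple):  # hypothetical field layout
    chrom: str
    pos: int
    ref: str
    alt: List[str]  # one or more alt alleles
    rsid: str
    biallelic: bool


class InMemoryAlleleDB:
    """Hypothetical stand-in for AlleleDB, backed by a plain dict."""

    def __init__(self, records: Dict[Tuple[str, int], Tuple[str, List[str], str]]):
        # records maps (chrom, pos) -> (ref, [alt, ...], rsid)
        self._records = records

    def get_alleles(self, locations: List[Location]) -> List[VariantData]:
        result: List[VariantData] = []
        for loc in locations:
            rec = self._records.get((loc.chrom, loc.pos))
            if rec is None:
                continue  # unknown location: no alleles to add
            ref, alts, rsid = rec
            result.append(
                VariantData(loc.chrom, loc.pos, ref, list(alts), rsid,
                            biallelic=len(alts) == 1)
            )
        return result


def join_to_gwas_catalog(hits: List[dict], variants: List[VariantData]) -> List[dict]:
    """Inner-join catalog rows to allele data on (chrom, pos)."""
    by_loc = {(v.chrom, v.pos): v for v in variants}
    joined = []
    for hit in hits:
        v = by_loc.get((hit["chrom"], hit["pos"]))
        if v is None:
            continue  # drop rows with no allele data
        joined.append({**hit, "ref": v.ref, "alt": v.alt,
                       "rsid": v.rsid, "biallelic": v.biallelic})
    return joined


# Toy usage with made-up records
db = InMemoryAlleleDB({
    ("1", 12345): ("A", ["G"], "rs1"),
    ("2", 500): ("C", ["T", "G"], "rs2"),
})
variants = db.get_alleles([Location("1", 12345), Location("2", 500), Location("3", 1)])
hits = [{"chrom": "1", "pos": 12345, "trait": "height"},
        {"chrom": "3", "pos": 1, "trait": "bmi"}]
joined = join_to_gwas_catalog(hits, variants)
```

Keying both the lookup and the join on (chrom, pos) mirrors the Location tuple, so one key drives both steps.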

There is quite a lot of overlap between the variant-like namedtuples in Scripts/data_access/db.py (Variant and VariantData), so they could perhaps be combined. I'm not sure how to represent VariantData's multiple alt alleles with Variant, though. Maybe like this:

from typing import List, NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str

class VariantData(NamedTuple):
    variant: Variant
    other_alts: List[str]  # alt alleles beyond variant.alt; possibly empty
    biallelic: bool
    rsid: str
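As a sketch of how this pair could be filled and consumed: `from_alts` and `split_alts` are hypothetical helpers, not functions in db.py. The first alt goes into `variant`, the rest into `other_alts`, and `split_alts` expands a multiallelic site back into one biallelic Variant per alt.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


class VariantData(NamedTuple):
    variant: Variant
    other_alts: List[str]  # possibly empty
    biallelic: bool
    rsid: str


def from_alts(chrom: str, pos: int, ref: str, alts: List[str], rsid: str) -> VariantData:
    """Hypothetical constructor: first alt in `variant`, the rest in `other_alts`."""
    if not alts:
        raise ValueError("at least one alt allele is required")
    return VariantData(
        variant=Variant(chrom, pos, ref, alts[0]),
        other_alts=list(alts[1:]),
        biallelic=len(alts) == 1,
        rsid=rsid,
    )


def split_alts(vd: VariantData) -> List[Variant]:
    """Expand a possibly multiallelic site into one Variant per alt allele."""
    v = vd.variant
    # NamedTuple._replace returns a copy with the given field changed
    return [v] + [v._replace(alt=a) for a in vd.other_alts]


vd = from_alts("2", 500, "C", ["T", "G"], "rs2")
parts = split_alts(vd)  # [Variant('2', 500, 'C', 'T'), Variant('2', 500, 'C', 'G')]
```

Note that in this layout `biallelic` is stored separately from `other_alts`, so nothing stops the two from disagreeing if a VariantData is built by hand.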

I also added tests for GwasApi and LocalDB, which previously had none, and updated the WDL and WDL JSON files to reflect these changes.

Lipastomies commented 3 years ago

Actually, like this:

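from typing import List, NamedTuple

class VariantData(NamedTuple):
    variant: Variant  # the Variant class defined above
    other_alts: List[str]  # possibly empty
    rsid: str

    def biallelic(self) -> bool:
        # Biallelic when variant.alt is the only alt allele
        return len(self.other_alts) == 0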
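Deriving biallelic from other_alts means the two can no longer disagree. A minimal sketch of building such records, assuming the input arrives as VCF-style columns where ALT is a comma-separated list of alternate alleles (as in the VCF format); `from_vcf_fields` is a hypothetical helper and the example records are made up.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


class VariantData(NamedTuple):
    variant: Variant
    other_alts: List[str]  # possibly empty
    rsid: str

    def biallelic(self) -> bool:
        # Derived rather than stored, so it always matches other_alts
        return len(self.other_alts) == 0


def from_vcf_fields(chrom: str, pos: int, ref: str, alt_field: str, rsid: str) -> VariantData:
    """Hypothetical helper: ALT such as "T,G" becomes alt="T", other_alts=["G"]."""
    first, *rest = alt_field.split(",")
    return VariantData(Variant(chrom, pos, ref, first), rest, rsid)


snv = from_vcf_fields("1", 12345, "A", "G", "rs1")
multi = from_vcf_fields("2", 500, "C", "T,G", "rs2")
```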