eweitz / ideogram

Chromosome visualization for the web
https://eweitz.github.io/ideogram

Instant gene description #299

Closed: eweitz closed this 2 years ago

eweitz commented 2 years ago

This adds genes' full names to the gene cache, enabling basic tooltips without an Internet connection.

Previously, basic gene data (symbol, genomic coordinates, Ensembl ID) was cached in a TSV file. A gene's full name (sometimes called its description) is not needed to display the gene itself, or to start the cascade of events that fetches paralogs and interacting genes in the related genes kit. Because loading it did not block anything, the full name was instead fetched at runtime from the fast MyGene.info API.
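To illustrate why the full name was optional, here is a minimal Python sketch of a cache record in which the full name starts out empty. It assumes a hypothetical five-column TSV layout (symbol, chromosome, start, stop, Ensembl ID); the real column order and the names `CachedGene` and `load_gene_cache` are illustrative and not taken from Ideogram.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedGene:
    symbol: str
    chromosome: str
    start: int
    stop: int
    ensembl_id: str
    # Not needed to draw the gene; in the old design this was filled in
    # later by a network request (hypothetical field name).
    full_name: Optional[str] = None


def load_gene_cache(tsv_text: str) -> Dict[str, CachedGene]:
    """Parse a hypothetical 5-column gene cache TSV into records keyed by symbol."""
    genes: Dict[str, CachedGene] = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and comment/header lines
        if not row or row[0].startswith("#"):
            continue
        symbol, chrom, start, stop, ensembl_id = row
        genes[symbol] = CachedGene(symbol, chrom, int(start), int(stop), ensembl_id)
    return genes
```

Everything needed to place the gene on a chromosome is present after loading; `full_name` stays `None` until some other source supplies it.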

However, this meant that, despite Ideogram's recent shift to the web Cache API, there was still a substantial dependency on network connectivity. Now, that dependency is reduced. The Python cache generation code now uses GFF3 files instead of GTF files, because only GFF3 files contain full names (which they call "descriptions"). Full names are now read from the gene cache, which is loaded only once per Ideogram version and persists across page loads. This improves resiliency and simplifies the client-side code.
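Extracting full names from GFF3 might look like the following minimal Python sketch. It assumes Ensembl-style gene lines whose ninth column carries `Name` and `description` attributes, with the description optionally ending in a ` [Source:...]` suffix; the helper names are hypothetical, and this is not the PR's actual cache generation code. The GFF3 format percent-encodes reserved characters such as `;` and `=` inside attribute values, so values are decoded with `urllib.parse.unquote`.

```python
from typing import Dict, Iterable, Iterator, Tuple
from urllib.parse import unquote


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column of ';'-separated key=value pairs."""
    attrs: Dict[str, str] = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        key, _, value = pair.partition("=")
        # Decode percent-encoded reserved characters, e.g. %3B -> ';'
        attrs[unquote(key)] = unquote(value)
    return attrs


def gene_full_names(gff3_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (symbol, full name) for each 'gene' feature that has a description."""
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # only gene features carry the names we want
        attrs = parse_attributes(fields[8])
        symbol = attrs.get("Name")
        description = attrs.get("description")
        if symbol and description:
            # Assumed Ensembl convention: drop a trailing " [Source:...]" note
            yield symbol, description.split(" [Source:")[0]
```

Usage, with a made-up gene record for illustration:

```python
sample = [
    "##gff-version 3\n",
    "1\tsource\tgene\t1000\t2000\t.\t+\t.\tID=gene:GENEA_ID;Name=GENEA;"
    "description=example protein A [Source:HGNC Symbol%3BAcc:HGNC:0]\n",
]
print(list(gene_full_names(sample)))  # [('GENEA', 'example protein A')]
```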

Although the cache has more data, it's smaller at rest than before because it's compressed via gzip. This keeps client storage demands small. It also simplifies maintenance by letting the cache be stored and served from the same free CDN directory as the code.
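The size benefit of compressing a plain-text TSV can be checked with Python's standard `gzip` module. This is a minimal sketch, assuming the cache is ordinary tab-separated text; `compress_tsv` and `decompress_tsv` are hypothetical helpers, and how Ideogram's client decompresses the file is not shown.

```python
import gzip
from typing import List


def compress_tsv(rows: List[List[str]]) -> bytes:
    """Serialize rows as tab-separated lines and gzip the UTF-8 bytes."""
    text = "".join("\t".join(row) + "\n" for row in rows)
    return gzip.compress(text.encode("utf-8"))


def decompress_tsv(blob: bytes) -> List[List[str]]:
    """Inverse of compress_tsv: gunzip, decode, and split back into rows."""
    text = gzip.decompress(blob).decode("utf-8")
    return [line.split("\t") for line in text.splitlines()]
```

Gene tables repeat chromosome names, ID prefixes, and common words in descriptions, which is the kind of redundancy gzip's DEFLATE compression exploits, so adding a full-name column can still yield a smaller file at rest than an uncompressed table without it.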

Old: network requests were needed to fetch the full gene name:

(Screenshot: uncached gene full name, Ideogram, 2022-04-05)

New: no network requests are needed to fetch the full gene name:

(Screenshot: cached gene full name, Ideogram, 2022-04-05)

coveralls commented 2 years ago


Coverage decreased (-0.08%) to 86.966% when pulling 52dcbbb0d85c8bc28b687a33388103700e33a6bf on instant-gene-description into b3ff53ebd5694c79ff97624d2a32b021249985a8 on master.