Automate gene cache population, expand from 2 to 14 organisms

This encodes the process of creating gene cache files, and increases the number of supported organisms by 7x.

Previously, gene caches were artisanally constructed by hand for human and mouse. I would download a GTF file from Ensembl, pipe some greps, run replace all in various ways, and do low-code spreadsheet operations. Now, those steps are fully automated (and thus also much more clearly documented!) in gene_cache.py.
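The core of what used to be grep-and-spreadsheet work can be sketched roughly as below. This is an illustrative sketch, not the actual contents of gene_cache.py: the function names `parse_gtf_genes` and `write_gene_cache` and the output column choice are assumptions made for the example. It relies only on the standard GTF layout (nine tab-separated columns, with key "value" attribute pairs in the ninth).

```python
import csv
import re

# Matches key "value" pairs in the GTF attributes column (column 9)
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_genes(gtf_lines):
    """Yield one dict per `gene` feature from an iterable of GTF lines.

    Hypothetical helper; replaces the manual `grep` for gene rows.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header / comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # keep only well-formed gene features
        attrs = dict(ATTR_RE.findall(cols[8]))
        yield {
            "chromosome": cols[0],
            "start": int(cols[3]),
            "stop": int(cols[4]),
            "id": attrs.get("gene_id", ""),
            "name": attrs.get("gene_name", ""),
        }


def write_gene_cache(genes, out):
    """Write genes as a compact TSV to a file-like object.

    The column set here is an assumption, not the real cache format.
    """
    fields = ["chromosome", "start", "stop", "id", "name"]
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for gene in genes:
        writer.writerow([gene[f] for f in fields])
```

For a gzipped Ensembl GTF, the lines could be fed in with `gzip.open(path, "rt")` and the result written to an open text file.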

You can run the gene_cache.py pipeline like so:

cd scripts/python
python -m venv env --copies
source env/bin/activate
pip3 install -r requirements.txt
cd cache
python3 gene_cache.py --output-dir ../../../dist/data/cache/

(Add --reuse-gtf at the end of that last command when re-running, e.g. while developing, for faster iteration.)
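The idea behind a reuse flag like this can be sketched as follows. This is a minimal illustration under assumptions, not the code in gene_cache.py: `fetch_gtf` and its `download` parameter are hypothetical names, and the real script may decide what to reuse differently.

```python
import os
import urllib.request


def fetch_gtf(url, dest_path, reuse=False, download=urllib.request.urlretrieve):
    """Return dest_path, downloading url to it unless a local copy can be reused.

    Hypothetical helper: `download` is injectable so the logic can be
    exercised without network access.
    """
    if reuse and os.path.exists(dest_path):
        # A previous run already fetched this file; skip the slow download
        return dest_path
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    download(url, dest_path)
    return dest_path
```

Skipping the download is what makes repeated runs quick while iterating on the parsing steps.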

Organisms with gene cache available:

Homo sapiens (human)
Mus musculus (mouse)
Danio rerio (zebrafish)
Gallus gallus (chicken)
Rattus norvegicus (rat)
Pan troglodytes (chimpanzee)
Macaca fascicularis (crab-eating macaque / cynomolgus monkey)
Macaca mulatta (rhesus macaque)
Canis lupus familiaris (dog)
Felis catus (cat)
Equus caballus (horse)
Bos taurus (cow)
Sus scrofa (pig)
Caenorhabditis elegans (worm / nematode)

The cache directory has also been moved, and various small bugs have been fixed.

eweitz / ideogram

Automate gene cache population, expand from 2 to 14 organisms #286