eweitz / ideogram

Chromosome visualization for the web
https://eweitz.github.io/ideogram
Other

Automate gene cache population, expand from 2 to 14 organisms #286

Closed: eweitz closed this 2 years ago

eweitz commented 2 years ago

This automates the process of creating gene cache files, and increases the number of supported organisms sevenfold (from 2 to 14).

Previously, gene caches for human and mouse were constructed artisanally, by hand. I would download a GTF file from Ensembl, pipe it through some greps, run find-and-replace-all in various ways, and do low-code spreadsheet operations. Now those steps are fully automated (and thus much more clearly documented!) in gene_cache.py.
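To illustrate the kind of step that replaces the manual greps, here is a minimal sketch that filters an Ensembl GTF down to gene-level records and emits one tab-separated row per gene. This is not the code in gene_cache.py; the function names `parse_gtf_genes` and `to_tsv`, the chosen columns, and the TSV layout are all hypothetical, assuming only the standard 9-column, tab-separated GTF format with quoted `key "value";` attributes.

```python
import re

# GTF column 9 holds attribute pairs like: gene_id "ENSG00000223972"; gene_name "DDX11L1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_genes(lines):
    """Return (chromosome, start, end, gene_id, gene_name) for each 'gene' feature.

    Hypothetical helper, not the actual gene_cache.py implementation.
    """
    genes = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene records (drop transcripts, exons, ...)
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id", "")
        # Some genes lack a gene_name; fall back to the stable ID
        gene_name = attrs.get("gene_name", gene_id)
        genes.append((chrom, start, end, gene_id, gene_name))
    return genes


def to_tsv(genes):
    """Serialize gene tuples as tab-separated rows (hypothetical cache layout)."""
    return "\n".join("\t".join(map(str, g)) for g in genes)


# Tiny GTF excerpt in Ensembl style, used as a usage example
sample = [
    "#!genome-build GRCh38.p13",
    '1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; '
    'gene_version "5"; gene_name "DDX11L1"; gene_source "havana";',
    '1\thavana\ttranscript\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; '
    'transcript_id "ENST00000456328";',
]

if __name__ == "__main__":
    print(to_tsv(parse_gtf_genes(sample)))
```

Only the `gene` line survives the filter; the header comment and the transcript line are dropped, which is the same job a `grep` on the feature column would do by hand.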

You can run that pipeline like so:

cd scripts/python
python -m venv env --copies
source env/bin/activate
pip3 install -r requirements.txt
cd cache
python3 gene_cache.py --output-dir ../../../dist/data/cache/

(Add --reuse-gtf to the end of that last command if re-running, e.g. while developing, for faster iteration.)
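One plausible way a flag like --reuse-gtf could be wired up is sketched below with the standard library's argparse: the flag skips the download when a GTF file is already on disk. Only the flag names --output-dir and --reuse-gtf come from the commands above; `build_parser`, `should_download`, and making --output-dir required are assumptions for illustration, not how gene_cache.py necessarily does it.

```python
import argparse
import os


def build_parser():
    """Hypothetical CLI mirroring the two flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Populate gene cache files (sketch).")
    # Assumption: the output directory must always be given explicitly
    parser.add_argument("--output-dir", required=True,
                        help="Directory to write cache files into")
    parser.add_argument("--reuse-gtf", action="store_true",
                        help="Skip downloading GTF files that are already on disk")
    return parser


def should_download(gtf_path, reuse_gtf):
    """Download unless --reuse-gtf was given and the file already exists."""
    return not (reuse_gtf and os.path.exists(gtf_path))


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.output_dir, args.reuse_gtf)
```

Because argparse converts dashes to underscores, the flags are read back as `args.output_dir` and `args.reuse_gtf`.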

Organisms with gene cache available:

Also, the cache directory has been moved, and various small bugs have been fixed.

coveralls commented 2 years ago


Coverage decreased (-0.002%) to 88.909% when pulling 43381056aafdfe6a510cd1e080e9632401bf5966 on populate-gene-cache into 259e9c87ff9495139f8e61cbd377d62ae6ce5761 on master.