This encodes the process of creating gene cache files, and increases the number of supported organisms by 7x.
Previously, gene caches were artisanally constructed by hand for human and mouse. I would download a GTF file from Ensembl, pipe some greps, run replace all in various ways, and do low-code spreadsheet operations. Now, those steps are fully automated (and thus also much more clearly documented!) in gene_cache.py.
You can run that pipeline like so:
cd scripts/python
python -m venv env --copies
source env/bin/activate
pip3 install -r requirements.txt
cd cache
python3 gene_cache.py --output-dir ../../../dist/data/cache/
(Add --reuse-gtf at the end of that last command if re-running, e.g. while developing, for faster iteration.)
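For readers unfamiliar with the manual steps described above (filtering a GTF file down to gene records and reshaping them into a compact table), here is a minimal sketch of that kind of transformation. It is not the code in gene_cache.py: the function names, the chosen columns, and the tab-separated output layout are all assumptions made for illustration. It relies only on the standard GTF layout of nine tab-separated columns, with attributes such as gene_id and gene_name in the last column.

```python
import re

# Matches GTF attribute pairs such as: gene_id "ENSG00000139618";
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)";')

# Hypothetical column order for the cache rows; the real cache format may differ.
CACHE_COLUMNS = ("chromosome", "start", "length", "name", "id")


def parse_gene_line(line):
    """Return a dict for a GTF "gene" feature line, or None for any other line."""
    if line.startswith("#"):
        return None  # header or comment line
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9 or columns[2] != "gene":
        return None  # skip transcripts, exons, and malformed lines
    attributes = dict(ATTRIBUTE_PATTERN.findall(columns[8]))
    start, end = int(columns[3]), int(columns[4])
    gene_id = attributes.get("gene_id", "")
    return {
        "chromosome": columns[0],
        "start": start,
        "length": end - start + 1,  # GTF coordinates are 1-based and inclusive
        # Assumption of this sketch: unnamed genes fall back to their ID.
        "name": attributes.get("gene_name", gene_id),
        "id": gene_id,
    }


def gtf_to_cache_rows(lines):
    """Convert an iterable of GTF lines into tab-separated cache rows."""
    rows = []
    for line in lines:
        gene = parse_gene_line(line)
        if gene is not None:
            rows.append("\t".join(str(gene[key]) for key in CACHE_COLUMNS))
    return rows
```

Calling gtf_to_cache_rows on an open GTF file handle would yield one row per gene, which replaces the grep, replace-all, and spreadsheet steps with a single pass over the file.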
Coverage decreased (-0.002%) to 88.909% when pulling 43381056aafdfe6a510cd1e080e9632401bf5966 on populate-gene-cache into 259e9c87ff9495139f8e61cbd377d62ae6ce5761 on master.
Organisms with gene cache available:
The cache directory has also been moved, and various small bugs have been fixed.