pfpjs closed this 5 years ago
can you upload detailed_gene_table_v95 and the summary table somewhere I can download them?
the tests don't pass for me with these updates. I am looking into it, but any insight would be great.
I see, it has, e.g., PKCα, which sqlalchemy does not like. I'll get this working.
Whoops, Unicode characters in the HGNC aliases or synonyms are messing it up.
Using the following command:
iconv -f UTF-8 -t ASCII//TRANSLIT summary_gene_table_v95
seems to sanitize the file, but then there could be messed up gene synonyms and aliases.
The best approach would be to first convert the HGNC_download file and then regenerate everything. I'm currently testing that and will let you know how it goes.
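To see which aliases or synonyms would be altered before transliterating anything, something like the following could list every field that contains non-ASCII characters. This is only a rough sketch: the tab-separated layout, the column order, and the sample rows are made up for illustration and are not taken from the real gene tables.

```python
def non_ascii_fields(lines, sep="\t"):
    """Yield (line_number, column_index, value) for every field
    that contains at least one non-ASCII character."""
    for line_no, line in enumerate(lines, start=1):
        for col, value in enumerate(line.rstrip("\n").split(sep)):
            if not value.isascii():  # str.isascii() needs Python 3.7+
                yield line_no, col, value


# Hypothetical sample rows; the real tables have different columns.
sample = [
    "chrom\tgene\tsynonyms\n",
    "17\tPRKCA\tPKCα,PKCA\n",
    "7\tBRAF\tBRAF1\n",
]

hits = list(non_ascii_fields(sample))
for line_no, col, value in hits:
    print(f"line {line_no}, column {col}: {value}")
```

On a real file you would pass an open file handle (opened with encoding="utf-8") instead of the sample list, then review the reported fields by hand.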
I got it fixed inside of gemini. Don't worry about it! Thanks again!
I will have to revert this. The coordinates are for hg38.
Major whoops! I think I've fixed it (changed www.ensembl.org to grch37.ensembl.org basically) and opened another PR. Sorry again!
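For anyone hitting the same thing: the main Ensembl site serves GRCh38 (hg38), while the GRCh37 (hg19) archive lives on its own host. A small sketch of making that choice explicit is below. The function and dictionary names are hypothetical and this is not how make-gene_tables.sh is written; it only illustrates the host swap.

```python
# Map each genome build to the Ensembl host that serves it.
ENSEMBL_HOSTS = {
    "GRCh38": "www.ensembl.org",     # current assembly (hg38)
    "GRCh37": "grch37.ensembl.org",  # archive site for hg19 coordinates
}


def biomart_url(build):
    """Return the BioMart martservice URL for the given genome build."""
    try:
        host = ENSEMBL_HOSTS[build]
    except KeyError:
        raise ValueError(f"unknown genome build: {build!r}") from None
    return f"https://{host}/biomart/martservice"


print(biomart_url("GRCh37"))
```

Keeping the build as an explicit parameter makes it harder to silently pull hg38 coordinates into a GRCh37 table.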
We are going to release 0.30.0 as the next version and then I'll get this in after that. Thanks so much for figuring it out and updating the PR. I just want to get this out and then do smaller updates from here.
This fixes the generation of the detailed_gene and summary_gene tables from Ensembl BioMart (v95 as of this update). Addresses #902 and #912. Some notes:
- make-gene_tables.sh automates the process of generating the gene tables and cleaning up temp files. Currently outputs detailed_gene_table_v95 and summary_gene_table_v95.
- Uses biomart-perl locally, gets the latest Ensembl BioMart registry automatically, and takes quite a while to generate its cache.
- transcript_status has been removed and replaced by transcript_tsl: see more info here.