RNA-Ninja closed this issue 6 years ago
Because the data contain many-to-many relationships, there isn't a single file. However, we do provide CSV table dumps of the database on our FTP site (ftp://ftp.ebi.ac.uk/pub/databases/genenames/new/csv/genefamily_db_tables/). Please read the README.txt file in that directory for more information.
Nice to know that the gene group/family information is downloadable in bulk. Noting that HTTPS is now supported, so these files are at https://ftp.ebi.ac.uk/pub/databases/genenames/new/csv/genefamily_db_tables/
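For anyone scripting this, here is a minimal sketch of fetching a file from that directory over HTTPS using only the Python standard library. The only file name taken from this thread is README.txt; the helper names `table_url` and `fetch_text` are made up for illustration, and the UTF-8 decoding is an assumption about the file encoding.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Directory mentioned in this thread (HTTPS variant of the FTP location).
BASE_URL = "https://ftp.ebi.ac.uk/pub/databases/genenames/new/csv/genefamily_db_tables/"


def table_url(filename: str) -> str:
    """Build the full URL for a file inside the table-dump directory."""
    return urljoin(BASE_URL, filename)


def fetch_text(filename: str, timeout: float = 30.0) -> str:
    """Download one file and return it as text.

    Assumption: the file is UTF-8 (undecodable bytes are replaced, not raised).
    """
    with urlopen(table_url(filename), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_text("README.txt")` should return the README, which is the place to check the actual table names before downloading anything else.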
I came here since I didn't see anything about the gene family download at https://www.genenames.org/download/archive/. However, perhaps that is because gene families are not captured as part of archive releases. +1 to a single JSON dataset with all gene families and metadata that is released for each future archive version.
One final question: what is https://ftp.ebi.ac.uk/pub/databases/genenames/new/json/genefamilies.json?
I ended up creating a processing pipeline in https://github.com/related-sciences/nxontology-data/pull/14 to create a single JSON file with HGNC gene group information and gene assignments. The file is available at hgnc_gene_group.json (versioned link, but you can look here for the latest).
The file can be read by any JSON parser, and it also follows the node-link network serialization format, which makes it compatible with the Python packages networkx and nxontology.
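To make the node-link layout concrete, here is a rough sketch of recovering the group hierarchy with only the standard library. It assumes the conventional node-link keys (`links`, with `source` and `target` on each edge) and that edges point from the broader group to the narrower one; both should be checked against the real file. The group IDs in `sample` are invented, and `load_node_link` and `parents_by_child` are illustrative names.

```python
import json
from collections import defaultdict


def load_node_link(path):
    """Read a node-link JSON file from disk into a plain dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def parents_by_child(data, links_key="links"):
    """Map each child group ID to the IDs of its direct parent groups.

    Assumption: each link looks like {"source": parent_id, "target": child_id}.
    """
    parents = defaultdict(list)
    for link in data.get(links_key, []):
        parents[link["target"]].append(link["source"])
    return dict(parents)


# Tiny made-up document: group 1 is the parent of groups 2 and 3.
sample = {
    "nodes": [{"id": 1}, {"id": 2}, {"id": 3}],
    "links": [{"source": 1, "target": 2}, {"source": 1, "target": 3}],
}
hierarchy = parents_by_child(sample)
```

With networkx installed, `networkx.node_link_graph` is the usual way to load this format into a graph object instead.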
Here's a subset of the node output for reference:
{
  "nodes": [
    {
      "id": 3,
      "name": "Fascin family",
      "name_aliases": [
        "Fascins"
      ],
      "root_symbol": "FSCN",
      "typical_gene": "FSCN1",
      "desc_label": null,
      "desc_comment": null,
      "desc_source": null,
      "desc_source_url": null,
      "desc_go": null,
      "pubmed_ids": [
        "21618240"
      ],
      "external_note": null,
      "external_resources": null,
      "genes_direct": [
        {
          "hgnc_id": "HGNC:11148",
          "symbol": "FSCN1"
        },
        {
          "hgnc_id": "HGNC:3960",
          "symbol": "FSCN2"
        },
        {
          "hgnc_id": "HGNC:3961",
          "symbol": "FSCN3"
        }
      ],
    },
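As a quick illustration of working with these node records, the sketch below builds a group-to-genes lookup from the `genes_direct` field. The `excerpt` dict reproduces only a few fields of the node shown above; `genes_by_group` and `groups_by_gene` are hypothetical helper names, not part of nxontology-data.

```python
def genes_by_group(data):
    """Map each group name to the symbols of its directly assigned genes."""
    return {
        node["name"]: [gene["symbol"] for gene in node.get("genes_direct") or []]
        for node in data["nodes"]
    }


def groups_by_gene(data):
    """Invert the mapping: gene symbol -> names of groups it is directly in."""
    lookup = {}
    for group, symbols in genes_by_group(data).items():
        for symbol in symbols:
            lookup.setdefault(symbol, []).append(group)
    return lookup


# Trimmed copy of the Fascin family node from the excerpt above.
excerpt = {
    "nodes": [
        {
            "id": 3,
            "name": "Fascin family",
            "genes_direct": [
                {"hgnc_id": "HGNC:11148", "symbol": "FSCN1"},
                {"hgnc_id": "HGNC:3960", "symbol": "FSCN2"},
                {"hgnc_id": "HGNC:3961", "symbol": "FSCN3"},
            ],
        }
    ]
}
fascin_genes = genes_by_group(excerpt)["Fascin family"]
```

The `or []` guard is there because many fields in the excerpt are `null`; whether `genes_direct` itself can be `null` in the real file is not something this thread confirms.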
I would like to download the HGNC gene family hierarchy data describing which gene family is part of which larger gene super-family. I guess you are using this data to create the gene family map. How can I download it in text/XML format?