Maybe break it into the JSON.gz (i.e. all the keys) and the transcript details (which are in REST).
Be sure to document strangeness, e.g.:
In the JSON, we store the gene info using the ID (e.g. RefSeq 80167) as the key.
When we build the data from UTA (your example has "url": "postgresql://uta.biocommons.org/uta_20210129"), we can't know the ID, only the symbol, so we make up a fake symbol.
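
A rough sketch of the keying quirk described above, in Python. Everything named here is an assumption made for illustration: the `gene_key` helper, the `FAKE_` prefix, the `gene_symbol` field and the `EXAMPLE1`/`EXAMPLE2` symbols are all hypothetical, and the note does not say how the made-up value is actually constructed.

```python
from typing import Dict, Optional


def gene_key(gene_id: Optional[str], symbol: str) -> str:
    """Return the key under which a gene record would be stored.

    Hypothetical helper: when a real gene ID is known it is used as-is;
    otherwise (the UTA case) a placeholder is derived from the symbol.
    The "FAKE_" prefix is invented for this sketch only.
    """
    if gene_id is not None:
        return gene_id
    return f"FAKE_{symbol}"


genes: Dict[str, dict] = {}

# Source that provides a real gene ID: the ID itself is the key.
genes[gene_key("80167", "EXAMPLE1")] = {"gene_symbol": "EXAMPLE1"}

# UTA-style source: only the symbol is known, so a placeholder key is used.
genes[gene_key(None, "EXAMPLE2")] = {"gene_symbol": "EXAMPLE2"}

print(sorted(genes))
```

The point for the docs is that consumers can't assume every gene key is a real identifier, so anything built from UTA needs to be called out separately.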
https://github.com/SACGF/cdot/wiki/Transcript-JSON-format