dcppc / data-stewards

Questions and answers about TOPMed, GTEx, and AGR resources.

GTEX Identifier dump #3

Open jmcmurry opened 6 years ago

jmcmurry commented 6 years ago

We would like a dump of all of the GTEx identifiers in this format.

jnedzel commented 6 years ago

jmcmurry, can you please give me more detail on what you want? The data commons project is mostly focused on the raw GTEx data (e.g., BAM files). So what identifier do you want? Filename? Path to the Google bucket? I really need more context to understand what it is that you want and how you are going to use it.

ctb commented 6 years ago

@jmcmurry ^^

jmcmurry commented 6 years ago

Good question. Off the top of my head, the highest priority is anything required for search and retrieval: species, anatomy, and gene ID (or, if not the ID, at least the gene symbol). Also useful is anything that helps with filtering or making sense of the data once you have found it (like the genome assembly ID).

That's my 100% naive take but don't jump to it until we get confirmation from @cmungall and others.

jnedzel commented 6 years ago

I'm sorry, but I'm still completely lost as to what you want. Could we schedule a call to clarify?

jnedzel commented 6 years ago

I've committed id dump files for GTEx samples and subjects to this GitHub repo. I will continue working on other entities. There are some fields in your format that we don't understand, so I've left those blank:

jmcmurry commented 6 years ago

Thanks, @jnedzel. If you reference an ID that you did not mint in house, the outgoing URI is the URI you reference. In your particular case, it looks like you're not using any such identifiers; however, it would be great if you could map these tissue IDs to Uberon (documenting the caveat that these are not pre-mapped in situ). I have made a sample change here for two of the terms so you know what I mean.

jnedzel commented 6 years ago

We do have Uberon IDs. Once I confirm our mapping, I will update.

jnedzel commented 6 years ago

@jmcmurry I've committed a new version of the tissue ID file, with the Uberon mapping as you requested. I've also included a separate column that contains just the Uberon ID, in addition to the outgoing URI. I can remove that column if you would prefer.
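
For readers following along, here is a minimal sketch of how the two columns could relate. It assumes the Uberon ID column holds CURIEs such as `UBERON:0002107` and that the outgoing URI follows the standard OBO PURL pattern. The function name `uberon_curie_to_uri` is hypothetical and is not taken from the committed files.

```python
import re

# Standard OBO Foundry PURL prefix for ontology terms.
OBO_PURL_PREFIX = "http://purl.obolibrary.org/obo/"

# Assumption: IDs are written as "UBERON:" followed by seven digits.
_UBERON_CURIE = re.compile(r"UBERON:(\d{7})")


def uberon_curie_to_uri(curie: str) -> str:
    """Convert an Uberon CURIE (UBERON:NNNNNNN) to its OBO PURL."""
    match = _UBERON_CURIE.fullmatch(curie.strip())
    if match is None:
        raise ValueError(f"Not a recognizable Uberon CURIE: {curie!r}")
    return f"{OBO_PURL_PREFIX}UBERON_{match.group(1)}"


if __name__ == "__main__":
    print(uberon_curie_to_uri("UBERON:0002107"))
```

If the outgoing URI can always be derived this way, the separate Uberon ID column is redundant but harmless. Keeping both spares consumers from parsing the URI.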