Open owhite opened 6 years ago
Link to DATS model : https://github.com/biocaddie/WG3-MetadataSpecifications
Latest version of DATS: DATS v2.2
Test for DATS compliance using test script: https://github.com/biocaddie/WG3-MetadataSpecifications/blob/master/tests/test_dats_model.py
Two TOPMed datasets, built from the publicly available dbGAP metadata and mapped to DATS v2.2, are provided. Please note that neither the metadata nor the model is complete yet. Limitations: 1) the metadata come from the publicly accessible dbGAP web site; 2) study variables have not yet been mapped to dimensions; 3) data harmonization (mapping to standard vocabulary concepts) is limited.
Our plan includes automating the ingestion process and working on a more complete mapping of the metadata to DATS.
Apologies... GitHub won't let me upload JSON in the issue section. Any workarounds? In the interim, here are links to the files: 1) dats_phs001143 2) dats_phs000954
@aegururaj, similar to #2, shouldn't this metadata be incorporated into a BDBag and assigned an identifier? @carlkesselman @ianfoster?
@rpwagner Sure, once the model is decided on and finalized. The intent here is to be able to just look at some sample metadata, as I understand. @ianfoster @carlkesselman ?
For laughs, I created a couple of minids with the minid CLI for the JSON files listed above. It's really easy to do so, and is perfectly fine for intermediate versions of files that might become obsolete. In fact, that is one of the features of minids, i.e., you've got a way to create a provenance chain so even if someone references an old minid, there should be a redirect reference to a newer version (if it exists).
Here are the exact commands I used. (Note: to make this work I used the "Shareable Link" for each file, not the web page URL of the Google Drive folder.)
minid --register --title "DATS formatted metadata for dbGAP study phs000954.v1.p1" dats_phs000954.json --locations https://drive.google.com/open?id=1RFqR-b8iNRa_V8CuESWA_hw5st4P3fqG
minid --register --title "DATS formatted metadata for dbGAP study phs001143.v1.p1" dats_phs001143.json --locations https://drive.google.com/open?id=13tO-mnLCixyF_EXdNnArlTvj3xTchziZ
The results are minid:b94t3q and minid:b91113, respectively.
I encourage anyone who is interested to give the minid CLI program a try. It's simple to install if you already have Python (and Pip) installed on your system. Just follow the guide here. Make sure to perform the initial user registration step, and then I highly recommend adding your user information into ~/.minid/minid-config.cfg so that you do not have to specify the same arguments on the command line.
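To make the provenance-chain idea concrete, here is a minimal sketch of following "obsoleted by" links until the newest version is reached. It does not call the minid service or CLI: the `records` dictionary, the `minid:example-*` identifiers, and the `resolve_latest` helper are all hypothetical and exist only for illustration.

```python
from typing import Dict, Optional


def resolve_latest(minid: str, obsoleted_by: Dict[str, Optional[str]]) -> str:
    """Follow obsoleted-by links from `minid` and return the newest identifier.

    `obsoleted_by` is a hypothetical in-memory mapping from an identifier to
    the identifier that replaced it (or None if it is still current).
    """
    seen = set()
    current = minid
    while True:
        if current in seen:
            # Guard against a malformed chain that loops back on itself.
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        successor = obsoleted_by.get(current)
        if not successor:
            return current
        current = successor


# Hypothetical records: v1 was replaced by v2, which was replaced by v3.
records = {
    "minid:example-v1": "minid:example-v2",
    "minid:example-v2": "minid:example-v3",
    "minid:example-v3": None,
}

print(resolve_latest("minid:example-v1", records))
```

Someone holding a reference to the oldest identifier still ends up at the current one, which is the behavior described above for obsolete intermediate files.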
And for more grins, I've been working on an R minid tool library. The dev version can do minid lookups now, which for those minids that @mikedarcy just made look like this in an R session:
> devtools::load_all()
Loading minidtools
> config <- load_configuration()
> lookup("minid:b94t3q")
MINID:
identifier = ark:/57799/b94t3q
short_identifier = minid:b94t3q
creator = mdarcy
orcid = 0000-0003-2280-917X
created = Fri, 20 Apr 2018 22:22:58 GMT
checksum = fe1d7fc641ae2befae2b7c2a989019553b22e21cdda7b9d6054617921b821613
checksum_function = SHA256
status = ACTIVE
locations = https://drive.google.com/open?id=1RFqR-b8iNRa_V8CuESWA_hw5st4P3fqG
(use locations(object) for more)
titles = DATS formatted metadata for dbGAP study phs000954.v1.p1
(use titles(object) for more)
obsoleted_by =
(use obsoleted_by(object) for more)
content_key =
> lookup("minid:b91113")
MINID:
identifier = ark:/57799/b91113
short_identifier = minid:b91113
creator = mdarcy
orcid = 0000-0003-2280-917X
created = Fri, 20 Apr 2018 22:24:05 GMT
checksum = 5a3581ebe1257a85a747d6f6af647e8c38d24867085152ed7a97ed2a45e31d47
checksum_function = SHA256
status = ACTIVE
locations = https://drive.google.com/open?id=13tO-mnLCixyF_EXdNnArlTvj3xTchziZ
(use locations(object) for more)
titles = DATS formatted metadata for dbGAP study phs001143.v1.p1
(use titles(object) for more)
obsoleted_by =
(use obsoleted_by(object) for more)
content_key =
(Note that this tool is much less mature than the Python CLI program.)
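Since each lookup above reports a SHA256 checksum, a downloaded copy of a file can be checked against it. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_minid_checksum` are made up for this example and are not part of the minid CLI or the R library.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_minid_checksum(path: str, expected_hex: str) -> bool:
    """Compare a local file's digest with the checksum reported for a minid."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, assuming dats_phs000954.json has been downloaded to the current directory, `matches_minid_checksum("dats_phs000954.json", "fe1d7fc641ae2befae2b7c2a989019553b22e21cdda7b9d6054617921b821613")` should return `True` if the local copy is the file that was registered.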
Initiating a thread for the DATS files generated by Team Oxygen for GTEx and TOPMed data. The files are to be converted to BDBags and hosted at a location yet to be determined.