yarikoptic opened 7 months ago
Right, as we discussed at the Distribits hackathon, now that @yarikoptic has a published dataset in Harvard Dataverse that came from DataLad, we can find it with this query:
https://dataverse.harvard.edu/api/search?q=fileName:%22repo.zip%22
Here's how the search result looks:
```json
{
  "status": "OK",
  "data": {
    "q": "fileName:\"repo.zip\"",
    "total_count": 1,
    "start": 0,
    "spelling_alternatives": {},
    "items": [
      {
        "name": "repo.zip",
        "type": "file",
        "url": "https://dataverse.harvard.edu/api/access/datafile/10069635",
        "file_id": "10069635",
        "description": "",
        "published_at": "2024-04-08T11:44:45Z",
        "file_type": "Unknown",
        "file_content_type": "application/octet-stream",
        "size_in_bytes": 155736,
        "md5": "b83bbf83371526579887b5879c3dce1f",
        "checksum": {
          "type": "MD5",
          "value": "b83bbf83371526579887b5879c3dce1f"
        },
        "dataset_name": "OpenNeuro:ds000003 Rhyme judgment (trimmed)",
        "dataset_id": "10069469",
        "dataset_persistent_id": "doi:10.7910/DVN/VMSH8U",
        "dataset_citation": "Halchenko, Yaroslav, 2024, \"OpenNeuro:ds000003 Rhyme judgment (trimmed)\", https://doi.org/10.7910/DVN/VMSH8U, Harvard Dataverse, V1"
      }
    ],
    "count_in_response": 1
  }
}
```
As mentioned above, the dataset-level fields to focus on are these:
"dataset_name": "OpenNeuro:ds000003 Rhyme judgment (trimmed)",
"dataset_id": "10069469",
"dataset_persistent_id": "doi:10.7910/DVN/VMSH8U",
"dataset_citation": "Halchenko, Yaroslav, 2024, \"OpenNeuro:ds000003 Rhyme judgment (trimmed)\", https://doi.org/10.7910/DVN/VMSH8U, Harvard Dataverse, V1"
https://doi.org/10.7910/DVN/VMSH8U will resolve and redirect to the dataset at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VMSH8U
@yarikoptic and I talked about different ways to identify DataLad datasets. This "search for repo.zip" approach seems promising but could probably be refined. It's a good start!
I think we are now doomed to wait (hopefully just a little) for @joeyh to (re)implement support for "git remotes in git-annex special remotes" natively in git-annex -- that is the design project he worked on with @mih during the Distribits hackathon.
Sample dataset on the demo node, in the non-exported (key store) flavor of the special remote:
So it seems we need to search for datasets which have a file like `XDLRA-2D--2D-refs`, probably just starting with `XDLRA-` and ending with `-refs`.
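Purely as an illustration, such a filename test could be expressed as a simple pattern (this regex is my guess at the intended match, not anything the special remote itself uses):

```python
import re

# Match filenames starting with "XDLRA-" and ending with "-refs",
# e.g. "XDLRA-2D--2D-refs" from the sample dataset above.
XDLRA_REFS = re.compile(r"^XDLRA-.*-refs$")

assert XDLRA_REFS.match("XDLRA-2D--2D-refs")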
There is a JSON file which lists all current Dataverse deployments (if we are greedy enough to search through all of them).
For now we could just go through https://demo.dataverse.org/ and https://dataverse.harvard.edu as "groups" (like organizations on GitHub) and not care about any others.
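A sketch of that restricted crawl (the hardcoded host list, function name, and lack of pagination handling are all ad-hoc assumptions):

```python
import requests

# For now, only these two installations; a full crawl would take the
# hostnames from the JSON list of deployments mentioned above.
HOSTS = ["demo.dataverse.org", "dataverse.harvard.edu"]

def find_datalad_datasets(filename="XDLRA-2D--2D-refs"):
    for host in HOSTS:
        resp = requests.get(
            f"https://{host}/api/search",
            params={"q": f'fileName:"{filename}"', "type": "file"},
        )
        resp.raise_for_status()
        # note: only the first page of results is fetched here
        for item in resp.json()["data"]["items"]:
            yield host, item["dataset_persistent_id"]
```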
The search API example invocation to search for that exact filename would follow the same pattern as above, e.g. https://demo.dataverse.org/api/search?q=fileName:%22XDLRA-2D--2D-refs%22 (for now; I also tried searching for `.datalad/dotgit/`, but that seems to not work).
The "things" to record would be the
hostname
dataset_persistent_id
per each dataset. Hyperlink for a dataset would be constructed as
https://{hostname}/dataset.xhtml?persistentId=doi:{dataset_persistent_id}
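Putting that together, the per-dataset record and its hyperlink could look like this (a sketch; names are illustrative):

```python
def dataset_url(hostname: str, dataset_persistent_id: str) -> str:
    # dataset_persistent_id already carries the "doi:" prefix,
    # e.g. "doi:10.7910/DVN/VMSH8U"
    return f"https://{hostname}/dataset.xhtml?persistentId={dataset_persistent_id}"

record = {
    "hostname": "dataverse.harvard.edu",
    "dataset_persistent_id": "doi:10.7910/DVN/VMSH8U",
}
print(dataset_url(**record))
# https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VMSH8U
```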
Note: for those URLs to become clonable, `datalad` should first be configured to load the `dataverse` and `next` extensions via changes to `~/.gitconfig`.
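For example, assuming DataLad's `datalad.extensions.load` configuration item (which can be given multiple times), the `~/.gitconfig` addition could look like:

```ini
[datalad "extensions"]
	load = dataverse
	load = next
```

or equivalently via `git config --global --add datalad.extensions.load dataverse` (and likewise for `next`).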