jhpoelen opened 2 months ago
Hey @jhpoelen -- super cool! In an effort to do a full data review, I installed preston v0.8.6 on my own machine as follows:
sudo sh -c '(curl -L https://github.com/bio-guoda/preston/releases/download/0.8.6/preston.jar) > /usr/local/bin/preston && chmod +x /usr/local/bin/preston && preston config-manpage' && preston version
But then I ran into an initial error when using preston to retrieve the BatLit datasets:
preston cat hash://md5/26f7ce5dd404e33c6570edd4ba250d20
yields this error:
java.io.IOException: problem retrieving [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:59) at bio.guoda.preston.cmd.ContentQueryUtil.copyContent(ContentQueryUtil.java:33) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:69) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:49) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:44) at picocli.CommandLine.executeUserObject(CommandLine.java:1939) at picocli.CommandLine.access$1300(CommandLine.java:145) at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358) at picocli.CommandLine$RunLast.handle(CommandLine.java:2352) at picocli.CommandLine$RunLast.handle(CommandLine.java:2314) at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) at picocli.CommandLine$RunLast.execute(CommandLine.java:2316) at picocli.CommandLine.execute(CommandLine.java:2078) at bio.guoda.preston.Preston.run(Preston.java:103) at bio.guoda.preston.Preston.main(Preston.java:94)
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:94) at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:46) at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:18) at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:51) ... 14 more
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:25) at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:10) at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:92) ... 17 more
Caused by: java.io.IOException: cannot find content identified by [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.stream.ContentStreamFactory.create(ContentStreamFactory.java:69) at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:23) ... 19 more
java.lang.RuntimeException: java.io.IOException: problem retrieving [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:52) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:44) at picocli.CommandLine.executeUserObject(CommandLine.java:1939) at picocli.CommandLine.access$1300(CommandLine.java:145) at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358) at picocli.CommandLine$RunLast.handle(CommandLine.java:2352) at picocli.CommandLine$RunLast.handle(CommandLine.java:2314) at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) at picocli.CommandLine$RunLast.execute(CommandLine.java:2316) at picocli.CommandLine.execute(CommandLine.java:2078) at bio.guoda.preston.Preston.run(Preston.java:103) at bio.guoda.preston.Preston.main(Preston.java:94)
Caused by: java.io.IOException: problem retrieving [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:59) at bio.guoda.preston.cmd.ContentQueryUtil.copyContent(ContentQueryUtil.java:33) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:69) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:49) ... 11 more
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:94) at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:46) at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:18) at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:51) ... 14 more
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:25) at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:10) at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:92) ... 17 more
Caused by: java.io.IOException: cannot find content identified by [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.stream.ContentStreamFactory.create(ContentStreamFactory.java:69) at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:23) ... 19 more
Any ideas where I went wrong?
@n8upham Thanks for your message and for taking the time to share your description.
To print (or 'cat') the most recent provenance log with content id hash://md5/26f7ce5dd404e33c6570edd4ba250d20, you'd have to add a "remote" to let preston know that the content, which is not yet on your machine, may be available elsewhere.
Here's one way to do it:
preston cat --remote https://linker.bio hash://md5/...
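A minimal sketch of that idea, assuming only the `preston cat` and `--remote` usage shown in this thread: a hypothetical shell function `batlit_cat` (and a hypothetical `BATLIT_REMOTE` variable, not a preston setting) that first tries the local store and only asks the remote when the local lookup fails.

```shell
# Hypothetical helper (not part of preston): print the content behind a
# hash URI, trying the local store first and falling back to a remote.
# BATLIT_REMOTE is an illustrative variable name, not a preston option.
batlit_cat() {
  # Local attempt; its error output is hidden because a miss is expected.
  preston cat "$1" 2>/dev/null \
    || preston cat --remote "${BATLIT_REMOTE:-https://linker.bio}" "$1"
}
```

For example, `batlit_cat hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | head` should behave like the command above on a fresh machine, and like a plain `preston cat` once the content has been stored locally.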
Please note that only the metadata is available this way, not the PDFs. If you'd like the full dataset (including the PDFs) and you are comfortable using ssh, I can create an account for you so you can grab the full corpus.
Alternatively, send me a hard disk along with self-addressed return postage, and I'll mail it back via USPS. Note that a 128 GB thumb drive should be more than enough.
OK nice! I've got this part working now -- thanks for that tip. So when I run:
preston cat --remote https://linker.bio hash://md5/26f7ce5dd404e33c6570edd4ba250d20
I get a download of a 62.1 MB file named "26f7ce5dd404e33c6570edd4ba250d20", nested in the folders /data/26/f7/.
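The folder layout described here (a file named after the full hash, nested under two folders named after its first and second pairs of hex characters) can be turned into a path calculation. A sketch, assuming that layout holds for every `hash://md5/` id and that the store is the `data/` folder in the current directory; `preston_local_path` and `is_cached` are hypothetical names.

```shell
# Hypothetical helpers based on the layout reported in this thread:
# hash://md5/26f7ce5d... is stored as data/26/f7/26f7ce5d...
preston_local_path() {
  hex="${1#hash://md5/}"                     # drop the URI prefix
  first=$(printf '%s' "$hex" | cut -c1-2)    # first two hex characters
  second=$(printf '%s' "$hex" | cut -c3-4)   # next two hex characters
  printf 'data/%s/%s/%s\n' "$first" "$second" "$hex"
}

# Exit status 0 when the content already sits in ./data.
is_cached() {
  [ -f "$(preston_local_path "$1")" ]
}
```

For example, `is_cached hash://md5/26f7ce5dd404e33c6570edd4ba250d20 && echo "already downloaded"`.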
So then I can just do
preston cat hash://md5/26f7ce5dd404e33c6570edd4ba250d20
directly and start grepping things out of it -- e.g.,
preston cat hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | wc -l
yields 658 records. So far so good.
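Why `items[?]` rather than `items?`: the bracket expression matches a literal question mark whether grep reads the pattern as a basic or an extended regular expression (with `grep -E`, a bare `?` means "optional s"), and the double quotes keep the shell from treating the brackets as a filename pattern. A small check, using the provenance statement quoted later in this thread plus one made-up statement that should not match (`example.org` and its all-`f` hash are placeholders). `grep -c` is a shorter equivalent of `| wc -l`.

```shell
# Two provenance statements: the first is copied from this thread, the
# second is a made-up placeholder with no "items?" in its subject.
sample_log() {
  printf '%s\n' \
    '<https://api.zotero.org/groups/5435545/items?start=0&limit=100> <http://purl.org/pav/hasVersion> <hash://md5/00e40ec6aae2408f289ef11b3d803994> <urn:uuid:14344e3c-b535-4f32-bce4-bb0ccff10bb4> .' \
    '<https://example.org/items/ABC> <http://purl.org/pav/hasVersion> <hash://md5/ffffffffffffffffffffffffffffffff> .'
}

# Literal "?" via a bracket expression: only the Zotero page statement counts.
sample_log | grep "items[?]" | grep -c hasVersion

# With grep -E, a bare "?" makes the "s" optional, so both lines count.
sample_log | grep -E "items?" | grep -c hasVersion
```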
But then when I go to the next level of
preston cat hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | preston cat | jq -c .[] | head -n1 | jq .
I get an error from the second call to preston, as shown below:
zsh: no matches found: .[]
zsh: command not found: jq
java.io.IOException: problem retrieving [hash://md5/00e40ec6aae2408f289ef11b3d803994] at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:59) at bio.guoda.preston.cmd.ContentQueryUtil.copyContent(ContentQueryUtil.java:33) at bio.guoda.preston.cmd.ContentQueryUtil.copyMostRecentContent(ContentQueryUtil.java:22) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:64) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:49) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:44) at picocli.CommandLine.executeUserObject(CommandLine.java:1939) at picocli.CommandLine.access$1300(CommandLine.java:145) at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358) at picocli.CommandLine$RunLast.handle(CommandLine.java:2352) at picocli.CommandLine$RunLast.handle(CommandLine.java:2314) at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) at picocli.CommandLine$RunLast.execute(CommandLine.java:2316) at picocli.CommandLine.execute(CommandLine.java:2078) at bio.guoda.preston.Preston.run(Preston.java:103) at bio.guoda.preston.Preston.main(Preston.java:94)
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/00e40ec6aae2408f289ef11b3d803994] at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:94) at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:46) at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:18) at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:51) ... 15 more
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/00e40ec6aae2408f289ef11b3d803994] at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:25) at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:10) at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:92) ... 18 more
Caused by: java.io.IOException: cannot find content identified by [hash://md5/00e40ec6aae2408f289ef11b3d803994] at bio.guoda.preston.stream.ContentStreamFactory.create(ContentStreamFactory.java:69) at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:23) ... 20 more
java.lang.RuntimeException: java.io.IOException: problem retrieving [hash://md5/00e40ec6aae2408f289ef11b3d803994] at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:52) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:44) at picocli.CommandLine.executeUserObject(CommandLine.java:1939) at picocli.CommandLine.access$1300(CommandLine.java:145) at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358) at picocli.CommandLine$RunLast.handle(CommandLine.java:2352) at picocli.CommandLine$RunLast.handle(CommandLine.java:2314) at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) at picocli.CommandLine$RunLast.execute(CommandLine.java:2316) at picocli.CommandLine.execute(CommandLine.java:2078) at bio.guoda.preston.Preston.run(Preston.java:103) at bio.guoda.preston.Preston.main(Preston.java:94)
Caused by: java.io.IOException: problem retrieving [hash://md5/00e40ec6aae2408f289ef11b3d803994] at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:59) at bio.guoda.preston.cmd.ContentQueryUtil.copyContent(ContentQueryUtil.java:33) at bio.guoda.preston.cmd.ContentQueryUtil.copyMostRecentContent(ContentQueryUtil.java:22) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:64) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:49) ... 11 more
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/00e40ec6aae2408f289ef11b3d803994] at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:94) at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:46) at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:18) at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:51) ... 15 more
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/00e40ec6aae2408f289ef11b3d803994] at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:25) at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:10) at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:92) ... 18 more
Caused by: java.io.IOException: cannot find content identified by [hash://md5/00e40ec6aae2408f289ef11b3d803994] at bio.guoda.preston.stream.ContentStreamFactory.create(ContentStreamFactory.java:69) at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:23) ... 20 more
Any ideas there?
@n8upham Nice! Glad to see you got the provenance log. . .
For getting the associated Zotero metadata records, you may want to include the --remote option in each preston cat command in your workflow.
e.g.,
preston cat --remote https://linker.bio hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | preston cat --remote https://linker.bio | jq -c .[] | head -n1 | jq .
PS. I do realize the error messaging is a bit verbose and nerdy. . . sorry about that. . . please let me know if you have any suggestions on what kind of error messages you'd like to see instead.
PS2. Another way to get all the batlit metadata is to clone the bat-literature repo:
git clone https://github.com/bat-literature/bat-literature.github.io
Ah okay nice -- yeah this is working now. It makes sense that the second call to preston cat also needs the --remote option in order to download those 658 individual records.
Got those downloaded now, which created a bunch of additional folders in my /data/ directory -- 224 in total. That is fewer than the 658 I was expecting, but then I realized that several folders have multiple subfolders, so all good.
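That also follows from the layout: if the top-level folder name is the first two hex characters of a hash (as the data/26/f7/ path above suggests), there can be at most 16 × 16 = 256 top-level folders, so 658 files have to share them. A sketch for checking this on your own copy, assuming the store is the `data/` folder in the current directory; `count_store` is a hypothetical helper, and its file count will include anything else preston keeps there (such as the provenance log itself).

```shell
# Hypothetical helper: count stored files and top-level bucket folders
# in a preston data folder (defaults to ./data).
count_store() {
  store="${1:-data}"
  # Every regular file anywhere below the store.
  files=$(find "$store" -type f | wc -l | tr -d ' ')
  # Only the first level of folders; -mindepth/-maxdepth are supported
  # by both GNU find and BSD/macOS find.
  buckets=$(find "$store" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
  echo "files=$files buckets=$buckets"
}
```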
Then I realized that I didn't have jq installed, so I installed it with brew install jq, which worked.
But then this code
preston cat hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | preston cat | jq -c .[] | head -n1 | jq .
yields
zsh: no matches found: .[]
rather than the expected JSON entries
Any ideas there? Maybe it's not critical, since it looks like I do have all the metadata in place -- the command
preston cat hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | preston cat | head -n10
returns the first few entries
What type of further data review are you looking for?
I just ran
preston cat hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | preston cat | jq -c .[] | head -n1 | jq .
and this produced:
{
"key": "NYT92CCF",
"version": 48777,
"library": {
"type": "group",
"id": 5435545,
"name": "Bat Literature Project",
"links": {
"alternate": {
"href": "https://www.zotero.org/groups/bat_literature_project",
"type": "text/html"
}
}
},
"links": {
"self": {
"href": "https://api.zotero.org/groups/5435545/items/NYT92CCF",
"type": "application/json"
},
"alternate": {
"href": "https://www.zotero.org/groups/bat_literature_project/items/NYT92CCF",
"type": "text/html"
},
"attachment": {
"href": "https://api.zotero.org/groups/5435545/items/3MVRRMR8",
"type": "application/json",
"attachmentType": "application/pdf",
"attachmentSize": 388576
}
},
"meta": {
"createdByUser": {
"id": 13229919,
"username": "acsherman",
"name": "",
"links": {
"alternate": {
"href": "https://www.zotero.org/acsherman",
"type": "text/html"
}
}
},
"creatorSummary": "Thong et al.",
"parsedDate": "2010-10-14",
"numChildren": 1
},
"data": {
"key": "NYT92CCF",
"version": 48777,
"itemType": "journalArticle",
"title": "Further records of Murina tiensa from Vietnam with first information on its echolocation calls.",
"creators": [
{
"creatorType": "author",
"firstName": "Vu Dinh",
"lastName": "Thong"
},
{
"creatorType": "author",
"firstName": "Christian",
"lastName": "Dietz"
},
{
"creatorType": "author",
"firstName": "Annette",
"lastName": "Denzinger"
},
{
"creatorType": "author",
"firstName": "Paul J. J.",
"lastName": "Bates"
},
{
"creatorType": "author",
"firstName": "Neil M.",
"lastName": "Furey"
},
{
"creatorType": "author",
"firstName": "Gabor",
"lastName": "Csorba"
},
{
"creatorType": "author",
"firstName": "Glenn",
"lastName": "Hoye"
},
{
"creatorType": "author",
"firstName": "Le Dinh",
"lastName": "Thuy"
},
{
"creatorType": "author",
"firstName": "Hans-Ulrich",
"lastName": "Schnitzler"
}
],
"abstractNote": "The fairy tube-nosed bat, Murina tiensa, is considered to be endemic to Vietnam. It is known only from the original description, when it was found at two localities in limestone karst areas. In 2008, we conducted a series of intensive field surveys throughout the country and obtained additional records of this species from various habitats, including degraded to nearly pristine forests and an offshore island. Our results indicate that M. tiensa is a sexually dimorphic species, females being considerably larger than males in all external and craniodental measurements. The species emits broadband, downward frequency-modulated echolocation calls with a dominant first harmonic. When handheld or when flying in a flight tent, signals had a similar structure and were emitted in groups of 2β4 signals. On average, signals swept from 150 to 49 kHz in 2.2 ms for handheld bats, and, from 145 to 50 kHz in 1.9 ms for flying bats. M. tiensa often occurred in sympatry with M. cyclotis and several rhinolophids.",
"publicationTitle": "Hystrix, the Italian Journal of Mammalogy",
"volume": "22",
"issue": "1",
"pages": "",
"date": "October 14, 2010",
"series": "",
"seriesTitle": "",
"seriesText": "",
"journalAbbreviation": "",
"language": "en",
"DOI": "10.4404/hystrix-22.1-4533",
"ISSN": "18255272, 03941914",
"shortTitle": "",
"url": "https://doi.org/10.4404/hystrix-22.1-4533",
"accessDate": "2024-06-28T00:31:16Z",
"archive": "",
"archiveLocation": "",
"libraryCatalog": "DOI.org (CSL JSON)",
"callNumber": "",
"rights": "",
"extra": "",
"tags": [],
"collections": [
"UAWY6DNP"
],
"relations": {
"dc:replaces": "http://zotero.org/groups/5435545/items/NF6R8YCX"
},
"dateAdded": "2024-07-08T02:34:45Z",
"dateModified": "2024-08-16T13:50:15Z"
}
}
So, unfortunately, I was unable to reproduce your result.
This hints at a workflow that uses slightly different tools.
Can you please confirm that
preston cat hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | head -1
produces:
<https://api.zotero.org/groups/5435545/items?start=0&limit=100> <http://purl.org/pav/hasVersion> <hash://md5/00e40ec6aae2408f289ef11b3d803994> <urn:uuid:14344e3c-b535-4f32-bce4-bb0ccff10bb4> .
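The object of each such statement appears to be the content id of one saved Zotero API response page, which is what the second `preston cat` in the pipeline then tries to resolve (hash://md5/00e40ec6aae2408f289ef11b3d803994 is the id in the earlier error). To look at those ids directly, for example to fetch a single page by hand, here is a sketch with a hypothetical helper `version_ids`, assuming every id is `hash://md5/` followed by 32 hex characters as in the line above. Note that `grep -o` is not POSIX, but GNU and BSD/macOS grep both support it.

```shell
# Hypothetical helper: read a provenance log on stdin and print the content
# ids recorded as versions of Zotero "items?" API pages, one per line.
version_ids() {
  grep "items[?]" | grep hasVersion | grep -o 'hash://md5/[0-9a-f]\{32\}'
}
```

For example: `preston cat hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | version_ids | head -n1`.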
What type of further data review are you looking for?
Thanks for your thorough check on the availability of the batlit metadata.
Another thing that may be valuable to the batlit corpus is to have a peek at the test records as seen through their derived Zenodo deposits at https://sandbox.zenodo.org/communities/batlit-review-md5-26f7ce5dd404e33c6570edd4ba250d20 . If that is too much, please do let me know. I realize that your time is precious.
Hey @jhpoelen -- yes when I call preston cat hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | head -1
it produces
<https://api.zotero.org/groups/5435545/items?start=0&limit=100> <http://purl.org/pav/hasVersion> <hash://md5/00e40ec6aae2408f289ef11b3d803994> <urn:uuid:14344e3c-b535-4f32-bce4-bb0ccff10bb4> .
which seems to be the same.
But then when I call preston cat hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | preston cat | jq -c .[] | head -n1 | jq .
I still get the error of
zsh: no matches found: .[]
So it seems that the jq -c .[] part is not finding what it is looking for -- right?
I then tried quoting the filter, as in jq -c '.[]', and this returned the desired output:
preston cat hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | preston cat | jq -c '.[]' | head -n1 | jq
{
"key": "NYT92CCF",
"version": 48777,
"library": {
"type": "group",
"id": 5435545,
"name": "Bat Literature Project",
"links": {
"alternate": {
"href": "https://www.zotero.org/groups/bat_literature_project",
"type": "text/html"
}
}
},
"links": {
"self": {
"href": "https://api.zotero.org/groups/5435545/items/NYT92CCF",
"type": "application/json"
},
"alternate": {
"href": "https://www.zotero.org/groups/bat_literature_project/items/NYT92CCF",
"type": "text/html"
},
"attachment": {
"href": "https://api.zotero.org/groups/5435545/items/3MVRRMR8",
"type": "application/json",
"attachmentType": "application/pdf",
"attachmentSize": 388576
}
},
"meta": {
"createdByUser": {
"id": 13229919,
"username": "acsherman",
"name": "",
"links": {
"alternate": {
"href": "https://www.zotero.org/acsherman",
"type": "text/html"
}
}
},
"creatorSummary": "Thong et al.",
"parsedDate": "2010-10-14",
"numChildren": 1
},
"data": {
"key": "NYT92CCF",
"version": 48777,
"itemType": "journalArticle",
"title": "Further records of Murina tiensa from Vietnam with first information on its echolocation calls.",
"creators": [
{
"creatorType": "author",
"firstName": "Vu Dinh",
"lastName": "Thong"
},
{
"creatorType": "author",
"firstName": "Christian",
"lastName": "Dietz"
},
{
"creatorType": "author",
"firstName": "Annette",
"lastName": "Denzinger"
},
{
"creatorType": "author",
"firstName": "Paul J. J.",
"lastName": "Bates"
},
{
"creatorType": "author",
"firstName": "Neil M.",
"lastName": "Furey"
},
{
"creatorType": "author",
"firstName": "Gabor",
"lastName": "Csorba"
},
{
"creatorType": "author",
"firstName": "Glenn",
"lastName": "Hoye"
},
{
"creatorType": "author",
"firstName": "Le Dinh",
"lastName": "Thuy"
},
{
"creatorType": "author",
"firstName": "Hans-Ulrich",
"lastName": "Schnitzler"
}
],
"abstractNote": "The fairy tube-nosed bat, Murina tiensa, is considered to be endemic to Vietnam. It is known only from the original description, when it was found at two localities in limestone karst areas. In 2008, we conducted a series of intensive field surveys throughout the country and obtained additional records of this species from various habitats, including degraded to nearly pristine forests and an offshore island. Our results indicate that M. tiensa is a sexually dimorphic species, females being considerably larger than males in all external and craniodental measurements. The species emits broadband, downward frequency-modulated echolocation calls with a dominant first harmonic. When handheld or when flying in a flight tent, signals had a similar structure and were emitted in groups of 2β4 signals. On average, signals swept from 150 to 49 kHz in 2.2 ms for handheld bats, and, from 145 to 50 kHz in 1.9 ms for flying bats. M. tiensa often occurred in sympatry with M. cyclotis and several rhinolophids.",
"publicationTitle": "Hystrix, the Italian Journal of Mammalogy",
"volume": "22",
"issue": "1",
"pages": "",
"date": "October 14, 2010",
"series": "",
"seriesTitle": "",
"seriesText": "",
"journalAbbreviation": "",
"language": "en",
"DOI": "10.4404/hystrix-22.1-4533",
"ISSN": "18255272, 03941914",
"shortTitle": "",
"url": "https://doi.org/10.4404/hystrix-22.1-4533",
"accessDate": "2024-06-28T00:31:16Z",
"archive": "",
"archiveLocation": "",
"libraryCatalog": "DOI.org (CSL JSON)",
"callNumber": "",
"rights": "",
"extra": "",
"tags": [],
"collections": [
"UAWY6DNP"
],
"relations": {
"dc:replaces": "http://zotero.org/groups/5435545/items/NF6R8YCX"
},
"dateAdded": "2024-07-08T02:34:45Z",
"dateModified": "2024-08-16T13:50:15Z"
}
}
@n8upham Yay! Thanks for trying this out. I'll make sure to add the quotes in the method section of https://batlit.org . Thanks for being creative in dealing with this.
For sure, yeah it was the jq man page (man jq) that indicated:
Unix shells: jq '.["foo"]'
So I thought to try quoting -- annoying that different shells treat this differently.
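A way to see the difference for yourself: the hypothetical `show_args` function below prints every argument exactly as the called program would receive it. With single quotes, sh, bash, and zsh all hand `.[]` to the program untouched. Unquoted, zsh tries to expand `.[]` as a filename pattern and, with its default settings, aborts the command with "no matches found" when nothing matches, which is the error seen in this thread.

```shell
# Hypothetical helper: print each argument on its own line, in brackets,
# showing exactly what a program such as jq receives from the shell.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Quoted filter: prints [-c] and then [.[]] in sh, bash and zsh alike.
show_args -c '.[]'
```

In zsh specifically, prefixing the command with the `noglob` precommand modifier (e.g. `noglob jq -c .[]`) also switches off the expansion, but quoting is the portable choice.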
I've updated the methods section, can you please confirm that the command now works through "copy-paste" ?
The test records look good -- I just noticed that there is some unevenness in how the taxonomy metadata ("Biodiversity section") is annotated so far, but that is likely something that we will continue to amend/build on, e.g. compare https://sandbox.zenodo.org/records/102854 with https://sandbox.zenodo.org/records/101967 and https://sandbox.zenodo.org/records/101965.
Yes, the command does work via copy/paste now -- but only if I've already run it with the --remote https://linker.bio flag on both of the preston cat calls, and only if I'm in the directory where those downloaded files are stored. So I'd suggest documenting that process further.
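One way to document that end to end is a single function that passes `--remote` to both `preston cat` calls and quotes the jq filter, so it should work from any writable directory without a prior download. A sketch only: `batlit_first_record`, `BATLIT_REMOTE`, and `BATLIT_PROV` are hypothetical names, and the defaults are the remote and provenance log id used in this thread.

```shell
# Hypothetical wrapper around the pipeline discussed in this thread.
# Defaults: the linker.bio remote and the batlit provenance log id.
BATLIT_REMOTE="${BATLIT_REMOTE:-https://linker.bio}"
BATLIT_PROV="${BATLIT_PROV:-hash://md5/26f7ce5dd404e33c6570edd4ba250d20}"

# Print the first Zotero record (as compact JSON) listed in the log.
batlit_first_record() {
  preston cat --remote "$BATLIT_REMOTE" "$BATLIT_PROV" \
    | grep "items[?]" \
    | grep hasVersion \
    | preston cat --remote "$BATLIT_REMOTE" \
    | jq -c '.[]' \
    | head -n1
}
```

Pipe the result through `jq .` to pretty-print it, as in the examples above.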
I've updated the examples in https://batlit.org to include your suggestions. Please let me know if there's anything else that needs updating to reproduce the examples.
Awesome, I'd say it's good to go now.
Only thing -- for total newbs, it won't be obvious how to install preston or jq. I can understand that you want to keep the help documentation for those utilities separate to ease future maintenance, but new students getting on board are likely to hit an initial wall there.
@n8upham thanks for your feedback.
I've added the following "box" to the https://batlit.org description -
💡 In the following sections, some examples are listed that use a notation common in the Unix shell, also known as the “commandline” or “terminal”. At the time of writing, these examples can be executed/run provided the following programs are available: preston and jq, as well as more commonly available unix/posix/linux programs like grep, sort, and uniq. To run these programs, please use a Linux distribution, macOS, or Windows Subsystem for Linux (WSL), available on Windows 10 and higher. These are powerful tools that can process lots of data very quickly and can run offline. If you are unfamiliar with these tools, you may benefit from going through a Carpentries lesson like https://librarycarpentry.org/lc-shell/ or other educational materials. Note that some of these tools have been around since the 1970s and are likely to stick around a little while longer.
Please feel free to edit or suggest changes via https://github.com/bat-literature/bat-literature.github.io/blob/main/README.md .
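For newcomers, a quick pre-flight check of the programs named in that box may save some confusion. A sketch using the POSIX `command -v` builtin; `missing_tools` is a hypothetical helper name, and it avoids the variable name `status`, which is read-only in zsh.

```shell
# Hypothetical helper: report which of the given programs are not on PATH.
# Prints nothing and returns 0 when all of them are found.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  return "$rc"
}

missing_tools preston jq grep sort uniq || echo "please install the missing programs first"
```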
preston cat --remote https://linker.bio hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | preston cat --remote https://linker.bio | jq -c .[] | head -n1 | jq .
zsh: no matches found: .[]
[https://linker.bio/hash:...404e33c6570edd4ba250d20] 8 MB at 5.45 MB/s
[https://linker.bio/hash:...404e33c6570edd4ba250d20] 9 MB at 5.65 MB/s
[https://linker.bio/hash:...404e33c6570edd4ba250d20] 11 MB at 5.81 MB/s
[https://linker.bio/hash:...404e33c6570edd4ba250d20] 31 MB at 6.46 MB/s
...
preston cat hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | preston cat | jq -c '.[]' | head -n1 | jq
java.io.IOException: problem retrieving [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:59) at bio.guoda.preston.cmd.ContentQueryUtil.copyContent(ContentQueryUtil.java:33) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:69) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:49) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:44) at picocli.CommandLine.executeUserObject(CommandLine.java:1939) at picocli.CommandLine.access$1300(CommandLine.java:145) at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358) at picocli.CommandLine$RunLast.handle(CommandLine.java:2352) at picocli.CommandLine$RunLast.handle(CommandLine.java:2314) at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) at picocli.CommandLine$RunLast.execute(CommandLine.java:2316) at picocli.CommandLine.execute(CommandLine.java:2078) at bio.guoda.preston.Preston.run(Preston.java:103) at bio.guoda.preston.Preston.main(Preston.java:94)
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:94) at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:46) at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:18) at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:51) ... 14 more
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:25) at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:10) at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:92) ... 17 more
Caused by: java.io.IOException: cannot find content identified by [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.stream.ContentStreamFactory.create(ContentStreamFactory.java:69) at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:23) ... 19 more
java.lang.RuntimeException: java.io.IOException: problem retrieving [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:52) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:44) at picocli.CommandLine.executeUserObject(CommandLine.java:1939) at picocli.CommandLine.access$1300(CommandLine.java:145) at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358) at picocli.CommandLine$RunLast.handle(CommandLine.java:2352) at picocli.CommandLine$RunLast.handle(CommandLine.java:2314) at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) at picocli.CommandLine$RunLast.execute(CommandLine.java:2316) at picocli.CommandLine.execute(CommandLine.java:2078) at bio.guoda.preston.Preston.run(Preston.java:103) at bio.guoda.preston.Preston.main(Preston.java:94)
Caused by: java.io.IOException: problem retrieving [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:59) at bio.guoda.preston.cmd.ContentQueryUtil.copyContent(ContentQueryUtil.java:33) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:69) at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:49) ... 11 more
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:94) at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:46) at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:18) at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:51) ... 14 more
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:25) at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:10) at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:92) ... 17 more
Caused by: java.io.IOException: cannot find content identified by [hash://md5/26f7ce5dd404e33c6570edd4ba250d20] at bio.guoda.preston.stream.ContentStreamFactory.create(ContentStreamFactory.java:69) at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:23) ... 19 more
Hi @ariadnamorales thanks for trying this out, and apologies for the cryptic error messages. They are quite informative for me, but may not be as helpful for others.
Hmm. You are able to download the content, but for some reason unable to access it.
Do you have permissions to create files in the folder from which you run preston?
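A quick way to check that, assuming (as described earlier in this thread) that preston keeps downloaded content in a `data/` folder under the directory it runs from; `check_preston_workdir` is a hypothetical helper, not a preston command.

```shell
# Hypothetical pre-flight check: preston needs to create files under ./data
# (as observed in this thread), so the directory and any existing data/
# folder inside it must be writable by the current user.
check_preston_workdir() {
  dir="${1:-.}"
  if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
    echo "not a writable directory: $dir" >&2
    return 1
  fi
  if [ -d "$dir/data" ] && [ ! -w "$dir/data" ]; then
    echo "existing data folder is not writable: $dir/data" >&2
    return 1
  fi
  echo "ok: $dir is writable"
}
```

Running `check_preston_workdir` with no argument checks the current directory.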
preston cat --remote https://linker.bio hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | head -1
[https://linker.bio/hash:...404e33c6570edd4ba250d20] 59 MB at 6.75 MB/s completed in < 1 minute
https://api.zotero.org/groups/5435545/items?start=0&limit=100 http://purl.org/pav/hasVersion hash://md5/00e40ec6aae2408f289ef11b3d803994
preston cat --remote https://linker.bio hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | preston cat | jq -c .[] | head -n1 | jq
zsh: no matches found: .[]
java.io.IOException: problem retrieving [hash://md5/00e40ec6aae2408f289ef11b3d803994]
	at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:59)
	at bio.guoda.preston.cmd.ContentQueryUtil.copyContent(ContentQueryUtil.java:33)
	at bio.guoda.preston.cmd.ContentQueryUtil.copyMostRecentContent(ContentQueryUtil.java:22)
	at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:64)
	at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:49)
	at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:44)
	at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
	at picocli.CommandLine.access$1300(CommandLine.java:145)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
	at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
	at picocli.CommandLine.execute(CommandLine.java:2078)
	at bio.guoda.preston.Preston.run(Preston.java:103)
	at bio.guoda.preston.Preston.main(Preston.java:94)
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/00e40ec6aae2408f289ef11b3d803994]
	at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:94)
	at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:46)
	at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:18)
	at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:51)
	... 15 more
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/00e40ec6aae2408f289ef11b3d803994]
	at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:25)
	at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:10)
	at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:92)
	... 18 more
Caused by: java.io.IOException: cannot find content identified by [hash://md5/00e40ec6aae2408f289ef11b3d803994]
	at bio.guoda.preston.stream.ContentStreamFactory.create(ContentStreamFactory.java:69)
	at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:23)
	... 20 more
java.lang.RuntimeException: java.io.IOException: problem retrieving [hash://md5/00e40ec6aae2408f289ef11b3d803994]
	at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:52)
	at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:44)
	at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
	at picocli.CommandLine.access$1300(CommandLine.java:145)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
	at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
	at picocli.CommandLine.execute(CommandLine.java:2078)
	at bio.guoda.preston.Preston.run(Preston.java:103)
	at bio.guoda.preston.Preston.main(Preston.java:94)
Caused by: java.io.IOException: problem retrieving [hash://md5/00e40ec6aae2408f289ef11b3d803994]
	at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:59)
	at bio.guoda.preston.cmd.ContentQueryUtil.copyContent(ContentQueryUtil.java:33)
	at bio.guoda.preston.cmd.ContentQueryUtil.copyMostRecentContent(ContentQueryUtil.java:22)
	at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:64)
	at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:49)
	... 11 more
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/00e40ec6aae2408f289ef11b3d803994]
	at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:94)
	at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:46)
	at bio.guoda.preston.store.AliasDereferencer.get(AliasDereferencer.java:18)
	at bio.guoda.preston.cmd.ContentQueryUtil.getContent(ContentQueryUtil.java:51)
	... 15 more
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [hash://md5/00e40ec6aae2408f289ef11b3d803994]
	at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:25)
	at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:10)
	at bio.guoda.preston.store.AliasDereferencer.dereferenceAliasedHash(AliasDereferencer.java:92)
	... 18 more
Caused by: java.io.IOException: cannot find content identified by [hash://md5/00e40ec6aae2408f289ef11b3d803994]
	at bio.guoda.preston.stream.ContentStreamFactory.create(ContentStreamFactory.java:69)
	at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:23)
	... 20 more
Yes, I have permissions; it is my private laptop (macOS Sonoma 14.5).
@ariadnamorales thanks for sharing. Your example shows that you have permission to save content locally.
Did you copy and paste the associated example from https://batlit.org? It appears that your second preston cat does not include the --remote https://linker.bio option.
You may have to refresh the web page to get the most recent version. Alternatively, you can visit the README.md of https://github.com/bat-literature/bat-literature.github.io and copy the code from there.
preston cat --remote https://linker.bio hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | grep "items[?]" | grep hasVersion | preston cat --remote https://linker.bio | jq -c .[] | head -n1 | jq .
zsh: no matches found: .[]
[https://linker.bio/hash:...ae2408f289ef11b3d803994] 552 kB at 1.61 MB/s completed in < 1 minute
[https://linker.bio/hash:...411e22d75d3eb10fff3e318] 451 kB at 1.38 MB/s completed in < 1 minute
[https://linker.bio/hash:...a5576d75db8b8bb7b06fda4] 451 kB at 1.27 MB/s completed in < 1 minute
[https://linker.bio/hash:...8d880ac029afad7597660f9] 453 kB at 1.28 MB/s completed in < 1 minute
[https://linker.bio/hash:...b6933a856b8d42a25657106] 477 kB at 1.38 MB/s completed in < 1 minute
[https://linker.bio/hash:...9474d1ef43b7e2329d06929] 437 kB at 1.55 MB/s completed in < 1 minute
[https://linker.bio/hash:...1a9af4ce7d03b6dd6432062] 474 kB at 1.24 MB/s completed in < 1 minute
[https://linker.bio/hash:...d42bed83e3748eb393d13e7] 463 kB at 1.28 MB/s completed in < 1 minute
[https://linker.bio/hash:...a4796f02ccaed7243d03bd2] 484 kB at 1.73 MB/s completed in < 1 minute
[https://linker.bio/hash:...f08fb8e31483a03dd86fbfa] 472 kB at 1.55 MB/s completed in < 1 minute
[https://linker.bio/hash:...9bcb54b509b4dffc66b5282] 474 kB at 1.20 MB/s completed in < 1 minute
[https://linker.bio/hash:...dc998a01d76a7bd067c054b] 466 kB at 1.50 MB/s completed in < 1 minute
[https://linker.bio/hash:...7ce2c03bb9e0dc375b378e1] 461 kB at 1.41 MB/s completed in < 1 minute
[https://linker.bio/hash:...09ed8ebd6e54b37ce3174e5] 459 kB at 1.50 MB/s completed in < 1 minute
^C
amorales@Ariadnas-MacBook-Pro-2 batLit_rev % ls
data tmp
amorales@Ariadnas-MacBook-Pro-2 batLit_rev % tree data
data
├── 00
│   └── e4
│       └── 00e40ec6aae2408f289ef11b3d803994
├── 1e
│   └── ea
│       └── 1eeae4a50b6933a856b8d42a25657106
├── 26
│   └── f7
│       └── 26f7ce5dd404e33c6570edd4ba250d20
├── 28
│   └── 92
│       └── 28928b5499474d1ef43b7e2329d06929
├── 44
│   └── 92
│       └── 44920a3afd42bed83e3748eb393d13e7
├── 4f
│   └── 18
│       └── 4f18e85729bcb54b509b4dffc66b5282
├── 62
│   └── 5f
│       └── 625f55ea0a5576d75db8b8bb7b06fda4
├── 65
│   └── 11
│       └── 651198dd2dc998a01d76a7bd067c054b
├── a5
│   └── 14
│       └── a514225d9f08fb8e31483a03dd86fbfa
├── c6
│   └── 5d
│       └── c65d7b49f411e22d75d3eb10fff3e318
├── dd
│   └── c1
│       └── ddc14113109ed8ebd6e54b37ce3174e5
├── e0
│   └── 82
│       └── e082fef387ce2c03bb9e0dc375b378e1
├── ee
│   └── 39
│       └── ee39ddd1e8d880ac029afad7597660f9
├── f4
│   └── 8d
│       └── f48dfa1921a9af4ce7d03b6dd6432062
└── f9
    └── f2
        └── f9f279316a4796f02ccaed7243d03bd2
31 directories, 15 files
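It looks like you need to put single quotes around the jq filter, i.e. jq '.[]' instead of jq .[]. Without the quotes, zsh treats .[] as a filename pattern and stops with "zsh: no matches found: .[]", so the jq step never runs.
For example: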
preston cat --remote https://linker.bio/ hash://md5/26f7ce5dd404e33c6570edd4ba250d20\
| grep "items[?]"\
| grep hasVersion\
| preston cat --remote https://linker.bio/\
| jq -c '.[]'\
| head -n1\
| jq .
Apologies for the extra tweaks. I'm curious to hear whether this works better for you now.
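For readers unfamiliar with jq, here is a minimal sketch of what the quoted filter does. The input is a hypothetical, trimmed-down stand-in for one Zotero API page (a JSON array of items); the second item key is made up for illustration. The filter '.[]' emits each array element separately, and -c prints each one on a single line, which is why head -n1 in the pipeline above keeps exactly one item.

```shell
# Hypothetical, trimmed-down stand-in for one Zotero API page:
# a JSON array of items (only "key" and "version" kept).
page='[{"key":"NYT92CCF","version":48777},{"key":"ABCD1234","version":1}]'

# Single quotes keep zsh from treating .[] as a filename pattern.
# '.[]' emits every array element; -c prints each one compactly on one line.
printf '%s\n' "$page" | jq -c '.[]'
```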
preston cat --remote https://linker.bio hash://md5/26f7ce5dd404e33c6570edd4ba250d20 | \
grep "items[?]" | \
grep hasVersion | \
preston cat --remote https://linker.bio | \
jq -c '.[]' | \
head -n1 | \
jq .
And it seems to be running without a problem:
{
"key": "NYT92CCF",
"version": 48777,
"library": {
"type": "group",
"id": 5435545,
"name": "Bat Literature Project",
"links": {
"alternate": {
"href": "https://www.zotero.org/groups/bat_literature_project",
"type": "text/html"
}
}
},
"links": {
"self": {
"href": "https://api.zotero.org/groups/5435545/items/NYT92CCF",
"type": "application/json"
},
"alternate": {
"href": "https://www.zotero.org/groups/bat_literature_project/items/NYT92CCF",
"type": "text/html"
},
"attachment": {
"href": "https://api.zotero.org/groups/5435545/items/3MVRRMR8",
"type": "application/json",
"attachmentType": "application/pdf",
"attachmentSize": 388576
}
},
"meta": {
"createdByUser": {
"id": 13229919,
"username": "acsherman",
"name": "",
"links": {
"alternate": {
"href": "https://www.zotero.org/acsherman",
"type": "text/html"
}
}
},
"creatorSummary": "Thong et al.",
"parsedDate": "2010-10-14",
"numChildren": 1
},
"data": {
"key": "NYT92CCF",
"version": 48777,
"itemType": "journalArticle",
"title": "Further records of Murina tiensa from Vietnam with first information on its echolocation calls.",
"creators": [
{
"creatorType": "author",
"firstName": "Vu Dinh",
"lastName": "Thong"
},
{
"creatorType": "author",
"firstName": "Christian",
"lastName": "Dietz"
},
{
"creatorType": "author",
"firstName": "Annette",
"lastName": "Denzinger"
},
{
"creatorType": "author",
"firstName": "Paul J. J.",
"lastName": "Bates"
},
{
"creatorType": "author",
"firstName": "Neil M.",
"lastName": "Furey"
},
{
"creatorType": "author",
"firstName": "Gabor",
"lastName": "Csorba"
},
{
"creatorType": "author",
"firstName": "Glenn",
"lastName": "Hoye"
},
{
"creatorType": "author",
"firstName": "Le Dinh",
"lastName": "Thuy"
},
{
"creatorType": "author",
"firstName": "Hans-Ulrich",
"lastName": "Schnitzler"
}
],
"abstractNote": "The fairy tube-nosed bat, Murina tiensa, is considered to be endemic to Vietnam. It is known only from the original description, when it was found at two localities in limestone karst areas. In 2008, we conducted a series of intensive field surveys throughout the country and obtained additional records of this species from various habitats, including degraded to nearly pristine forests and an offshore island. Our results indicate that M. tiensa is a sexually dimorphic species, females being considerably larger than males in all external and craniodental measurements. The species emits broadband, downward frequency-modulated echolocation calls with a dominant first harmonic. When handheld or when flying in a flight tent, signals had a similar structure and were emitted in groups of 2–4 signals. On average, signals swept from 150 to 49 kHz in 2.2 ms for handheld bats, and, from 145 to 50 kHz in 1.9 ms for flying bats. M. tiensa often occurred in sympatry with M. cyclotis and several rhinolophids.",
"publicationTitle": "Hystrix, the Italian Journal of Mammalogy",
"volume": "22",
"issue": "1",
"pages": "",
"date": "October 14, 2010",
"series": "",
"seriesTitle": "",
"seriesText": "",
"journalAbbreviation": "",
"language": "en",
"DOI": "10.4404/hystrix-22.1-4533",
"ISSN": "18255272, 03941914",
"shortTitle": "",
"url": "https://doi.org/10.4404/hystrix-22.1-4533",
"accessDate": "2024-06-28T00:31:16Z",
"archive": "",
"archiveLocation": "",
"libraryCatalog": "DOI.org (CSL JSON)",
"callNumber": "",
"rights": "",
"extra": "",
"tags": [],
"collections": [
"UAWY6DNP"
],
"relations": {
"dc:replaces": "http://zotero.org/groups/5435545/items/NF6R8YCX"
},
"dateAdded": "2024-07-08T02:34:45Z",
"dateModified": "2024-08-16T13:50:15Z"
}
}
[https://linker.bio/hash:...75dc340b2c951ef0fb231e0] 423 kB at 1.60 MB/s completed in < 1 minute
[https://linker.bio/hash:...18489b9946479b5bcb86f63] 465 kB at 1.35 MB/s completed in < 1 minute
[https://linker.bio/hash:...75d8d648bae36cdf037f886] 465 kB at 1.21 MB/s completed in < 1 minute
[https://linker.bio/hash:...66badeb3b2544aedb1f97e4] 417 kB at 1.49 MB/s completed in < 1 minute
[https://linker.bio/hash:...e33ffb6e243468680fd15f0] 425 kB at 1.79 MB/s completed in < 1 minute
[https://linker.bio/hash:...a06d86af6137a40f9c4ed4a] 469 kB at 1.63 MB/s completed in < 1 minute
[https://linker.bio/hash:...b13ca488fb57c2ee6c64f2e] 344 kB at 0.93 MB/s completed in < 1 minute
[https://linker.bio/hash:...911fc99665ff10ce5927a63] 309 kB at 1.15 MB/s completed in < 1 minute
[https://linker.bio/hash:...1799efa016045d440569491] 280 kB at 1.01 MB/s completed in < 1 minute
[https://linker.bio/hash:...a85298cde80837e72c1b641] 264 kB at 0.94 MB/s completed in < 1 minute
[https://linker.bio/hash:...0be3eb5d9c6a6e65c892f55] 355 kB at 1.10 MB/s completed in < 1 minute
[https://linker.bio/hash:...21f145ea2db051715b0dba3] 301 kB at 1.08 MB/s completed in < 1 minute
[https://linker.bio/hash:...f58c786d36940e6ccd1266a] 337 kB at 1.18 MB/s completed in < 1 minute
[https://linker.bio/hash:...eb5a3afc46f7533400f8639] 387 kB at 1.09 MB/s completed in < 1 minute
[https://linker.bio/hash:...5a3085b64b172d89bad83e0] 247 kB at 1.20 MB/s completed in < 1 minute
[https://linker.bio/hash:...e5a06e372d07a7c102158e3] 341 kB at 1.06 MB/s completed in < 1 minute
[https://linker.bio/hash:...f71c391e934f08c7cdd8fef] 305 kB at 1.12 MB/s completed in < 1 minute
[https://linker.bio/hash:...55ebd215f386eae7ab04082] 297 kB at 1.08 MB/s completed in < 1 minute
[https://linker.bio/hash:...c7383824a47b2446d1b5293] 310 kB at 1.12 MB/s completed in < 1 minute
[https://linker.bio/hash:...c3df654b915ff7f47c9a7e1] 309 kB at 1.14 MB/s completed in < 1 minute
[https://linker.bio/hash:...d715cbabb402592361616e4] 308 kB at 1.12 MB/s completed in < 1 minute
[https://linker.bio/hash:...c35845e21bb654820c7a415] 328 kB at 1.18 MB/s completed in < 1 minute
[https://linker.bio/hash:...07635e3eb644e89608a98b9] 333 kB at 1.52 MB/s completed in < 1 minute
[https://linker.bio/hash:...320d5dec0c9e987bfc57517] 294 kB at 1.12 MB/s completed in < 1 minute
[https://linker.bio/hash:...9679c42989e0d1c0aa49a5b] 283 kB at 1.03 MB/s completed in < 1 minute
[https://linker.bio/hash:...14a5787b9e834b637007f2e] 325 kB at 1.14 MB/s completed in < 1 minute
[https://linker.bio/hash:...36ae8bf631159f235d2e6f8] 321 kB at 1.17 MB/s completed in < 1 minute
[https://linker.bio/hash:...84d98a54c1690aa75f8fe4a] 271 kB at 0.96 MB/s completed in < 1 minute
[https://linker.bio/hash:...5517ed500196533da046bc0] 278 kB at 1.08 MB/s completed in < 1 minute
[https://linker.bio/hash:...e0bbc6bf9e9d45a7993ed3c] 312 kB at 1.15 MB/s completed in < 1 minute
[https://linker.bio/hash:...c31b024d03d4d971ae6a294] 338 kB at 1.19 MB/s completed in < 1 minute
[https://linker.bio/hash:...c0aeed571bfa1629419f4a6] 296 kB at 1.09 MB/s completed in < 1 minute
[https://linker.bio/hash:...e44006e29a524da7f2ac780] 287 kB at 1.05 MB/s completed in < 1 minute
[https://linker.bio/hash:...183bb1a0527e0a7418ed88b] 309 kB at 1.28 MB/s completed in < 1 minute
[https://linker.bio/hash:...8b50a4c99e3131d7a12ffb2] 288 kB at 0.94 MB/s completed in < 1 minute
[https://linker.bio/hash:...3142fd7301b7df3787d987f] 338 kB at 1.25 MB/s completed in < 1 minute
[https://linker.bio/hash:...9a7e1f74ba30b331acda2b7] 319 kB at 1.19 MB/s completed in < 1 minute
[https://linker.bio/hash:...4b67e315096efb3514f915c] 308 kB at 1.00 MB/s completed in < 1 minute
[https://linker.bio/hash:...e4febd1540c7a80f442de1f] 277 kB at 0.98 MB/s completed in < 1 minute
[https://linker.bio/hash:...10ead25896302479943a563] 277 kB at 1.08 MB/s completed in < 1 minute
[https://linker.bio/hash:...6b598329a344226c718e7d3] 270 kB at 0.95 MB/s completed in < 1 minute
[https://linker.bio/hash:...cb801def8cc43a38f911e6e] 287 kB at 1.05 MB/s completed in < 1 minute
[https://linker.bio/hash:...47956715168591969976177] 259 kB at 0.92 MB/s completed in < 1 minute
[https://linker.bio/hash:...ccab4c249f83391f433b9c5] 309 kB at 1.09 MB/s completed in < 1 minute
[https://linker.bio/hash:...620d7629fef023d31cfbae1] 318 kB at 1.13 MB/s completed in < 1 minute
[https://linker.bio/hash:...d2da567c661b10c05f1f81b] 318 kB at 1.56 MB/s completed in < 1 minute
[https://linker.bio/hash:...b48743a774635dbbb6cab95] 332 kB at 1.19 MB/s completed in < 1 minute
[https://linker.bio/hash:...e4fa0efdaf1fcb769e2279b] 308 kB at 1.12 MB/s completed in < 1 minute
[https://linker.bio/hash:...784c46acc3bdde6b0f35262] 310 kB at 1.10 MB/s completed in < 1 minute
[https://linker.bio/hash:...f49bed32cfd9461ba7eb777] 332 kB at 1.06 MB/s completed in < 1 minute
[https://linker.bio/hash:...94bf54395d19aa9425fc415] 311 kB at 1.20 MB/s completed in < 1 minute
[https://linker.bio/hash:...75d5fa592bc746bc52ae651] 328 kB at 1.18 MB/s completed in < 1 minute
[https://linker.bio/hash:...fdac46fe243be9bc89aed2a] 337 kB at 1.19 MB/s completed in < 1 minute
[https://linker.bio/hash:...76640277f965d3f78b473c4] 252 kB at 0.93 MB/s completed in < 1 minute
[https://linker.bio/hash:...339b9c686a95d444d2fc42a] 269 kB at 1.00 MB/s completed in < 1 minute
[https://linker.bio/hash:...0c0012063109c5981b8a7c4] 277 kB at 1.05 MB/s completed in < 1 minute
[https://linker.bio/hash:...099a856bd856636d383b4b2] 285 kB at 1.07 MB/s completed in < 1 minute
[https://linker.bio/hash:...67e9f3b6797b75c32465b95] 280 kB at 1.22 MB/s completed in < 1 minute
[https://linker.bio/hash:...133136cbef7f44920afb748] 272 kB at 0.99 MB/s completed in < 1 minute
[https://linker.bio/hash:...bf6bc18b7a930bdccd3f846] 329 kB at 1.17 MB/s completed in < 1 minute
[https://linker.bio/hash:...05b46ededda6b70b6e97763] 313 kB at 1.13 MB/s completed in < 1 minute
[https://linker.bio/hash:...cb3d26a5bb8c884058c2e4c] 289 kB at 1.02 MB/s completed in < 1 minute
[https://linker.bio/hash:...902a15493cac81a3d331a52] 259 kB at 0.95 MB/s completed in < 1 minute
[https://linker.bio/hash:...3b5a6419abeed59bf9e4291] 251 kB at 1.05 MB/s completed in < 1 minute
[https://linker.bio/hash:...c992a33fc19a43a35d6afae] 254 kB at 0.93 MB/s completed in < 1 minute
[https://linker.bio/hash:...b24476b938b21f3725edf8c] 290 kB at 1.03 MB/s completed in < 1 minute
[https://linker.bio/hash:...cab085d11bc0c6e9afeab7a] 267 kB at 0.98 MB/s completed in < 1 minute
[https://linker.bio/hash:...79bb87f9bc3eae385d895ca] 301 kB at 1.10 MB/s completed in < 1 minute
[https://linker.bio/hash:...bed066d0824c95f47db5ac6] 288 kB at 1.01 MB/s completed in < 1 minute
[https://linker.bio/hash:...60937273b64b7c8b1dd2c9c] 305 kB at 1.23 MB/s completed in < 1 minute
[https://linker.bio/hash:...cdb9f9ce397ddebb9a5ee6e] 269 kB at 1.29 MB/s completed in < 1 minute
[https://linker.bio/hash:...dccdee1ac35949120b98f08] 281 kB at 1.12 MB/s completed in < 1 minute
[https://linker.bio/hash:...83a8b95c14c82ae9883f23d] 271 kB at 0.90 MB/s completed in < 1 minute
[https://linker.bio/hash:...d7698a052bfc6ba01bbb68f] 295 kB at 0.98 MB/s completed in < 1 minute
@ariadnamorales I am glad you were able to reproduce the example included in https://batlit.org, and thank you for taking the time.
What would have made it easier for your past self to get to this point? Any suggestions on making it easier to run the examples?
(Note that after you have run the example successfully once, you should be able to run it again with the internet turned off, because the retrieved content is now stored in your local data folder.)
Well, I had not installed preston and jq. I had to install them, and new users might face the same issue. Having a "prerequisites" section would help. Other than that, the example code is very clear. However, the output folders are a bit cryptic; I'm not sure how to interpret them...
Also, would it be possible to save the output of jq (JSON format) as a table? That would be easier for users to interpret. The README mentions a table, but I cannot find the TSV or CSV files among the downloaded files.
Well, I had not installed preston and jq. I had to install them, and new users might face the same issue. Having a "prerequisites" section would help.
I've added a prerequisite section for your review at https://batlit.org#prerequisites .
Other than that, the example code is very clear. However, the output folders are a bit cryptic; I'm not sure how to interpret them...
The output folders are not meant to be read by humans. Instead, they are more like the hidden .git folders that come with cloned copies of Git repositories.
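The tree listing earlier in this thread suggests a simple rule for reading those folders: each file is named after its full md5 hash and stored two directory levels deep, using the first two and the next two hex characters of that hash (e.g. data/26/f7/26f7ce5dd404e33c6570edd4ba250d20). Below is a minimal sketch of that mapping. It is inferred from that listing rather than from preston's documentation, and the function name hash_to_local_path is hypothetical.

```shell
# Map a content hash such as hash://md5/26f7... to the path where the local
# data folder appears to keep it. The layout is inferred from the `tree data`
# listing in this thread, not from preston documentation.
# hash_to_local_path is a hypothetical helper name.
hash_to_local_path() {
  local hex d1 d2
  hex=${1##*/}                           # drop the "hash://md5/" prefix
  d1=$(printf '%s' "$hex" | cut -c1-2)   # first directory level, e.g. "26"
  d2=$(printf '%s' "$hex" | cut -c3-4)   # second directory level, e.g. "f7"
  # optional second argument: data folder location (defaults to ./data)
  printf '%s/%s/%s/%s\n' "${2:-data}" "$d1" "$d2" "$hex"
}

# Example: report whether the BatLit v0.5 anchor is already cached locally.
# p=$(hash_to_local_path hash://md5/26f7ce5dd404e33c6570edd4ba250d20)
# if [ -f "$p" ]; then echo "cached: $p"; else echo "not cached: $p"; fi
```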
The preston tool is meant to be used to discover the content in that data archive, using commands like preston ls, preston cat, and preston history. Because this repository is expressed in md5 space, you'd have to add --algo md5 (sha256 is the default, but md5 is friendlier for Zenodo content, since Zenodo reports md5 checksums for its files).
You can also run preston as a (local) server using:
preston s --algo md5
to make the content accessible over HTTP (e.g., from a web browser) on port 8080. This is what powers https://linker.bio and enables things like:
preston clone\
--algo md5\
--anchor hash://md5/26f7ce5dd404e33c6570edd4ba250d20\
http://localhost:8080/
Open to any suggestions.
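As a sketch of the local-server route mentioned above, content could then be fetched with curl. This assumes that a local preston s --algo md5 instance answers the same hash URLs that https://linker.bio does (as in the https://linker.bio/hash://md5/... link in this thread) and listens on port 8080 as stated above; local_content_url is a hypothetical helper, not part of preston.

```shell
# Build the URL under which a local `preston s --algo md5` server is assumed
# to expose a content hash. The path pattern mirrors the
# https://linker.bio/hash://md5/... links in this thread; 8080 is the port
# mentioned above. local_content_url is a hypothetical helper.
local_content_url() {
  # $1: content hash, $2: optional port (defaults to 8080)
  printf 'http://localhost:%s/%s\n' "${2:-8080}" "$1"
}

# Example usage, in a second terminal while the server runs from the folder
# that contains ./data:
# curl -s "$(local_content_url hash://md5/26f7ce5dd404e33c6570edd4ba250d20)" | head -n 5
```

Pasting the same URL into a web browser should work as well, under the same assumptions.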
Also, would it be possible to save the output of jq (JSON format) as a table? That would be easier for users to interpret. The README mentions a table, but I cannot find the TSV or CSV files among the downloaded files.
jq supports TSV and CSV output (the @tsv and @csv format strings), as documented at https://jqlang.github.io/jq/manual/#format-strings-and-escaping .
I also often use miller (https://miller.readthedocs.io/en/6.12.0/) for (streaming) table processing and conversions; I created the tables included in the README with it. Would you like me to link a downloadable TSV/CSV version of the markdown table in the text?
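A minimal sketch of the jq route, assuming the input is the stream of Zotero API pages (JSON arrays of items) produced by the batlit example. The selected columns are illustrative and not necessarily those used for the README tables; zotero_items_to_tsv is a hypothetical helper name.

```shell
# Turn a stream of Zotero API pages (JSON arrays of items) into a TSV table
# using jq's built-in @tsv format string. Column choice is illustrative only.
# zotero_items_to_tsv is a hypothetical helper name.
zotero_items_to_tsv() {
  # header row, printed once
  printf 'key\titemType\tdate\tDOI\ttitle\n'
  # one row per item; '// ""' turns missing (null) fields into empty cells
  jq -r '.[] | [(.key // ""), (.data.itemType // ""), (.data.date // ""), (.data.DOI // ""), (.data.title // "")] | @tsv'
}

# Example usage (needs preston, jq and network access on the first run):
# preston cat --remote https://linker.bio hash://md5/26f7ce5dd404e33c6570edd4ba250d20 \
#   | grep "items[?]" | grep hasVersion \
#   | preston cat --remote https://linker.bio \
#   | zotero_items_to_tsv > batlit-items.tsv
```

Replacing @tsv with @csv (and the tabs in the header with commas) should produce CSV instead.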
btw - if you'd like, feel free to edit the README.md as you see fit and submit a pull request. If you'd rather not use pull requests, let me know and we'll figure something else out.
Thanks for your patience in reviewing v0.5 of BatLit.
See https://sandbox.zenodo.org/communities/batlit-review-md5-26f7ce5dd404e33c6570edd4ba250d20 for the records I've generated from the BatLit v0.5 corpus (https://linker.bio/hash://md5/26f7ce5dd404e33c6570edd4ba250d20). See also https://batlit.org
Please submit your review comments by Wednesday 28 Aug 2024.