bio-guoda / preston

a biodiversity dataset tracker
MIT License

add option to [grep]/[match] to select by line #109

Closed jhpoelen closed 3 years ago

jhpoelen commented 3 years ago

Currently, you can select parts of content that match a specific pattern:

e.g.,

$ preston ls | preston match [some pattern]
...
cut:zip:hash://sha256/97cbeae429fbc95d1859f7afa28b33f08ac64125ba72511c49c4b77ca66d2d66!/measurementorfact.txt!/b39887-40069

We'd like to add an option to report only the line on which the pattern was found, instead of the matching byte range:

e.g.,

$ preston ls | preston match --line [some pattern]
...
line:zip:hash://sha256/97cbeae429fbc95d1859f7afa28b33f08ac64125ba72511c49c4b77ca66d2d66!/measurementorfact.txt!/L23

where the example above expresses that a match was found on line 23.
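
To illustrate how a consumer might read the proposed notation, here is a minimal sketch in Python. The helper name `parse_line_id` and the regular expression are assumptions made for illustration; they are not part of Preston. The sketch only assumes that a line identifier has the shape `line:<content id>!/L<number>`, as in the example above.

```python
import re
from typing import Tuple

# Hypothetical pattern: "line:" + wrapped content identifier + "!/L" + line number.
LINE_ID = re.compile(r"^line:(?P<content>.+)!/L(?P<line>\d+)$")


def parse_line_id(identifier: str) -> Tuple[str, int]:
    """Split a line identifier into (wrapped content identifier, line number)."""
    match = LINE_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a line identifier: {identifier}")
    return match.group("content"), int(match.group("line"))


content, line_number = parse_line_id(
    "line:zip:hash://sha256/"
    "97cbeae429fbc95d1859f7afa28b33f08ac64125ba72511c49c4b77ca66d2d66"
    "!/measurementorfact.txt!/L23"
)
print(content)      # the zip:hash://... identifier of the file inside the archive
print(line_number)  # 23
```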

jhpoelen commented 3 years ago

When using option "-o" in combination with --line, you also get the cut notation:

preston match [pattern]

line:zip:hash://sha256/97cbeae429fbc95d1859f7afa28b33f08ac64125ba72511c49c4b77ca66d2d66!/measurementorfact.txt!/L123

and

preston match -o [pattern]

cut:line:zip:hash://sha256/97cbeae429fbc95d1859f7afa28b33f08ac64125ba72511c49c4b77ca66d2d66!/measurementorfact.txt!/L123!/b12-23

and

preston match --no-line [pattern]

cut:zip:hash://sha256/97cbeae429fbc95d1859f7afa28b33f08ac64125ba72511c49c4b77ca66d2d66!/measurementorfact.txt!/b12-23

PS compare with grep -o

man grep
... 
-o, --only-matching
              Print only the matched (non-empty) parts of a matching line, with each  such  part  on  a  separate
              output line.
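
For comparison, the difference between reporting whole matching lines and reporting only the matched parts can be sketched in a few lines of Python. This is an illustration of the grep analogy, not Preston's implementation; the function names and the sample text are made up for the example.

```python
import re


def matching_lines(text, pattern):
    """Like plain grep: yield (line number, full line) for each line with a match."""
    regex = re.compile(pattern)
    for number, line in enumerate(text.splitlines(), start=1):
        if regex.search(line):
            yield number, line


def only_matching(text, pattern):
    """Like grep -o: yield (line number, matched part) for every non-empty match."""
    regex = re.compile(pattern)
    for number, line in enumerate(text.splitlines(), start=1):
        for match in regex.finditer(line):
            if match.group(0):
                yield number, match.group(0)


sample = "no url here\nsee http://example.org/a and http://example.org/b\n"
print(list(matching_lines(sample, r"http://\S+")))
print(list(only_matching(sample, r"http://\S+")))
```

The first call reports line 2 once, in full; the second reports line 2 twice, once per matched URL.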

mielliott commented 3 years ago

Using the preston-amazon dataset hash://sha256/1aa34112ade084ccc8707388fbc329dcb8fae5f895cb266e3ad943f7495740b3

$ preston history | tail -n1
<hash://sha256/1aa34112ade084ccc8707388fbc329dcb8fae5f895cb266e3ad943f7495740b3> <http://purl.org/pav/previousVersion> <hash://sha256/d7b73e3472d5a1989598f2a46116a4fc11dfb9ceacdf0a2b2f7f69737883c951> .

Default to reporting full lines that contain matches:

$ preston ls | preston match | head -n4
<urn:uuid:0a460ac1-2e23-4a71-96a0-39448b404ea4> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Activity> <urn:uuid:0a460ac1-2e23-4a71-96a0-39448b404ea4> .
<urn:uuid:0a460ac1-2e23-4a71-96a0-39448b404ea4> <http://www.w3.org/ns/prov#used> <hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5> <urn:uuid:0a460ac1-2e23-4a71-96a0-39448b404ea4> .
<urn:uuid:0a460ac1-2e23-4a71-96a0-39448b404ea4> <http://purl.org/dc/terms/description> "An activity that finds the locations of text matching the regular expression '(?:https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]' inside any encountered content (e.g., hash://sha256/... identifiers)."@en <urn:uuid:0a460ac1-2e23-4a71-96a0-39448b404ea4> .
<line:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/L1> <http://www.w3.org/ns/prov#value> "[{\"key\":\"82ceb6ba-f762-11e1-a439-00145eb45e9a\",\"title\":\"Andes to Amazon Biodiversity Program\",\"description\":\"The Andes to Amazon Program is an international, multidisciplinary team of scientists, students, and Peruvian locals working between the Botanical Research Institute of Texas (BRIT) and selected field and museum sites in Peru.\",\"type\":\"OCCURRENCE\"},{\"key\":\"58414378-4fb2-47e0-8dd5-8b55d5c77117\",\"title\":\"Bolivian Amazon lowland fish metacommunity data\",\"description\":\"<p>This dataset represents data from the paper Yukoni, T. and Torres L. V. (2016) Fish metacommunity dynamics in the patchy heterogeneous habitats of varzea lakes, turbid river channels and transparent clear and black water bodies in the Amazonian Lowlands of Bolivia. Environmental Biology of Fishes.</p>\\n<p>This study documents the spatial dynamic of fish metacommunity based on the date sets of 65 sites, covering two geographic patches of transparent water valleys; Manuripi and Itenez Rivers, separated by turbid water valleys originate in the Andes and the Savanna.</p>\\n<p>See http://www.freshwaterbiodiversity.eu/metadb/bf_mdb_view.php?entryID=BFE_105 for additional metadata.</p>\",\"type\":\"OCCURRENCE\"},{\"key\":\"5a607ce6-eaaf-4420-a302-54ddc767130c\",\"title\":\"Earthworms (Oligochaeta: Glossoscolecidae) of the Amazon region of Colombia\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article M, Alexander Feijoo, Celis, Liliana V. (2011): Earthworms (Oligochaeta: Glossoscolecidae) of the Amazon region of Colombia. 
Zootaxa 3201: 27-44, DOI: 10.5281/zenodo.202378\",\"type\":\"CHECKLIST\"},{\"key\":\"4716951d-11f5-4f31-bb1d-d40c95268ad3\",\"title\":\"Notes on the Agrilus fauna of the Colombian Amazon (Coleoptera, Buprestidae)\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Curletti, Gianfranco, Dutto, Angelo (2017): Notes on the Agrilus fauna of the Colombian Amazon (Coleoptera, Buprestidae). Zootaxa 4243 (2): 373-376, DOI: https://doi.org/10.11646/zootaxa.4243.2.7\",\"type\":\"CHECKLIST\"},{\"key\":\"10b6b053-2ccf-4ca0-8922-486ad098fc56\",\"title\":\"Stink bugs (Hemiptera: Pentatomidae) from Brazilian Amazon: checklist and new records\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Silva, Valeria Juliete Da, Santos, Cleverson Rannieri Meira Dos, Fernandes, Jose Antonio Marin (2018): Stink bugs (Hemiptera: Pentatomidae) from Brazilian Amazon: checklist and new records. Zootaxa 4425 (3): 401-455, DOI: https://doi.org/10.11646/zootaxa.4425.3.1\",\"type\":\"CHECKLIST\"},{\"key\":\"6b43aef0-e62d-478f-9a88-b692578f0a73\",\"title\":\"New species and notes on Apenesia (Hymenoptera, Bethylidae) from the Brazilian Amazon\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Lanes, G. O., Azevedo, C. O. (2004): New species and notes on Apenesia (Hymenoptera, Bethylidae) from the Brazilian Amazon. 
Zootaxa 679: 1-16, DOI: 10.5281/zenodo.158458\",\"type\":\"CHECKLIST\"},{\"key\":\"9f2ce723-8926-4f4c-b9f6-a4b596c8c1a9\",\"title\":\"Revision of the Brazilian Amazon Basin species of Porphyrochroa Melander (Diptera: Empididae)\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Mendonça, Mirian Nascimento, Rafael, José Albertino, Ale-Rocha, Rosaly (2008): Revision of the Brazilian Amazon Basin species of Porphyrochroa Melander (Diptera: Empididae). Zootaxa 1859: 1-39, DOI: 10.5281/zenodo.183631\",\"type\":\"CHECKLIST\"},{\"key\":\"f0096116-c79f-41cd-8376-35587cbe9fcd\",\"title\":\"New species of earthworms (Oligochaeta: Glossoscolecidae) in the Amazon region of Colombia\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article M, Alexander Feijoo, Celis, Liliana V. (2012): New species of earthworms (Oligochaeta: Glossoscolecidae) in the Amazon region of Colombia. Zootaxa 3458: 103-119, DOI: 10.5281/zenodo.214602\",\"type\":\"CHECKLIST\"},{\"key\":\"0d875ec4-2366-4592-803d-cf6c61de8df4\",\"title\":\"A new species of Rhinella (Anura: Bufonidae) from Brazilian Amazon\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Lima, Albertina P., Menin, Marcelo, Araújo, Maria Carmozina De (2007): A new species of Rhinella (Anura: Bufonidae) from Brazilian Amazon. Zootaxa 1663: 1-15, DOI: 10.5281/zenodo.179996\",\"type\":\"CHECKLIST\"},{\"key\":\"92b19f16-a3d8-4924-89ff-1800c1d048d0\",\"title\":\"Amazoonops, a new genus of goblin spiders (Araneae: Oonopidae) from the Brazilian Amazon\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Ott, Ricardo, Ruiz, Gustavo R. S., Brescovit, Antonio D., Bonaldo, Alexandre B. (2017): Amazoonops, a new genus of goblin spiders (Araneae: Oonopidae) from the Brazilian Amazon. 
Zootaxa 4236 (2): 244-268, DOI: https://doi.org/10.11646/zootaxa.4236.2.2\",\"type\":\"CHECKLIST\"},{\"key\":\"e24ad3dc-e2fe-44a9-983c-d7fb90147e9f\",\"title\":\"Three new species of Leptohyphidae (Insecta: Ephemeroptera) from Central Amazon, Brazil\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Belmont, Enide Luciana L., Salles, Frederico F., Hamada, Neusa (2011): Three new species of Leptohyphidae (Insecta: Ephemeroptera) from Central Amazon, Brazil. Zootaxa 3047: 43-53, DOI: 10.5281/zenodo.201430\",\"type\":\"CHECKLIST\"},{\"key\":\"9d215804-dcb2-49ba-82d5-cc927abfb384\",\"title\":\"Two New Species of Halictophagidae (Insecta: Strepsiptera) from the Brazilian Amazon Basin\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Kogan, Marcos (2012): Two New Species of Halictophagidae (Insecta: Strepsiptera) from the Brazilian Amazon Basin. Zootaxa 3517: 79-87, DOI: 10.5281/zenodo.282617\",\"type\":\"CHECKLIST\"},{\"key\":\"b4683510-1fed-4ad4-bef6-64606e847fe9\",\"title\":\"A new Argyrodiaptomus (Copepoda: Calanoida: Diaptomidae) from the southwestern Brazilian Amazon\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Previattelli, Daniel, Santos-Silva, Edinaldo Nelson Dos (2007): A new Argyrodiaptomus (Copepoda: Calanoida: Diaptomidae) from the southwestern Brazilian Amazon. Zootaxa 1518: 1-29, DOI: 10.5281/zenodo.177358\",\"type\":\"CHECKLIST\"},{\"key\":\"d0e838e4-82b0-440b-b876-b62afe416245\",\"title\":\"A new species of Elachistocleis (Anura; Microhylidae) from the Brazilian Amazon\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Toledo, Luís Felipe (2010): A new species of Elachistocleis (Anura; Microhylidae) from the Brazilian Amazon. 
Zootaxa 2496: 63-68, DOI: 10.5281/zenodo.195714\",\"type\":\"CHECKLIST\"},{\"key\":\"b3262114-8580-4828-973d-b743d1034d00\",\"title\":\"A new species of Furciseta (Diptera, Ctenostylidae) from the Brazilian Amazon\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Câmara, J. T., Rafael, J. A. (2013): A new species of Furciseta (Diptera, Ctenostylidae) from the Brazilian Amazon. Zootaxa 3669 (2): 147-152, DOI: http://dx.doi.org/10.11646/zootaxa.3669.2.5\",\"type\":\"CHECKLIST\"},{\"key\":\"a1d56c0b-d41f-42c1-91d2-417ab340969f\",\"title\":\"A new species of Besleria (Gesneriaceae) from the western Amazon rainforest\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Gabriel Emiliano Ferreira, Andréa Onofre De Araújo, Michael John Gilbert Hopkins, Alain Chautems (2017): A new species of Besleria (Gesneriaceae) from the western Amazon rainforest. Brittonia 69 (2): 241-245, DOI: 10.1007/s12228-017-9464-6\",\"type\":\"CHECKLIST\"},{\"key\":\"1edd9658-1b17-4794-bbf0-e2eccb017fd3\",\"title\":\"Leporinus amazonicus, a new anostomid species from the Amazon lowlands, Brazil (Osteichthyes: Characiformes)\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Santos, Geraldo Mendes Dos, Zuanon, Jansen (2008): Leporinus amazonicus, a new anostomid species from the Amazon lowlands, Brazil (Osteichthyes: Characiformes). 
Zootaxa 1815: 35-42, DOI: 10.5281/zenodo.182896\",\"type\":\"CHECKLIST\"},{\"key\":\"bbe844d6-89b6-4c9b-9cd1-4b7a8c332dc5\",\"title\":\"A new species of Paraonis and an annotated checklist of polychaetes from mangroves of the Brazilian Amazon Coast (Annelida, Paraonidae)\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Ribeiro, Rannyele Passos, Alves, Paulo Ricardo, Almeida, Zafira da Silva de, Ruta, Christine (2018): A new species of Paraonis and an annotated checklist of polychaetes from mangroves of the Brazilian Amazon Coast (Annelida, Paraonidae). ZooKeys 740: 1-34, DOI: http://dx.doi.org/10.3897/zookeys.740.14640, URL: http://dx.doi.org/10.3897/zookeys.740.14640\",\"type\":\"CHECKLIST\"},{\"key\":\"6e68c0bd-4d98-4a19-a066-e610c60b9478\",\"title\":\"Amazoniaseius imparisetosus n. sp., n. g.: an unusual new phytoseiid mite (Acari: Phytoseiidae) from the Amazon forest\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Demite, Peterson R., Cruz, Wilton P., Moraes, Gilberto J. (2017): Amazoniaseius imparisetosus n. sp., n. g.: an unusual new phytoseiid mite (Acari: Phytoseiidae) from the Amazon forest. Zootaxa 4236 (2): 302-310, DOI: https://doi.org/10.11646/zootaxa.4236.2.5\",\"type\":\"CHECKLIST\"},{\"key\":\"663199f1-3528-4289-8069-d27552f62f10\",\"title\":\"A survey of Dryinidae (Hymenoptera, Chrysidoidea) from Caxiuanã, Amazon Basin, with three new taxa and keys to genera and species\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Coelho, Beatriz W., Aguiar, Alexandre P., Engel, Michael S. (2011): A survey of Dryinidae (Hymenoptera, Chrysidoidea) from Caxiuanã, Amazon Basin, with three new taxa and keys to genera and species. Zootaxa 2907: 1-21, DOI: 10.5281/zenodo.201416\",\"type\":\"CHECKLIST\"}]" <urn:uuid:0a460ac1-2e23-4a71-96a0-39448b404ea4> .

With -o:

$ preston ls | preston match -o | head -n4
<urn:uuid:268b5c87-c4b2-4f87-8c6d-acb394f6fc5b> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Activity> <urn:uuid:268b5c87-c4b2-4f87-8c6d-acb394f6fc5b> .
<urn:uuid:268b5c87-c4b2-4f87-8c6d-acb394f6fc5b> <http://www.w3.org/ns/prov#used> <hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5> <urn:uuid:268b5c87-c4b2-4f87-8c6d-acb394f6fc5b> .
<urn:uuid:268b5c87-c4b2-4f87-8c6d-acb394f6fc5b> <http://purl.org/dc/terms/description> "An activity that finds the locations of text matching the regular expression '(?:https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]' inside any encountered content (e.g., hash://sha256/... identifiers)."@en <urn:uuid:268b5c87-c4b2-4f87-8c6d-acb394f6fc5b> .
<cut:line:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/L1!/b1063-1137> <http://www.w3.org/ns/prov#value> "http://www.freshwaterbiodiversity.eu/metadb/bf_mdb_view.php?entryID=BFE_105" <urn:uuid:268b5c87-c4b2-4f87-8c6d-acb394f6fc5b> .

With -o --no-line to use the original behavior:

$ preston ls | preston match -o --no-line | head -n4
<urn:uuid:cc37819d-0521-496f-8919-689aa5453f29> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Activity> <urn:uuid:cc37819d-0521-496f-8919-689aa5453f29> .
<urn:uuid:cc37819d-0521-496f-8919-689aa5453f29> <http://www.w3.org/ns/prov#used> <hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5> <urn:uuid:cc37819d-0521-496f-8919-689aa5453f29> .
<urn:uuid:cc37819d-0521-496f-8919-689aa5453f29> <http://purl.org/dc/terms/description> "An activity that finds the locations of text matching the regular expression '(?:https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]' inside any encountered content (e.g., hash://sha256/... identifiers)."@en <urn:uuid:cc37819d-0521-496f-8919-689aa5453f29> .
<cut:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/b1063-1137> <http://www.w3.org/ns/prov#value> "http://www.freshwaterbiodiversity.eu/metadb/bf_mdb_view.php?entryID=BFE_105" <urn:uuid:cc37819d-0521-496f-8919-689aa5453f29> .

@jhpoelen

jhpoelen commented 3 years ago

@mielliott I was able to reproduce your newly added -o feature using:

$ preston ls | preston match | head -n4 
<urn:uuid:e7a921d4-6f1f-44b6-bf7d-964614e6d233> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Activity> <urn:uuid:e7a921d4-6f1f-44b6-bf7d-964614e6d233> .
<urn:uuid:e7a921d4-6f1f-44b6-bf7d-964614e6d233> <http://www.w3.org/ns/prov#used> <hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5> <urn:uuid:e7a921d4-6f1f-44b6-bf7d-964614e6d233> .
<urn:uuid:e7a921d4-6f1f-44b6-bf7d-964614e6d233> <http://purl.org/dc/terms/description> "An activity that finds the locations of text matching the regular expression '(?:https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]' inside any encountered content (e.g., hash://sha256/... identifiers)."@en <urn:uuid:e7a921d4-6f1f-44b6-bf7d-964614e6d233> .
<line:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/L1> <http://www.w3.org/ns/prov#value> "[{\"key\":\"82ceb6ba-f762-11e1-a439-00145eb45e9a\",\"title\":\"Andes to Amazon Biodiversity Program\",\"description\":\"The Andes to Amazon Program is an international, multidisciplinary team of scientists, students, and Peruvian locals working between the Botanical Research Institute of Texas (BRIT) and selected field and museum sites in Peru.\",\"type\":\"OCCURRENCE\"},{\"key\":\"58414378-4fb2-47e0-8dd5-8b55d5c77117\",\"title\":\"Bolivian Amazon lowland fish metacommunity data\",\"description\":\"<p>This dataset represents data from the paper Yukoni, T. and Torres L. V. (2016) Fish metacommunity dynamics in the patchy heterogeneous habitats of varzea lakes, turbid river channels and transparent clear and black water bodies in the Amazonian Lowlands of Bolivia. Environmental Biology of Fishes.</p>\\n<p>This study documents the spatial dynamic of fish metacommunity based on the date sets of 65 sites, covering two geographic patches of transparent water valleys; Manuripi and Itenez Rivers, separated by turbid water valleys originate in the Andes and the Savanna.</p>\\n<p>See http://www.freshwaterbiodiversity.eu/metadb/bf_mdb_view.php?entryID=BFE_105 for additional metadata.</p>\",\"type\":\"OCCURRENCE\"},{\"key\":\"5a607ce6-eaaf-4420-a302-54ddc767130c\",\"title\":\"Earthworms (Oligochaeta: Glossoscolecidae) of the Amazon region of Colombia\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article M, Alexander Feijoo, Celis, Liliana V. (2011): Earthworms (Oligochaeta: Glossoscolecidae) of the Amazon region of Colombia. 
Zootaxa 3201: 27-44, DOI: 10.5281/zenodo.202378\",\"type\":\"CHECKLIST\"},{\"key\":\"4716951d-11f5-4f31-bb1d-d40c95268ad3\",\"title\":\"Notes on the Agrilus fauna of the Colombian Amazon (Coleoptera, Buprestidae)\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Curletti, Gianfranco, Dutto, Angelo (2017): Notes on the Agrilus fauna of the Colombian Amazon (Coleoptera, Buprestidae). Zootaxa 4243 (2): 373-376, DOI: https://doi.org/10.11646/zootaxa.4243.2.7\",\"type\":\"CHECKLIST\"},{\"key\":\"10b6b053-2ccf-4ca0-8922-486ad098fc56\",\"title\":\"Stink bugs (Hemiptera: Pentatomidae) from Brazilian Amazon: checklist and new records\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Silva, Valeria Juliete Da, Santos, Cleverson Rannieri Meira Dos, Fernandes, Jose Antonio Marin (2018): Stink bugs (Hemiptera: Pentatomidae) from Brazilian Amazon: checklist and new records. Zootaxa 4425 (3): 401-455, DOI: https://doi.org/10.11646/zootaxa.4425.3.1\",\"type\":\"CHECKLIST\"},{\"key\":\"6b43aef0-e62d-478f-9a88-b692578f0a73\",\"title\":\"New species and notes on Apenesia (Hymenoptera, Bethylidae) from the Brazilian Amazon\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Lanes, G. O., Azevedo, C. O. (2004): New species and notes on Apenesia (Hymenoptera, Bethylidae) from the Brazilian Amazon. 
Zootaxa 679: 1-16, DOI: 10.5281/zenodo.158458\",\"type\":\"CHECKLIST\"},{\"key\":\"9f2ce723-8926-4f4c-b9f6-a4b596c8c1a9\",\"title\":\"Revision of the Brazilian Amazon Basin species of Porphyrochroa Melander (Diptera: Empididae)\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Mendonça, Mirian Nascimento, Rafael, José Albertino, Ale-Rocha, Rosaly (2008): Revision of the Brazilian Amazon Basin species of Porphyrochroa Melander (Diptera: Empididae). Zootaxa 1859: 1-39, DOI: 10.5281/zenodo.183631\",\"type\":\"CHECKLIST\"},{\"key\":\"f0096116-c79f-41cd-8376-35587cbe9fcd\",\"title\":\"New species of earthworms (Oligochaeta: Glossoscolecidae) in the Amazon region of Colombia\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article M, Alexander Feijoo, Celis, Liliana V. (2012): New species of earthworms (Oligochaeta: Glossoscolecidae) in the Amazon region of Colombia. Zootaxa 3458: 103-119, DOI: 10.5281/zenodo.214602\",\"type\":\"CHECKLIST\"},{\"key\":\"0d875ec4-2366-4592-803d-cf6c61de8df4\",\"title\":\"A new species of Rhinella (Anura: Bufonidae) from Brazilian Amazon\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Lima, Albertina P., Menin, Marcelo, Araújo, Maria Carmozina De (2007): A new species of Rhinella (Anura: Bufonidae) from Brazilian Amazon. Zootaxa 1663: 1-15, DOI: 10.5281/zenodo.179996\",\"type\":\"CHECKLIST\"},{\"key\":\"92b19f16-a3d8-4924-89ff-1800c1d048d0\",\"title\":\"Amazoonops, a new genus of goblin spiders (Araneae: Oonopidae) from the Brazilian Amazon\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Ott, Ricardo, Ruiz, Gustavo R. S., Brescovit, Antonio D., Bonaldo, Alexandre B. (2017): Amazoonops, a new genus of goblin spiders (Araneae: Oonopidae) from the Brazilian Amazon. 
Zootaxa 4236 (2): 244-268, DOI: https://doi.org/10.11646/zootaxa.4236.2.2\",\"type\":\"CHECKLIST\"},{\"key\":\"e24ad3dc-e2fe-44a9-983c-d7fb90147e9f\",\"title\":\"Three new species of Leptohyphidae (Insecta: Ephemeroptera) from Central Amazon, Brazil\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Belmont, Enide Luciana L., Salles, Frederico F., Hamada, Neusa (2011): Three new species of Leptohyphidae (Insecta: Ephemeroptera) from Central Amazon, Brazil. Zootaxa 3047: 43-53, DOI: 10.5281/zenodo.201430\",\"type\":\"CHECKLIST\"},{\"key\":\"9d215804-dcb2-49ba-82d5-cc927abfb384\",\"title\":\"Two New Species of Halictophagidae (Insecta: Strepsiptera) from the Brazilian Amazon Basin\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Kogan, Marcos (2012): Two New Species of Halictophagidae (Insecta: Strepsiptera) from the Brazilian Amazon Basin. Zootaxa 3517: 79-87, DOI: 10.5281/zenodo.282617\",\"type\":\"CHECKLIST\"},{\"key\":\"b4683510-1fed-4ad4-bef6-64606e847fe9\",\"title\":\"A new Argyrodiaptomus (Copepoda: Calanoida: Diaptomidae) from the southwestern Brazilian Amazon\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Previattelli, Daniel, Santos-Silva, Edinaldo Nelson Dos (2007): A new Argyrodiaptomus (Copepoda: Calanoida: Diaptomidae) from the southwestern Brazilian Amazon. Zootaxa 1518: 1-29, DOI: 10.5281/zenodo.177358\",\"type\":\"CHECKLIST\"},{\"key\":\"d0e838e4-82b0-440b-b876-b62afe416245\",\"title\":\"A new species of Elachistocleis (Anura; Microhylidae) from the Brazilian Amazon\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Toledo, Luís Felipe (2010): A new species of Elachistocleis (Anura; Microhylidae) from the Brazilian Amazon. 
Zootaxa 2496: 63-68, DOI: 10.5281/zenodo.195714\",\"type\":\"CHECKLIST\"},{\"key\":\"b3262114-8580-4828-973d-b743d1034d00\",\"title\":\"A new species of Furciseta (Diptera, Ctenostylidae) from the Brazilian Amazon\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Câmara, J. T., Rafael, J. A. (2013): A new species of Furciseta (Diptera, Ctenostylidae) from the Brazilian Amazon. Zootaxa 3669 (2): 147-152, DOI: http://dx.doi.org/10.11646/zootaxa.3669.2.5\",\"type\":\"CHECKLIST\"},{\"key\":\"a1d56c0b-d41f-42c1-91d2-417ab340969f\",\"title\":\"A new species of Besleria (Gesneriaceae) from the western Amazon rainforest\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Gabriel Emiliano Ferreira, Andréa Onofre De Araújo, Michael John Gilbert Hopkins, Alain Chautems (2017): A new species of Besleria (Gesneriaceae) from the western Amazon rainforest. Brittonia 69 (2): 241-245, DOI: 10.1007/s12228-017-9464-6\",\"type\":\"CHECKLIST\"},{\"key\":\"1edd9658-1b17-4794-bbf0-e2eccb017fd3\",\"title\":\"Leporinus amazonicus, a new anostomid species from the Amazon lowlands, Brazil (Osteichthyes: Characiformes)\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Santos, Geraldo Mendes Dos, Zuanon, Jansen (2008): Leporinus amazonicus, a new anostomid species from the Amazon lowlands, Brazil (Osteichthyes: Characiformes). 
Zootaxa 1815: 35-42, DOI: 10.5281/zenodo.182896\",\"type\":\"CHECKLIST\"},{\"key\":\"bbe844d6-89b6-4c9b-9cd1-4b7a8c332dc5\",\"title\":\"A new species of Paraonis and an annotated checklist of polychaetes from mangroves of the Brazilian Amazon Coast (Annelida, Paraonidae)\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Ribeiro, Rannyele Passos, Alves, Paulo Ricardo, Almeida, Zafira da Silva de, Ruta, Christine (2018): A new species of Paraonis and an annotated checklist of polychaetes from mangroves of the Brazilian Amazon Coast (Annelida, Paraonidae). ZooKeys 740: 1-34, DOI: http://dx.doi.org/10.3897/zookeys.740.14640, URL: http://dx.doi.org/10.3897/zookeys.740.14640\",\"type\":\"CHECKLIST\"},{\"key\":\"6e68c0bd-4d98-4a19-a066-e610c60b9478\",\"title\":\"Amazoniaseius imparisetosus n. sp., n. g.: an unusual new phytoseiid mite (Acari: Phytoseiidae) from the Amazon forest\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Demite, Peterson R., Cruz, Wilton P., Moraes, Gilberto J. (2017): Amazoniaseius imparisetosus n. sp., n. g.: an unusual new phytoseiid mite (Acari: Phytoseiidae) from the Amazon forest. Zootaxa 4236 (2): 302-310, DOI: https://doi.org/10.11646/zootaxa.4236.2.5\",\"type\":\"CHECKLIST\"},{\"key\":\"663199f1-3528-4289-8069-d27552f62f10\",\"title\":\"A survey of Dryinidae (Hymenoptera, Chrysidoidea) from Caxiuanã, Amazon Basin, with three new taxa and keys to genera and species\",\"description\":\"This dataset contains the digitized treatments in Plazi based on the original journal article Coelho, Beatriz W., Aguiar, Alexandre P., Engel, Michael S. (2011): A survey of Dryinidae (Hymenoptera, Chrysidoidea) from Caxiuanã, Amazon Basin, with three new taxa and keys to genera and species. Zootaxa 2907: 1-21, DOI: 10.5281/zenodo.201416\",\"type\":\"CHECKLIST\"}]" <urn:uuid:e7a921d4-6f1f-44b6-bf7d-964614e6d233> .

and with -o

$ preston ls | preston match -o | head -n4
<urn:uuid:d35c7bbb-2e7e-4002-8cee-a73bb387828d> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Activity> <urn:uuid:d35c7bbb-2e7e-4002-8cee-a73bb387828d> .
<urn:uuid:d35c7bbb-2e7e-4002-8cee-a73bb387828d> <http://www.w3.org/ns/prov#used> <hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5> <urn:uuid:d35c7bbb-2e7e-4002-8cee-a73bb387828d> .
<urn:uuid:d35c7bbb-2e7e-4002-8cee-a73bb387828d> <http://purl.org/dc/terms/description> "An activity that finds the locations of text matching the regular expression '(?:https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]' inside any encountered content (e.g., hash://sha256/... identifiers)."@en <urn:uuid:d35c7bbb-2e7e-4002-8cee-a73bb387828d> .
<cut:line:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/L1!/b1063-1137> <http://www.w3.org/ns/prov#value> "http://www.freshwaterbiodiversity.eu/metadb/bf_mdb_view.php?entryID=BFE_105" <urn:uuid:d35c7bbb-2e7e-4002-8cee-a73bb387828d> .

and with -o --no-line

$ preston ls | preston match -o --no-line | head -n4
<urn:uuid:f8c5fcb3-acbd-4233-aa66-40cada5cd3c5> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Activity> <urn:uuid:f8c5fcb3-acbd-4233-aa66-40cada5cd3c5> .
<urn:uuid:f8c5fcb3-acbd-4233-aa66-40cada5cd3c5> <http://www.w3.org/ns/prov#used> <hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5> <urn:uuid:f8c5fcb3-acbd-4233-aa66-40cada5cd3c5> .
<urn:uuid:f8c5fcb3-acbd-4233-aa66-40cada5cd3c5> <http://purl.org/dc/terms/description> "An activity that finds the locations of text matching the regular expression '(?:https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]' inside any encountered content (e.g., hash://sha256/... identifiers)."@en <urn:uuid:f8c5fcb3-acbd-4233-aa66-40cada5cd3c5> .
<cut:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/b1063-1137> <http://www.w3.org/ns/prov#value> "http://www.freshwaterbiodiversity.eu/metadb/bf_mdb_view.php?entryID=BFE_105" <urn:uuid:f8c5fcb3-acbd-4233-aa66-40cada5cd3c5> .

jhpoelen commented 3 years ago

I am pretty excited about your new feature, and about having a way to point to specific lines in an archive / file.

I was wondering about two things:

  1. I was unable to resist the urge to type:

preston cat 'line:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/L1'

and found preston complaining about the following:

java.io.IOException: problem retrieving [line:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/L1]
    at bio.guoda.preston.cmd.CmdGet.handleContentQuery(CmdGet.java:61)
    at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:45)
    at bio.guoda.preston.cmd.CmdGet.run(CmdGet.java:32)
    at bio.guoda.preston.cmd.CmdLine.run(CmdLine.java:18)
    at bio.guoda.preston.cmd.CmdLine.run(CmdLine.java:26)
    at bio.guoda.preston.Preston.main(Preston.java:19)
Caused by: bio.guoda.preston.store.DereferenceException: failed to dereference [line:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/L1]
    at bio.guoda.preston.store.ContentHashDereferencer.dereference(ContentHashDereferencer.java:26)
    at bio.guoda.preston.cmd.CmdGet.handleContentQuery(CmdGet.java:58)
    ... 5 more
Caused by: java.io.IOException: failed to find content identified by [<line:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/L1>]
    at bio.guoda.preston.stream.ContentStreamFactory.create(ContentStreamFactory.java:26)
    at bio.guoda.preston.store.ContentHashDereferencer.dereference(ContentHashDereferencer.java:24)
    ... 6 more
  2. Also, at first glance I interpreted the cut:line: notation

cut:line:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/L1!/b1063-1137

as: select the characters in range 1063-1137 on line 1 of hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5 ?

But then I noticed that the --no-line had the same byte range, which makes sense considering that it is the first line.

However, when running:

$ preston ls | preston match -o | head | tail -n1
<cut:line:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/L1!/b8910-8952> <http://www.w3.org/ns/prov#value> "http://dx.doi.org/10.3897/zookeys.740.14640" <urn:uuid:24d5a193-1bdb-415f-a826-96f382a8691a> .

the same byte range was produced using

$ preston ls | preston match -o --no-line | head | tail -n1
<cut:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/b8910-8952> <http://www.w3.org/ns/prov#value> "http://dx.doi.org/10.3897/zookeys.740.14640" <urn:uuid:89589bd2-7322-4234-8574-fbd40be1c944> .

This seems a bit counterintuitive, because I was expecting the byte offset with the line selection to be counted from the start of the selected line.

Curious to hear your comments on the above!

mielliott commented 3 years ago

Aw rats, that’s some awful stuff! Thanks for trying it out though - I’ll have a look at it later tonight

jhpoelen commented 3 years ago

The feature is pretty awesome . . . my notes are just details I am curious to hear your thoughts on.

mielliott commented 3 years ago

preston cat 'line:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/L1'

This should now work:

$ preston cat 'line:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/L1'
[{"key":"82ceb6ba-f762-11e1-a439-00145eb45e9a","title":"Andes to Amazon Biodiversity Program","description":"The Andes to Amazon Program is an international, multidisciplinary team of scientists, students, and Peruvian locals working between the Botanical Research Institute of Texas (BRIT) and selected field and museum sites in Peru.","type":"OCCURRENCE"},{"key":"58414378-4fb2-47e0-8dd5-8b55d5c77117","title":"Bolivian Amazon lowland fish metacommunity data","description":"<p>This dataset represents data from the paper Yukoni, T. and Torres L. V. (2016) Fish metacommunity dynamics in the patchy heterogeneous habitats of varzea lakes, turbid river channels and transparent clear and black water bodies in the Amazonian Lowlands of Bolivia. Environmental Biology of Fishes.</p>\n<p>This study documents the spatial dynamic of fish metacommunity based on the date sets of 65 sites, covering two geographic patches of transparent water valleys; Manuripi and Itenez Rivers, separated by turbid water valleys originate in the Andes and the Savanna.</p>\n<p>See http://www.freshwaterbiodiversity.eu/metadb/bf_mdb_view.php?entryID=BFE_105 for additional metadata.</p>","type":"OCCURRENCE"},{"key":"5a607ce6-eaaf-4420-a302-54ddc767130c","title":"Earthworms (Oligochaeta: Glossoscolecidae) of the Amazon region of Colombia","description":"This dataset contains the digitized treatments in Plazi based on the original journal article M, Alexander Feijoo, Celis, Liliana V. (2011): Earthworms (Oligochaeta: Glossoscolecidae) of the Amazon region of Colombia. Zootaxa 3201: 27-44, DOI: 10.5281/zenodo.202378","type":"CHECKLIST"},{"key":"4716951d-11f5-4f31-bb1d-d40c95268ad3","title":"Notes on the Agrilus fauna of the Colombian Amazon (Coleoptera, Buprestidae)","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Curletti, Gianfranco, Dutto, Angelo (2017): Notes on the Agrilus fauna of the Colombian Amazon (Coleoptera, Buprestidae). 
Zootaxa 4243 (2): 373-376, DOI: https://doi.org/10.11646/zootaxa.4243.2.7","type":"CHECKLIST"},{"key":"10b6b053-2ccf-4ca0-8922-486ad098fc56","title":"Stink bugs (Hemiptera: Pentatomidae) from Brazilian Amazon: checklist and new records","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Silva, Valeria Juliete Da, Santos, Cleverson Rannieri Meira Dos, Fernandes, Jose Antonio Marin (2018): Stink bugs (Hemiptera: Pentatomidae) from Brazilian Amazon: checklist and new records. Zootaxa 4425 (3): 401-455, DOI: https://doi.org/10.11646/zootaxa.4425.3.1","type":"CHECKLIST"},{"key":"6b43aef0-e62d-478f-9a88-b692578f0a73","title":"New species and notes on Apenesia (Hymenoptera, Bethylidae) from the Brazilian Amazon","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Lanes, G. O., Azevedo, C. O. (2004): New species and notes on Apenesia (Hymenoptera, Bethylidae) from the Brazilian Amazon. Zootaxa 679: 1-16, DOI: 10.5281/zenodo.158458","type":"CHECKLIST"},{"key":"9f2ce723-8926-4f4c-b9f6-a4b596c8c1a9","title":"Revision of the Brazilian Amazon Basin species of Porphyrochroa Melander (Diptera: Empididae)","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Mendonça, Mirian Nascimento, Rafael, José Albertino, Ale-Rocha, Rosaly (2008): Revision of the Brazilian Amazon Basin species of Porphyrochroa Melander (Diptera: Empididae). Zootaxa 1859: 1-39, DOI: 10.5281/zenodo.183631","type":"CHECKLIST"},{"key":"f0096116-c79f-41cd-8376-35587cbe9fcd","title":"New species of earthworms (Oligochaeta: Glossoscolecidae) in the Amazon region of Colombia","description":"This dataset contains the digitized treatments in Plazi based on the original journal article M, Alexander Feijoo, Celis, Liliana V. (2012): New species of earthworms (Oligochaeta: Glossoscolecidae) in the Amazon region of Colombia. 
Zootaxa 3458: 103-119, DOI: 10.5281/zenodo.214602","type":"CHECKLIST"},{"key":"0d875ec4-2366-4592-803d-cf6c61de8df4","title":"A new species of Rhinella (Anura: Bufonidae) from Brazilian Amazon","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Lima, Albertina P., Menin, Marcelo, Araújo, Maria Carmozina De (2007): A new species of Rhinella (Anura: Bufonidae) from Brazilian Amazon. Zootaxa 1663: 1-15, DOI: 10.5281/zenodo.179996","type":"CHECKLIST"},{"key":"92b19f16-a3d8-4924-89ff-1800c1d048d0","title":"Amazoonops, a new genus of goblin spiders (Araneae: Oonopidae) from the Brazilian Amazon","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Ott, Ricardo, Ruiz, Gustavo R. S., Brescovit, Antonio D., Bonaldo, Alexandre B. (2017): Amazoonops, a new genus of goblin spiders (Araneae: Oonopidae) from the Brazilian Amazon. Zootaxa 4236 (2): 244-268, DOI: https://doi.org/10.11646/zootaxa.4236.2.2","type":"CHECKLIST"},{"key":"e24ad3dc-e2fe-44a9-983c-d7fb90147e9f","title":"Three new species of Leptohyphidae (Insecta: Ephemeroptera) from Central Amazon, Brazil","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Belmont, Enide Luciana L., Salles, Frederico F., Hamada, Neusa (2011): Three new species of Leptohyphidae (Insecta: Ephemeroptera) from Central Amazon, Brazil. Zootaxa 3047: 43-53, DOI: 10.5281/zenodo.201430","type":"CHECKLIST"},{"key":"9d215804-dcb2-49ba-82d5-cc927abfb384","title":"Two New Species of Halictophagidae (Insecta: Strepsiptera) from the Brazilian Amazon Basin","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Kogan, Marcos (2012): Two New Species of Halictophagidae (Insecta: Strepsiptera) from the Brazilian Amazon Basin. 
Zootaxa 3517: 79-87, DOI: 10.5281/zenodo.282617","type":"CHECKLIST"},{"key":"b4683510-1fed-4ad4-bef6-64606e847fe9","title":"A new Argyrodiaptomus (Copepoda: Calanoida: Diaptomidae) from the southwestern Brazilian Amazon","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Previattelli, Daniel, Santos-Silva, Edinaldo Nelson Dos (2007): A new Argyrodiaptomus (Copepoda: Calanoida: Diaptomidae) from the southwestern Brazilian Amazon. Zootaxa 1518: 1-29, DOI: 10.5281/zenodo.177358","type":"CHECKLIST"},{"key":"d0e838e4-82b0-440b-b876-b62afe416245","title":"A new species of Elachistocleis (Anura; Microhylidae) from the Brazilian Amazon","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Toledo, Luís Felipe (2010): A new species of Elachistocleis (Anura; Microhylidae) from the Brazilian Amazon. Zootaxa 2496: 63-68, DOI: 10.5281/zenodo.195714","type":"CHECKLIST"},{"key":"b3262114-8580-4828-973d-b743d1034d00","title":"A new species of Furciseta (Diptera, Ctenostylidae) from the Brazilian Amazon","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Câmara, J. T., Rafael, J. A. (2013): A new species of Furciseta (Diptera, Ctenostylidae) from the Brazilian Amazon. Zootaxa 3669 (2): 147-152, DOI: http://dx.doi.org/10.11646/zootaxa.3669.2.5","type":"CHECKLIST"},{"key":"a1d56c0b-d41f-42c1-91d2-417ab340969f","title":"A new species of Besleria (Gesneriaceae) from the western Amazon rainforest","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Gabriel Emiliano Ferreira, Andréa Onofre De Araújo, Michael John Gilbert Hopkins, Alain Chautems (2017): A new species of Besleria (Gesneriaceae) from the western Amazon rainforest. 
Brittonia 69 (2): 241-245, DOI: 10.1007/s12228-017-9464-6","type":"CHECKLIST"},{"key":"1edd9658-1b17-4794-bbf0-e2eccb017fd3","title":"Leporinus amazonicus, a new anostomid species from the Amazon lowlands, Brazil (Osteichthyes: Characiformes)","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Santos, Geraldo Mendes Dos, Zuanon, Jansen (2008): Leporinus amazonicus, a new anostomid species from the Amazon lowlands, Brazil (Osteichthyes: Characiformes). Zootaxa 1815: 35-42, DOI: 10.5281/zenodo.182896","type":"CHECKLIST"},{"key":"bbe844d6-89b6-4c9b-9cd1-4b7a8c332dc5","title":"A new species of Paraonis and an annotated checklist of polychaetes from mangroves of the Brazilian Amazon Coast (Annelida, Paraonidae)","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Ribeiro, Rannyele Passos, Alves, Paulo Ricardo, Almeida, Zafira da Silva de, Ruta, Christine (2018): A new species of Paraonis and an annotated checklist of polychaetes from mangroves of the Brazilian Amazon Coast (Annelida, Paraonidae). ZooKeys 740: 1-34, DOI: http://dx.doi.org/10.3897/zookeys.740.14640, URL: http://dx.doi.org/10.3897/zookeys.740.14640","type":"CHECKLIST"},{"key":"6e68c0bd-4d98-4a19-a066-e610c60b9478","title":"Amazoniaseius imparisetosus n. sp., n. g.: an unusual new phytoseiid mite (Acari: Phytoseiidae) from the Amazon forest","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Demite, Peterson R., Cruz, Wilton P., Moraes, Gilberto J. (2017): Amazoniaseius imparisetosus n. sp., n. g.: an unusual new phytoseiid mite (Acari: Phytoseiidae) from the Amazon forest. 
Zootaxa 4236 (2): 302-310, DOI: https://doi.org/10.11646/zootaxa.4236.2.5","type":"CHECKLIST"},{"key":"663199f1-3528-4289-8069-d27552f62f10","title":"A survey of Dryinidae (Hymenoptera, Chrysidoidea) from Caxiuanã, Amazon Basin, with three new taxa and keys to genera and species","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Coelho, Beatriz W., Aguiar, Alexandre P., Engel, Michael S. (2011): A survey of Dryinidae (Hymenoptera, Chrysidoidea) from Caxiuanã, Amazon Basin, with three new taxa and keys to genera and species. Zootaxa 2907: 1-21, DOI: 10.5281/zenodo.201416","type":"CHECKLIST"}]

Notice that the file actually is just one big line. No line breaks. So the good news is, the line number and byte ranges actually are working. Looking at another result from matching on the amazon dataset:

$ preston ls | preston match -o | head -n60 | tail -n1
<cut:line:hash://sha256/7d73d2374efed4a5144a0051b457d98279b29453bb81b5a5b87da2ccc12391bc!/L1!/b1666-1681> <http://www.w3.org/ns/prov#value> "http://plazi.org" <urn:uuid:37c45040-20b3-4166-9c1b-71dca9e03421> .

Then cat it back (fixed in https://github.com/bio-guoda/preston/commit/89ee74a218daff6393d9e2cff7570ac93a3874a8):

$ preston cat 'cut:line:hash://sha256/7d73d2374efed4a5144a0051b457d98279b29453bb81b5a5b87da2ccc12391bc!/L1!/b1666-1681'
http://plazi.org

Voila!

Edit: oops, the new example I dug up was also using line 1. Hold on a sec

mielliott commented 3 years ago

Attempt number 2:

$ preston ls | preston match -o | head -n238 | tail -n1
<cut:line:zip:hash://sha256/97cbeae429fbc95d1859f7afa28b33f08ac64125ba72511c49c4b77ca66d2d66!/occurrence.txt!/L5!/b188-218> <http://www.w3.org/ns/prov#value> "http://www.canadensys.net/norms" <urn:uuid:6162de6e-c0e9-48a1-9bc9-cbf19d9195b7> .

$ preston cat 'cut:line:zip:hash://sha256/97cbeae429fbc95d1859f7afa28b33f08ac64125ba72511c49c4b77ca66d2d66!/occurrence.txt!/L5!/b188-218'
http://www.canadensys.net/norms

jhpoelen commented 3 years ago

Wow! I was just able to do:

$ preston cat 'line:hash://sha256/6d86c332b045e74fe4410f79655a1f47596808c057f30779b9584dba38fa25d5!/L1'
[{"key":"82ceb6ba-f762-11e1-a439-00145eb45e9a","title":"Andes to Amazon Biodiversity Program","description":"The Andes to Amazon Program is an international, multidisciplinary team of scientists, students, and Peruvian locals working between the Botanical Research Institute of Texas (BRIT) and selected field and museum sites in Peru.","type":"OCCURRENCE"},{"key":"58414378-4fb2-47e0-8dd5-8b55d5c77117","title":"Bolivian Amazon lowland fish metacommunity data","description":"<p>This dataset represents data from the paper Yukoni, T. and Torres L. V. (2016) Fish metacommunity dynamics in the patchy heterogeneous habitats of varzea lakes, turbid river channels and transparent clear and black water bodies in the Amazonian Lowlands of Bolivia. Environmental Biology of Fishes.</p>\n<p>This study documents the spatial dynamic of fish metacommunity based on the date sets of 65 sites, covering two geographic patches of transparent water valleys; Manuripi and Itenez Rivers, separated by turbid water valleys originate in the Andes and the Savanna.</p>\n<p>See http://www.freshwaterbiodiversity.eu/metadb/bf_mdb_view.php?entryID=BFE_105 for additional metadata.</p>","type":"OCCURRENCE"},{"key":"5a607ce6-eaaf-4420-a302-54ddc767130c","title":"Earthworms (Oligochaeta: Glossoscolecidae) of the Amazon region of Colombia","description":"This dataset contains the digitized treatments in Plazi based on the original journal article M, Alexander Feijoo, Celis, Liliana V. (2011): Earthworms (Oligochaeta: Glossoscolecidae) of the Amazon region of Colombia. Zootaxa 3201: 27-44, DOI: 10.5281/zenodo.202378","type":"CHECKLIST"},{"key":"4716951d-11f5-4f31-bb1d-d40c95268ad3","title":"Notes on the Agrilus fauna of the Colombian Amazon (Coleoptera, Buprestidae)","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Curletti, Gianfranco, Dutto, Angelo (2017): Notes on the Agrilus fauna of the Colombian Amazon (Coleoptera, Buprestidae). 
Zootaxa 4243 (2): 373-376, DOI: https://doi.org/10.11646/zootaxa.4243.2.7","type":"CHECKLIST"},{"key":"10b6b053-2ccf-4ca0-8922-486ad098fc56","title":"Stink bugs (Hemiptera: Pentatomidae) from Brazilian Amazon: checklist and new records","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Silva, Valeria Juliete Da, Santos, Cleverson Rannieri Meira Dos, Fernandes, Jose Antonio Marin (2018): Stink bugs (Hemiptera: Pentatomidae) from Brazilian Amazon: checklist and new records. Zootaxa 4425 (3): 401-455, DOI: https://doi.org/10.11646/zootaxa.4425.3.1","type":"CHECKLIST"},{"key":"6b43aef0-e62d-478f-9a88-b692578f0a73","title":"New species and notes on Apenesia (Hymenoptera, Bethylidae) from the Brazilian Amazon","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Lanes, G. O., Azevedo, C. O. (2004): New species and notes on Apenesia (Hymenoptera, Bethylidae) from the Brazilian Amazon. Zootaxa 679: 1-16, DOI: 10.5281/zenodo.158458","type":"CHECKLIST"},{"key":"9f2ce723-8926-4f4c-b9f6-a4b596c8c1a9","title":"Revision of the Brazilian Amazon Basin species of Porphyrochroa Melander (Diptera: Empididae)","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Mendonça, Mirian Nascimento, Rafael, José Albertino, Ale-Rocha, Rosaly (2008): Revision of the Brazilian Amazon Basin species of Porphyrochroa Melander (Diptera: Empididae). Zootaxa 1859: 1-39, DOI: 10.5281/zenodo.183631","type":"CHECKLIST"},{"key":"f0096116-c79f-41cd-8376-35587cbe9fcd","title":"New species of earthworms (Oligochaeta: Glossoscolecidae) in the Amazon region of Colombia","description":"This dataset contains the digitized treatments in Plazi based on the original journal article M, Alexander Feijoo, Celis, Liliana V. (2012): New species of earthworms (Oligochaeta: Glossoscolecidae) in the Amazon region of Colombia. 
Zootaxa 3458: 103-119, DOI: 10.5281/zenodo.214602","type":"CHECKLIST"},{"key":"0d875ec4-2366-4592-803d-cf6c61de8df4","title":"A new species of Rhinella (Anura: Bufonidae) from Brazilian Amazon","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Lima, Albertina P., Menin, Marcelo, Araújo, Maria Carmozina De (2007): A new species of Rhinella (Anura: Bufonidae) from Brazilian Amazon. Zootaxa 1663: 1-15, DOI: 10.5281/zenodo.179996","type":"CHECKLIST"},{"key":"92b19f16-a3d8-4924-89ff-1800c1d048d0","title":"Amazoonops, a new genus of goblin spiders (Araneae: Oonopidae) from the Brazilian Amazon","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Ott, Ricardo, Ruiz, Gustavo R. S., Brescovit, Antonio D., Bonaldo, Alexandre B. (2017): Amazoonops, a new genus of goblin spiders (Araneae: Oonopidae) from the Brazilian Amazon. Zootaxa 4236 (2): 244-268, DOI: https://doi.org/10.11646/zootaxa.4236.2.2","type":"CHECKLIST"},{"key":"e24ad3dc-e2fe-44a9-983c-d7fb90147e9f","title":"Three new species of Leptohyphidae (Insecta: Ephemeroptera) from Central Amazon, Brazil","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Belmont, Enide Luciana L., Salles, Frederico F., Hamada, Neusa (2011): Three new species of Leptohyphidae (Insecta: Ephemeroptera) from Central Amazon, Brazil. Zootaxa 3047: 43-53, DOI: 10.5281/zenodo.201430","type":"CHECKLIST"},{"key":"9d215804-dcb2-49ba-82d5-cc927abfb384","title":"Two New Species of Halictophagidae (Insecta: Strepsiptera) from the Brazilian Amazon Basin","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Kogan, Marcos (2012): Two New Species of Halictophagidae (Insecta: Strepsiptera) from the Brazilian Amazon Basin. 
Zootaxa 3517: 79-87, DOI: 10.5281/zenodo.282617","type":"CHECKLIST"},{"key":"b4683510-1fed-4ad4-bef6-64606e847fe9","title":"A new Argyrodiaptomus (Copepoda: Calanoida: Diaptomidae) from the southwestern Brazilian Amazon","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Previattelli, Daniel, Santos-Silva, Edinaldo Nelson Dos (2007): A new Argyrodiaptomus (Copepoda: Calanoida: Diaptomidae) from the southwestern Brazilian Amazon. Zootaxa 1518: 1-29, DOI: 10.5281/zenodo.177358","type":"CHECKLIST"},{"key":"d0e838e4-82b0-440b-b876-b62afe416245","title":"A new species of Elachistocleis (Anura; Microhylidae) from the Brazilian Amazon","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Toledo, Luís Felipe (2010): A new species of Elachistocleis (Anura; Microhylidae) from the Brazilian Amazon. Zootaxa 2496: 63-68, DOI: 10.5281/zenodo.195714","type":"CHECKLIST"},{"key":"b3262114-8580-4828-973d-b743d1034d00","title":"A new species of Furciseta (Diptera, Ctenostylidae) from the Brazilian Amazon","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Câmara, J. T., Rafael, J. A. (2013): A new species of Furciseta (Diptera, Ctenostylidae) from the Brazilian Amazon. Zootaxa 3669 (2): 147-152, DOI: http://dx.doi.org/10.11646/zootaxa.3669.2.5","type":"CHECKLIST"},{"key":"a1d56c0b-d41f-42c1-91d2-417ab340969f","title":"A new species of Besleria (Gesneriaceae) from the western Amazon rainforest","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Gabriel Emiliano Ferreira, Andréa Onofre De Araújo, Michael John Gilbert Hopkins, Alain Chautems (2017): A new species of Besleria (Gesneriaceae) from the western Amazon rainforest. 
Brittonia 69 (2): 241-245, DOI: 10.1007/s12228-017-9464-6","type":"CHECKLIST"},{"key":"1edd9658-1b17-4794-bbf0-e2eccb017fd3","title":"Leporinus amazonicus, a new anostomid species from the Amazon lowlands, Brazil (Osteichthyes: Characiformes)","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Santos, Geraldo Mendes Dos, Zuanon, Jansen (2008): Leporinus amazonicus, a new anostomid species from the Amazon lowlands, Brazil (Osteichthyes: Characiformes). Zootaxa 1815: 35-42, DOI: 10.5281/zenodo.182896","type":"CHECKLIST"},{"key":"bbe844d6-89b6-4c9b-9cd1-4b7a8c332dc5","title":"A new species of Paraonis and an annotated checklist of polychaetes from mangroves of the Brazilian Amazon Coast (Annelida, Paraonidae)","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Ribeiro, Rannyele Passos, Alves, Paulo Ricardo, Almeida, Zafira da Silva de, Ruta, Christine (2018): A new species of Paraonis and an annotated checklist of polychaetes from mangroves of the Brazilian Amazon Coast (Annelida, Paraonidae). ZooKeys 740: 1-34, DOI: http://dx.doi.org/10.3897/zookeys.740.14640, URL: http://dx.doi.org/10.3897/zookeys.740.14640","type":"CHECKLIST"},{"key":"6e68c0bd-4d98-4a19-a066-e610c60b9478","title":"Amazoniaseius imparisetosus n. sp., n. g.: an unusual new phytoseiid mite (Acari: Phytoseiidae) from the Amazon forest","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Demite, Peterson R., Cruz, Wilton P., Moraes, Gilberto J. (2017): Amazoniaseius imparisetosus n. sp., n. g.: an unusual new phytoseiid mite (Acari: Phytoseiidae) from the Amazon forest. 
Zootaxa 4236 (2): 302-310, DOI: https://doi.org/10.11646/zootaxa.4236.2.5","type":"CHECKLIST"},{"key":"663199f1-3528-4289-8069-d27552f62f10","title":"A survey of Dryinidae (Hymenoptera, Chrysidoidea) from Caxiuanã, Amazon Basin, with three new taxa and keys to genera and species","description":"This dataset contains the digitized treatments in Plazi based on the original journal article Coelho, Beatriz W., Aguiar, Alexandre P., Engel, Michael S. (2011): A survey of Dryinidae (Hymenoptera, Chrysidoidea) from Caxiuanã, Amazon Basin, with three new taxa and keys to genera and species. Zootaxa 2907: 1-21, DOI: 10.5281/zenodo.201416","type":"CHECKLIST"}]

Also, I was able to reproduce:

$ preston ls | preston match -o | head -n238 | tail -n1
<cut:line:zip:hash://sha256/97cbeae429fbc95d1859f7afa28b33f08ac64125ba72511c49c4b77ca66d2d66!/occurrence.txt!/L5!/b188-218> <http://www.w3.org/ns/prov#value> "http://www.canadensys.net/norms" <urn:uuid:8c68ff06-e7ed-44d1-8b13-dc747886007a> .

with inverse lookup:

$ preston cat 'cut:line:zip:hash://sha256/97cbeae429fbc95d1859f7afa28b33f08ac64125ba72511c49c4b77ca66d2d66!/occurrence.txt!/L5!/b188-218'
http://www.canadensys.net/norms

also, without lines:

$ preston ls | preston match -o --no-lines | head -n238 | tail -n1
<cut:zip:hash://sha256/97cbeae429fbc95d1859f7afa28b33f08ac64125ba72511c49c4b77ca66d2d66!/occurrence.txt!/b4094-4124> <http://www.w3.org/ns/prov#value> "http://www.canadensys.net/norms" <urn:uuid:bc71ed2e-a964-4e2d-bf37-8b2778e60429> .

where the byte range is counted from the start of the content (i.e., cut:[...]!/b4094-4124) instead of from the beginning of line 5 (i.e., cut:line:[...]!/L5!/b188-218). Both notations yield the same result (http://www.canadensys.net/norms).
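
The relation between the two notations can be sketched in a few lines of Python. This is an illustration under assumptions, not preston's implementation: byte positions are taken to be 1-based and inclusive (consistent with b188-218 covering a 31-character URL), lines are taken to be separated by "\n", and the sample content and function names are made up.

```python
def line_start_offset(content, line_number):
    """Count the bytes that precede the given 1-based line."""
    offset = 0
    for _ in range(line_number - 1):
        # index() raises ValueError if the requested line does not exist.
        offset = content.index(b"\n", offset) + 1
    return offset


def cut(content, start, end):
    """Select bytes start..end (assumed 1-based, inclusive)."""
    return content[start - 1:end]


def cut_line(content, line_number, start, end):
    """Select a byte range counted from the start of one line."""
    line_start = line_start_offset(content, line_number)
    line_end = content.find(b"\n", line_start)
    line = content[line_start:] if line_end == -1 else content[line_start:line_end]
    return cut(line, start, end)


def to_absolute(content, line_number, start, end):
    """Convert a line-relative byte range into a content-relative one."""
    shift = line_start_offset(content, line_number)
    return start + shift, end + shift


# Made-up sample content with three lines.
sample = b"id\turl\n1\thttp://example.org/a\n2\thttp://example.org/b\n"

if __name__ == "__main__":
    start, end = to_absolute(sample, 3, 3, 22)
    # Both selections address the same bytes.
    print(cut_line(sample, 3, 3, 22) == cut(sample, start, end))
```

For content without line breaks, line_start_offset returns 0 for line 1, so the line-relative and content-relative ranges coincide. That matches the identical byte ranges seen earlier for the single-line JSON file.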

Very cool way to express coordinates in a predictable biodiversity data universe!

Because, no matter where you are or what you do, the following always holds:

$ preston cat 'cut:zip:hash://sha256/97cbeae429fbc95d1859f7afa28b33f08ac64125ba72511c49c4b77ca66d2d66!/occurrence.txt!/b4094-4124'
http://www.canadensys.net/norms

fyi @cboettig @seltmann @mjy @dshorthouse

jhpoelen commented 3 years ago

Thanks for making this happen @mielliott !

jhpoelen commented 3 years ago

fyi @zedomel

jhpoelen commented 3 years ago

@mielliott Just installed preston 0.3.0 and found that

https://deeplinker.bio/cat/line:zip:hash://sha256/29d30b566f924355a383b13cd48c3aa239d42cba0a55f4ccfc2930289b88b43c!/occurrence.txt!/L1

works like a charm (see attached screenshot). Note that the hash is that of the (huge) eBird dataset.

Screenshot from 2021-06-18 16-20-14

I had the urge to use a line range, e.g., L1-2. Is that something you had in mind too?

mielliott commented 3 years ago

Sweet! Opening the URL is surprisingly speedy too!

had the urge to use a line range e.g., L1-2 . Is that something you had in mind too?

Definitely; I didn't expect preston grep to report multi-line matches though, so catting line ranges didn't get implemented.
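
As a purely hypothetical illustration of the L1-2 idea (per the comment above, line ranges were not implemented at this point), selecting a range of lines could look roughly like the Python sketch below. The function names and the inclusive, 1-based reading of L1-2 are assumptions, and Python's line-splitting rules may differ from how preston counts lines.

```python
import re


def parse_line_range(segment):
    """Parse "L5" or "L1-2" into a (first, last) pair of 1-based line numbers."""
    match = re.fullmatch(r"L(\d+)(?:-(\d+))?", segment)
    if not match:
        raise ValueError("not a line selector: " + segment)
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else first
    return first, last


def select_lines(content, first, last):
    """Return lines first..last (inclusive), keeping their line endings."""
    lines = content.splitlines(keepends=True)
    if not 1 <= first <= last <= len(lines):
        raise ValueError("line range out of bounds")
    return b"".join(lines[first - 1:last])


if __name__ == "__main__":
    first, last = parse_line_range("L1-2")
    print(select_lines(b"header\nrow one\nrow two\n", first, last))
```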