Setting the provenance anchor to the hash of the graph published in https://zenodo.org/record/3852671, I think I'm finding my way to the May log:
$ preston history --remote https://linker.bio -r hash://sha256/8aacce08462b87a345d271081783bdd999663ef90099212c8831db399fc0831b
[https://linker.bio/hash:...099212c8831db399fc0831b] 133 MB at 17.08 MB/s completed in < 1 minute
<hash://sha256/8aacce08462b87a345d271081783bdd999663ef90099212c8831db399fc0831b> <http://www.w3.org/ns/prov#wasDerivedFrom> <hash://sha256/f13b15a20e4fe70b4a111e67ac20ef676404b8456dfc39694f2cb3a4c62a2b2d> .
[https://linker.bio/hash:...dfc39694f2cb3a4c62a2b2d] 132 MB at 18.27 MB/s completed in < 1 minute
<hash://sha256/f13b15a20e4fe70b4a111e67ac20ef676404b8456dfc39694f2cb3a4c62a2b2d> <http://www.w3.org/ns/prov#wasDerivedFrom> <hash://sha256/3b39831bcc286c1db44787e21b736378f5847a16b7c39bdac3dd2011e9189dc1> .
[https://linker.bio/hash:...7c39bdac3dd2011e9189dc1] 300 MB at 20.29 MB/s completed in < 1 minute
<hash://sha256/3b39831bcc286c1db44787e21b736378f5847a16b7c39bdac3dd2011e9189dc1> <http://www.w3.org/ns/prov#wasDerivedFrom> <hash://sha256/916255b2b73680595dcb22b30991a757dd223208473fb4fbe90405757bc07953> .
[https://linker.bio/hash:...73fb4fbe90405757bc07953] 101 MB at 17.20 MB/s completed in < 1 minute
<hash://sha256/916255b2b73680595dcb22b30991a757dd223208473fb4fbe90405757bc07953> <http://www.w3.org/ns/prov#wasDerivedFrom> <hash://sha256/39f83f5805f32f765003c5e9ee8c69adb3889d9f26dd61bf4aa3a829ac744e2c> .
[https://linker.bio/hash:...6dd61bf4aa3a829ac744e2c] 101 MB at 16.56 MB/s completed in < 1 minute
<hash://sha256/39f83f5805f32f765003c5e9ee8c69adb3889d9f26dd61bf4aa3a829ac744e2c> <http://www.w3.org/ns/prov#wasDerivedFrom> <hash://sha256/5dcf876c6cb0c5b15197acf1ea6989d41c1a1333c6a7e0437f035aa9d22a3790> .
[https://linker.bio/hash:...6a7e0437f035aa9d22a3790] 93 MB at 16.78 MB/s completed in < 1 minute
<hash://sha256/5dcf876c6cb0c5b15197acf1ea6989d41c1a1333c6a7e0437f035aa9d22a3790> <http://www.w3.org/ns/prov#wasDerivedFrom> <hash://sha256/9c17ce013b33c3c9e6bc513cb49a14660fad9bd6f87a4f21568cc871b10ba39b> .
[https://linker.bio/hash:...87a4f21568cc871b10ba39b] 93 MB at 18.91 MB/s completed in < 1 minute
<hash://sha256/9c17ce013b33c3c9e6bc513cb49a14660fad9bd6f87a4f21568cc871b10ba39b> <http://www.w3.org/ns/prov#wasDerivedFrom> <hash://sha256/6c4c94cdb224d39e7c655b1a1a6afbba8daf3c9ac64c42ba72dfd346d5d3a547> .
[https://linker.bio/hash:...64c42ba72dfd346d5d3a547] 87 MB at 15.74 MB/s completed in < 1 minute
<hash://sha256/6c4c94cdb224d39e7c655b1a1a6afbba8daf3c9ac64c42ba72dfd346d5d3a547> <http://www.w3.org/ns/prov#wasDerivedFrom> <hash://sha256/ff74959ec6e5e98e7db674afcb915f50725f049b968e9a9f10de169aa0a3dcb5> .
[https://linker.bio/hash:...68e9a9f10de169aa0a3dcb5] 89 MB at 13.77 MB/s completed in < 1 minute
<hash://sha256/ff74959ec6e5e98e7db674afcb915f50725f049b968e9a9f10de169aa0a3dcb5> <http://www.w3.org/ns/prov#wasDerivedFrom> <hash://sha256/ab62f4a9601f30d23353a479830f9d2dfc7898e15d2cc2d81977e898d885c908> .
[https://linker.bio/hash:...d2cc2d81977e898d885c908] 249 kB at 0.41 MB/s completed in < 1 minute
<hash://sha256/ab62f4a9601f30d23353a479830f9d2dfc7898e15d2cc2d81977e898d885c908> <http://www.w3.org/ns/prov#wasDerivedFrom> <hash://sha256/6fb7271a2da1543036e39bcdb4c415a46b5437569eaaf0ffdef3e907a2f4309f> .
[https://linker.bio/hash:...eaaf0ffdef3e907a2f4309f] 554 kB at 0.76 MB/s completed in < 1 minute
<hash://sha256/6fb7271a2da1543036e39bcdb4c415a46b5437569eaaf0ffdef3e907a2f4309f> <http://www.w3.org/ns/prov#wasDerivedFrom> <hash://sha256/d79fb9207329a2813b60713cf0968fda10721d576dcb7a36038faf18027eebc1> .
[https://linker.bio/hash:...dcb7a36038faf18027eebc1] 940 MB at 14.39 MB/s
Some of these logs are really big! ~I wonder why?~ These are old (pre-2019) logs; things were wonkier back then.
[https://linker.bio/hash:...dcb7a36038faf18027eebc1] 1102 MB at 15.10 MB/s completed in 1 minute(s)
Ranging from a measly 249 kB to a whopping 1.1 GB(!)
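Each statement printed by preston history reads as "<newer log> prov:wasDerivedFrom <older log>", so the transcript above is a linked list walked backwards from the anchor. A minimal sketch that lists the visited logs newest first, assuming the statements keep exactly the shape shown above (one per line, a linear chain, progress lines starting with "["). The function name history_chain is hypothetical, not part of Preston:

```shell
#!/bin/sh
# Hypothetical helper: turn `preston history` output into an ordered list of
# log hashes, newest (the anchor) first. Assumes a linear chain and one
# statement per line of the form:
#   <hash://sha256/NEW> <http://www.w3.org/ns/prov#wasDerivedFrom> <hash://sha256/OLD> .
# Lines that are not wasDerivedFrom statements (e.g. progress output) are skipped.
history_chain() {
  awk '
    $2 == "<http://www.w3.org/ns/prov#wasDerivedFrom>" {
      newer = substr($1, 2, length($1) - 2)  # drop the surrounding < >
      older = substr($3, 2, length($3) - 2)
      if (count == 0) print newer            # first statement: the anchor itself
      print older
      count++
    }'
}
```

Usage, with the anchor from above: preston history --remote https://linker.bio -r hash://sha256/8aacce08462b87a345d271081783bdd999663ef90099212c8831db399fc0831b | history_chain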
Ahhhhhhhhhh shucks... preston history goes backward in time now. How do I go forward?
I thought maybe preston head with the root set to the 2020-05-01 crawl would climb its way up to a 2023-05-01 crawl, but no luck:
$ preston head --remote https://linker.bio -r hash://sha256/8aacce08462b87a345d271081783bdd999663ef90099212c8831db399fc0831b
hash://sha256/8aacce08462b87a345d271081783bdd999663ef90099212c8831db399fc0831b
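Going forward amounts to following the same wasDerivedFrom edges in reverse. A hedged sketch: given history statements that already reach back past the anchor (for example, the history of some log newer than the anchor), invert each edge and walk from the old anchor towards the newest log. It only re-reads statements you already have, so it cannot discover logs newer than its input; that part still needs a working head. The function name forward_from is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: walk *forward* from an anchor by inverting
# prov:wasDerivedFrom edges. $1 is the anchor hash URI; stdin carries
# `preston history`-style statements. Prints the anchor, then each newer log
# that (directly or indirectly) derives from it, oldest to newest.
forward_from() {
  awk -v anchor="$1" '
    $2 == "<http://www.w3.org/ns/prov#wasDerivedFrom>" {
      newer = substr($1, 2, length($1) - 2)
      older = substr($3, 2, length($3) - 2)
      successor[older] = newer          # inverted edge: older -> newer
    }
    END {
      current = anchor
      print current
      steps = 0
      # the step limit only guards against malformed (cyclic) input
      while ((current in successor) && steps < 100000) {
        current = successor[current]
        print current
        steps++
      }
    }'
}
```

Usage: pipe in the history of a newer log and pass the 2020 anchor, e.g. ... | forward_from hash://sha256/8aacce08462b87a345d271081783bdd999663ef90099212c8831db399fc0831b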
I did something horrible and wrote the anchor hash straight into a local index file:
$ echo -n "hash://sha256/8aacce08462b87a345d271081783bdd999663ef90099212c8831db399fc0831b" > data/2a/5d/2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
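The path in that command appears to follow a content-addressed layout: the first two hex characters, then the next two, then the full SHA-256 hex. That layout is inferred from this one path, not from Preston's documentation. A sketch that rebuilds such a path from a hash URI, plus the same hack with the missing directories created first (data_path and write_entry are hypothetical names):

```shell
#!/bin/sh
# Hypothetical helper: map hash://sha256/<hex> (or bare <hex>) to
# data/<first 2 hex>/<next 2 hex>/<hex>, mirroring data/2a/5d/2a5de79... above.
# The layout is an assumption based on that single example.
data_path() {
  hex=${1#hash://sha256/}
  d1=$(printf '%s' "$hex" | cut -c1-2)
  d2=$(printf '%s' "$hex" | cut -c3-4)
  printf 'data/%s/%s/%s\n' "$d1" "$d2" "$hex"
}

# Hypothetical helper: write $2 as the content of the entry for $1, without a
# trailing newline (like `echo -n`), creating the parent directories first.
# This is the same manual hack as above, not a supported Preston operation.
write_entry() {
  path=$(data_path "$1")
  mkdir -p "$(dirname "$path")"
  printf '%s' "$2" > "$path"
}
```

Printing data_path for 2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a gives the same data/2a/5d/... path used in the echo -n command.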
And now:
$ preston head --remote https://linker.bio
[https://linker.bio/hash:...5f8487e0d6a5233e3cd3146] 100.0% of 78 bytes at 0.02 MB/s completed in < 1 minute
[https://linker.bio/hash:...43a8035d8efcfcb403ec547] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...61377ec6a8fdc64dd1ba0d4] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...de2f5916da27000f1efe004] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...900ac41515ad4f2ae52b8b0] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...8666373d21539de8ef7b4b4] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...56e63fa0992ba19f0ca5c85] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...2ac88992b527564d6563064] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...16f6eaa105d246016f32110] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...6befdd6b3929f332ddd4f26] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...22589d0e2318b3ea44ef779] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...59020b2cd1e235c8c05bd41] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...8c010576282d4833ae096be] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...77f0ceaa0dd220c04012fd2] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...3a81bcfabd4abc385d4b8bc] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...d7fb061187253298435f876] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...abc2dd163b4b161149490ed] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...6d9065b0f72c71434b4794e] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...0c7251e4e6884f768835fbe] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...e7e7833a9cdb2fcb664a5bb] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...9e5e84542c872875edf55f0] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...b138ebc478104e3c115d57f] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...f04b1a56f5e2e29b961b88d] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...63fc758414b6b4a2d68ffc9] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...604f212012225bd01ca99d5] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...26b98e4186d22fb6779c24a] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...171b7b12af3a23b9f00a6b6] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...32a8e3ae31d4daa0fdcc0ed] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...3306edad4b9ae23d7742597] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...f47160214ff1dd998d3c66a] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...8ad9a100307c0dad80381bb] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...bad48714d9219a8f0d9d86d] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...f30ad87f93d058e88390579] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...08552f24897548f28ab8001] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...ad7d2a7b108b9f2b1fa541c] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...54a4931de64ab2d2a22da04] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...da2295dd565f3fc8e0a9ced] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...d0e3044e770c36cc85627a2] 100.0% of 78 bytes at ? MB/s completed in < 1 minute
[https://linker.bio/hash:...d639306a912da088c4e1d3f] 432 MB at 20.76 MB/s completed in < 1 minute
hash://sha256/c5989d88250fd6c92f312dd01afa52126b7f02f29d639306a912da088c4e1d3f
Success! hash://sha256/c5989d88250fd6c92f312dd01afa52126b7f02f29d639306a912da088c4e1d3f is the June 2023 log, which is even better than getting the May log.
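Each object fetched while climbing is 78 bytes, which matches the size of a single hash URI like the one written with echo -n earlier: 14 characters of "hash://sha256/" plus 64 hex digits, 14 + 64 = 78. That suggests, as an inference not confirmed in this thread, that each index entry is just a pointer to the next log. A small check for that shape (is_hash_uri_file is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file holds exactly one sha256 hash
# URI and nothing else: len("hash://sha256/") + 64 hex digits = 14 + 64 = 78
# bytes, with no trailing newline.
is_hash_uri_file() {
  size=$(wc -c < "$1" | tr -d ' ')   # tr strips the padding some wc versions add
  [ "$size" -eq 78 ] && grep -Eqx 'hash://sha256/[0-9a-f]{64}' "$1"
}
```

A file written with echo -n passes; the same URI written with a plain echo (79 bytes, trailing newline) does not, which is one reason the -n mattered in the hack above.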
$ preston get --remote https://linker.bio 'line:hash://sha256/c5989d88250fd6c92f312dd01afa52126b7f02f29d639306a912da088c4e1d3f!/L25'
<urn:uuid:91772302-544d-4385-a6bd-b2db2bebc6ed> <http://www.w3.org/ns/prov#generatedAtTime> "2023-06-01T03:49:40.361Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <urn:uuid:91772302-544d-4385-a6bd-b2db2bebc6e
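Line 25 of that log carries the prov:generatedAtTime of the crawl, which is what dates it to June 2023. The line number may well differ from log to log (an assumption), so a sketch that instead scans its input for the statement and keeps the timestamp, or just its year and month, may be handier. generated_at and crawl_month are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: print the xsd:dateTime literal of every
# prov:generatedAtTime statement found on stdin.
generated_at() {
  sed -n 's|.*<http://www.w3.org/ns/prov#generatedAtTime> "\([^"]*\)".*|\1|p'
}

# Hypothetical helper: reduce those timestamps to YYYY-MM, e.g. to tell which
# monthly crawl a log belongs to.
crawl_month() {
  generated_at | cut -c1-7
}
```

Usage: preston get --remote https://linker.bio 'line:hash://sha256/c5989d88250fd6c92f312dd01afa52126b7f02f29d639306a912da088c4e1d3f!/L25' | crawl_month, which for the statement shown above should yield 2023-06.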
@mielliott thanks for sharing the inconsistent behavior re: https://linker.bio . The root cause was the configuration of the linker.bio server, which has been fixed in https://github.com/bio-guoda/preston-service/commit/1ab55c1d1ecef96a56214610fe76e6ef8962a27b . I'll deploy the new configuration momentarily. Please note that nginx (the webserver) has a cache, so old results may persist for a while. That's probably something to keep in mind when swimming upstream using the funky query hashes we've introduced to make it easy to look into the future.
Curious to hear your thoughts...
Thanks for the fix. At least for the funky query hashes, I can use my local index instead when needed.
I feel like preston head -r [anchor] should swim up the index starting at [anchor], but that's not what I'm seeing (https://github.com/bio-guoda/preston/issues/245#issuecomment-1579024404). Is this a bug, or is there a better way to do it, apart from manually editing the index files?
Bug or feature? Up to you!
The history served by https://linker.bio describes someone named jorrit playing with a bats dataset. I was expecting to find the prov logs of linker.bio's monthly crawls of iDigBio & friends. How do I find the May 2023 crawl log?