use archive.org, IPFS for commons, esp media

dckc commented 3 years ago

Thanks, archive.org, for solving a big part of my digital-media management issues!

Fix You in memory of Aaron Swartz : K, K, and B : Free Download, Borrow, and Streaming : Internet Archive

offload my commons storage using IPFS, keeping only the collection as precious?

share bookmarks as archive.org collection?

add to what-i-use?

Save To The Wayback Machine - Chrome Web Store
oduwsdl/ipwb: InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
- WARC has been on my radar for a while... what other warc goodies are out there?
- webrecorder/replayweb.page: Serverless Web Archive Replay directly in the browser

dckc commented 3 years ago

for photos, I started playing with...

photoprism/photoprism: Personal Photo Management powered by Go and Google TensorFlow

dckc commented 3 years ago

I moved some office hours recordings from Dropbox (which was getting low on space) to IPFS:

http://localhost:15001/ipfs/bafybeianwe4vy7sprht5sm3hshvxjeqhwcmvbzq73u55sdhqngmohkjgs4/#/ipfs/QmSnA5kA7eRNpJEP5yZEhrCHnBNxX7MU7E7tmmqEczJDoG

dckc commented 1 year ago

I used this to make an archive of madmode.com for offline use:

wget "http://www.archiveteam.org/" --mirror --warc-file="at"
-- Wget with WARC output - Archiveteam

dckc commented 1 year ago

cap-talk archive from 1998

marc.info has a cap-talk archive going back to the beginning:

List: cap-talk Subject: Welcome to cap-talk From: Jonathan S. Shapiro jsshapiro () earthlink ! net Date: 1998-03-24 17:55:21

It provides messages in original format; for example:

https://marc.info/?l=cap-talk&m=97564025318998&q=mbox

I tried

wget "https://marc.info/?l=cap-talk" -e robots=off --wait=0.25 --mirror --warc-file=cap-talk

but it seemed to be grabbing all lists, not just cap-talk
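A likely reason, as far as I can tell: `--mirror` follows every link on the same host, and marc.info tells lists apart only by the `l=` query parameter, not by path. A minimal sketch of the kind of URL test a crawl would need to stay inside one list; the function name `is_cap_talk_url` is made up for illustration:

```python
from urllib.parse import urlsplit, parse_qs


def is_cap_talk_url(url: str, list_name: str = "cap-talk") -> bool:
    """Return True only for marc.info URLs whose l= parameter names the list."""
    parts = urlsplit(url)
    if parts.hostname != "marc.info":
        return False
    # parse_qs maps each query key to a list of values, e.g. {"l": ["cap-talk"]}
    return parse_qs(parts.query).get("l") == [list_name]


if __name__ == "__main__":
    print(is_cap_talk_url("https://marc.info/?l=cap-talk&m=97564025318998&q=mbox"))
    print(is_cap_talk_url("https://marc.info/?l=linux-kernel"))
```

wget has an `--accept-regex` option that matches against the full URL, so a pattern along the lines of `l=cap-talk` might express the same restriction; I have not tested that against marc.info.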

dckc commented 1 year ago

I'm trying out PeerTube. I applied for an account at Spectra.

dckc commented 11 months ago

archive.org and torrent files

I'm ready to say goodbye to my copy of Ubuntu 5.10 for i386 on CD, after nearly 2 decades of keeping it as a combination keepsake and software supply chain anchor. I donated it to archive.org:

While brainstorming about Merkle trees for file access, I noticed that not only does archive.org OCR the PDF I gave them of the cover and support browsing of the contents of Ubuntu 5.10 i386.iso, but they provide ubuntu-5.10-pc_archive.torrent, which means I can have reliable access to the full contents of the CDs for just 29k of storage. And Brave supports .torrent files natively with WebTorrent (the WebTorrent Tutorial looks pretty straightforward).

But what's in that .torrent file? Aha! bencode from BEP 3! I've heard of it in OCapN discussion but didn't realize it comes from bittorrent. BitTorrent bencode format tools is really handy, including stopping in a debugger to see the details:

BCode.hs from haskell-torrent has a crisp specification:

-- | BCode represents the structure of a bencoded file
data BCode = BInt Integer                       -- ^ An integer
           | BString B.ByteString               -- ^ A string of bytes
           | BArray [BCode]                     -- ^ An array
           | BDict (M.Map B.ByteString BCode)   -- ^ A key, value map
  deriving (Show, Eq)

...

-- | Return the hash of the info-dict in a torrent file
hashInfoDict :: BCode -> IO Digest
hashInfoDict bc =
    do ih <- case info bc of
                Nothing -> fail "Could not find infoHash"
                Just x  -> return x
       let encoded = encode ih
       digest $ L.fromChunks $ [encoded]
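To see how little there is to the format, here is a rough Python counterpart to the `BCode` type above: a sketch of a decoder covering the four cases from BEP 3 (integer, byte string, list, dictionary). The names `bdecode` and `bdecode_at` are mine, not from haskell-torrent or any library, and the error handling is minimal.

```python
from typing import Any, Tuple


def bdecode_at(data: bytes, i: int = 0) -> Tuple[Any, int]:
    """Decode one bencoded value starting at offset i; return (value, next offset)."""
    lead = data[i:i + 1]
    if lead == b"i":                       # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":                       # list: l<item>...e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode_at(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":                       # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode_at(data, i)
            value, i = bdecode_at(data, i)
            result[key] = value
        return result, i + 1
    if lead.isdigit():                     # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte at offset {i}")


def bdecode(data: bytes) -> Any:
    """Decode a complete bencoded document, rejecting trailing bytes."""
    value, end = bdecode_at(data)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


if __name__ == "__main__":
    # A made-up miniature of a .torrent file's top-level structure.
    print(bdecode(b"d4:infod6:lengthi3e4:name5:a.isoee"))
```

Keys and strings stay as `bytes`, matching `B.ByteString` in the Haskell type; the `pieces` field in a real torrent is raw SHA-1 output, not text.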

Playing with parse-torrent in a parse-torrent-ubuntu-5.10 project on StackBlitz is handy in that it shows the infoHash, b890d2e1174a809d1cd0437de30400c542e0a939, but its JSON output misled me about the real structure: there is no infoHash key in the file; there's an info dictionary that gets hashed.
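The relationship can be sketched in a few lines of Python: bencode the `info` dictionary (keys in sorted order) and take the SHA-1 of the result. The torrent dictionary below is invented for illustration and the function names are mine; this is not how parse-torrent is implemented.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, bytes, lists, and dicts with bytes keys per BEP 3."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        out = b"d"
        for key in sorted(value):          # keys must appear in sorted order
            out += bencode(key) + bencode(value[key])
        return out + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(torrent: dict) -> str:
    """Hex SHA-1 of the bencoded info dictionary; nothing is stored under 'infoHash'."""
    return hashlib.sha1(bencode(torrent[b"info"])).hexdigest()


if __name__ == "__main__":
    # Hypothetical torrent metadata, far smaller than a real file.
    torrent = {
        b"announce": b"http://tracker.example/announce",
        b"info": {b"name": b"a.iso", b"length": 3},
    }
    print(info_hash(torrent))
```

When working from a real .torrent file it is safer to hash the original bytes of the `info` value as they appear in the file, since re-encoding only reproduces them if the file was canonically encoded in the first place.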

Say... Ubuntu offers bittorrent as a download option; maybe they keep a 5.10 .torrent file around? I didn't find one from them, but I did find:

Ubuntu 5.10 (Breezy Badger) : Canonical Ltd., Ubuntu community : Internet Archive
Source torrent:urn:sha1:329a357ebd51db73417e1ad767b041291f598ae8 Addeddate: 2017-06-20 14:16:31 Identifier: Ubuntu-5.10

Note the Source; yes, Internet Archive ingests BitTorrents.

Somehow my Ubuntu 5.10 i386.iso is 632,262 kb, which is 300 kb larger than theirs (631,962 kb). Maybe some unused space captured by gnome-disk-utility when I ripped the CD?
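One way to test the unused-space guess, given both images on disk: check whether the smaller file is a byte-for-byte prefix of the larger one and whether the extra tail is all zero bytes. This is only a sketch; the function name is hypothetical and I have not run it on the actual ISOs.

```python
import os


def compare_images(larger_path: str, smaller_path: str, chunk_size: int = 1 << 20):
    """Return (prefix_matches, extra_bytes, tail_is_zero) for two image files."""
    small_size = os.path.getsize(smaller_path)
    large_size = os.path.getsize(larger_path)
    if large_size < small_size:
        raise ValueError("larger_path is smaller than smaller_path")

    with open(larger_path, "rb") as big, open(smaller_path, "rb") as small:
        # Compare the shared prefix one chunk at a time.
        prefix_matches = True
        remaining = small_size
        while remaining > 0:
            n = min(chunk_size, remaining)
            if big.read(n) != small.read(n):
                prefix_matches = False
                break
            remaining -= n

        # Inspect whatever the larger file has beyond the smaller one's length.
        big.seek(small_size)
        tail_is_zero = True
        while True:
            block = big.read(chunk_size)
            if not block:
                break
            if any(block):                 # any nonzero byte
                tail_is_zero = False
                break

    return prefix_matches, large_size - small_size, tail_is_zero
```

A result of `(True, n, True)` would be consistent with padding added during the rip; anything else points to a real difference in content.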

dckc commented 11 months ago

ias3, the Internet Archive's S3-like API, also looks handy.

dckc / madmode-blog

use archive.org, IPFS for commons, esp media #108
