Closed: vext01 closed this issue 4 years ago
If you pipe a link in via stdin, it archives just that link; if you pass a URL as an arg, it interprets it as a source to import other links from.
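Roughly, the contrast is this (a sketch assuming the pre-v0.4 `./archive` entry point; the URLs are placeholders):

```bash
# Piping via stdin: archives exactly this one page
echo 'https://example.com/some-article' | ./archive

# Passing as an argument: treated as a source/feed, so the links found
# *inside* that page get imported and archived, not the page itself
./archive 'https://example.com/links.html'
```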
See:
https://github.com/pirate/ArchiveBox/wiki/Usage#import-a-single-url-or-list-of-urls-via-stdin
vs
This difference in behavior is intentional but not intuitive, so it's been changed in the upcoming v0.4 `archivebox add` CLI design.
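As a rough sketch of that direction (assuming the `archivebox add` command and a `--depth` option; exact flags may shift until v0.4 ships):

```bash
# Both forms archive just the given URL
echo 'https://example.com/some-article' | archivebox add
archivebox add 'https://example.com/some-article'

# Crawling becomes an explicit opt-in: also archive the pages the URL links to (one hop)
archivebox add --depth=1 'https://example.com/some-article'
```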
Thanks for the offer re: OpenBSD! If you want to subscribe to PR #207 you'll get an update when v0.4 ships on PyPI.
> If you pass a URL as an arg it interprets it as a source to import other links from.
I see. That indeed isn't intuitive. The new CLI makes much more sense. Looking forward to that!
So with the current design, if I pass a URL as an arg, it follows links 1 deep. Is that correct?
> If you want to subscribe to PR #207 you'll get an update when v0.4 ships on PyPI.
Many thanks. Subscribed.
In a sense it follows one link deep, but that's not really what you want if you're looking for recursive archiving, since it doesn't archive the original URL. What it's really doing is treating the path/link argument as a feed to import a list of links from, e.g. a browser history or Pinboard export.
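For example (a sketch; the export filenames are placeholders):

```bash
# Each URL found inside the export gets added to the archive;
# the export file itself is only the import source, not an archived page
./archive ~/Downloads/bookmarks_export.html
./archive ~/Downloads/pinboard_export_2019.json
```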
Hi,
I've recently discovered archivebox -- what a neat tool!
To try it out, I ran it on my personal website and was surprised to find that it followed links outside of my website too!
So my question is: How many links does it follow before stopping? Can this be controlled in any way?
Thanks!
P.S. I'm an OpenBSD developer. If you can get this up on PyPI, I'll happily make a port so that archivebox can be in the package manager.