go-shiori / obelisk

Go package and CLI tool for saving a web page as a single HTML file
MIT License

Obelisk



Obelisk is a Go package and CLI tool for saving a web page as a single HTML file, with all of its assets embedded. It's inspired by the great Monolith and is intended as an improvement over my old WARC package.

Features

As a Go package

Run the following command inside your Go project:

go get -u -v github.com/go-shiori/obelisk

Next, import Obelisk in your application:

import "github.com/go-shiori/obelisk"

Now you can use Obelisk's archival feature in your application. For basic usage, check the example.

As a CLI application

You can download the latest version of Obelisk from the release page. To build from source, make sure you use Go >= 1.13, then run the following command:

go get -u -v github.com/go-shiori/obelisk/cmd/obelisk

Now you can use it from your terminal:

$ obelisk -h

CLI tool for saving web page as single HTML file

Usage:
  obelisk [url1] [url2] ... [urlN] [flags]

Flags:
  -z, --gzip                          gzip archival result
  -h, --help                          help for obelisk
  -i, --input string                  path to file which contains URLs
      --insecure                      skip X.509 (TLS) certificate verification
  -c, --load-cookies string           path to Netscape cookie file
      --max-concurrent-download int   max concurrent download at a time (default 10)
      --no-css                        disable CSS styling
      --no-embeds                     remove embedded elements (e.g iframe)
      --no-js                         disable JavaScript
      --no-medias                     remove media elements (e.g img, audio)
  -o, --output string                 path to save archival result
  -q, --quiet                         disable logging
      --skip-resource-url-error       skip process resource url error
  -t, --timeout int                   maximum time (in second) before request timeout (default 60)
  -u, --user-agent string             set custom user agent
      --verbose                       more verbose logging

There are some CLI behaviors that I think need more explanation:

FAQ

Why is it named Obelisk?

It's inspired by Monolith, therefore it's Obelisk.

How does it compare to WARC?

My WARC package uses a bolt database to store the archival result, which makes it hard to share and view. I also think the code in WARC is not easy to understand, so I often get confused when I try to add a feature or refactor it.

How does it compare to Monolith?

Why not just contribute to Monolith?

Attributions

The original logo was created by Freepik as part of their Egypt and Desert pack, which can be downloaded from www.flaticon.com.

License

Obelisk is distributed under the MIT license, which means you can use and modify it however you want. However, if you make an enhancement, please send a pull request if possible. If you like this project, please consider donating via PayPal or Ko-Fi.