ArchiveTeam / grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Record grab-site version in WARC headers #222

Closed by JustAnotherArchivist 2 years ago

JustAnotherArchivist commented 2 years ago

Before:

Software: Wpull/3.0.9 Python/3.7.12

After:

Software: grab-site/2.2.3 Wpull/3.0.9 Python/3.7.12

PhantomJS and youtube-dl are still added at the end, although I think both might be broken anyway.
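
As a rough illustration of the change, the sketch below builds the two header values from name/version pairs. `software_string` is a hypothetical helper written for this example, not grab-site's or wpull's actual code; it only assumes that the tokens are joined by single spaces in the order shown in the Before/After lines.

```python
def software_string(components):
    """Join (name, version) pairs into a 'name/version name/version ...' string."""
    return " ".join(f"{name}/{version}" for name, version in components)


# Versions copied from the Before/After lines in this PR description.
base = [("Wpull", "3.0.9"), ("Python", "3.7.12")]

before = software_string(base)
# Assumption for illustration: the grab-site token is simply placed first.
after = software_string([("grab-site", "2.2.3")] + base)

print("Software:", before)
print("Software:", after)
```

Any further tokens (such as the PhantomJS and youtube-dl entries mentioned above) would follow the Python token at the end of the string.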

JustAnotherArchivist commented 2 years ago

Note that this only works with ludios_wpull, not upstream wpull, because the plugin setup runs after the WARC recorder setup on the latter. That's https://github.com/ArchiveTeam/wpull/issues/383.
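
The ordering problem can be pictured with a toy model. Everything here (`ToyApp`, `ToyRecorder`, `plugin_setup`, `run`) is hypothetical and does not reflect wpull's real classes; it assumes only that the recorder reads the software string once during its own setup, so a plugin that runs later cannot affect what gets recorded.

```python
class ToyApp:
    """Hypothetical stand-in for the crawler application state."""

    def __init__(self):
        self.software = "Wpull/3.0.9 Python/3.7.12"


class ToyRecorder:
    """Hypothetical recorder that copies the software string at setup time."""

    def __init__(self, app):
        self.software = app.software  # snapshot taken once, never refreshed


def plugin_setup(app):
    """Hypothetical plugin hook that prepends the grab-site token."""
    app.software = "grab-site/2.2.3 " + app.software


def run(plugin_first):
    """Return the string the recorder would write for a given setup order."""
    app = ToyApp()
    if plugin_first:
        plugin_setup(app)
        recorder = ToyRecorder(app)
    else:
        recorder = ToyRecorder(app)
        plugin_setup(app)  # too late: the recorder already took its snapshot
    return recorder.software


print(run(plugin_first=True))
print(run(plugin_first=False))
```

In this model, only the plugin-first order yields the grab-site token, which mirrors the behaviour described for ludios_wpull versus upstream wpull.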