archivesunleashed / aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
https://aut.docs.archivesunleashed.org/
Apache License 2.0

Add webgraph, imagegraph, webpages, etc. to command line app #431

Closed (ruebot closed 4 years ago)

ruebot commented 4 years ago

Currently, only one of the standard auk derivatives, DomainFrequencyExtractor, is available as an "app".

We should also add:

It might also be worth adding the DataFrame derivatives we produced for the NYC and IIPC datathons, tweaking them where need be if they already exist in the app.
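
For anyone not familiar with what a domain frequency derivative produces, here is a rough, library-free sketch in Python. It is not the toolkit's implementation: the function name `domain_frequency` and the choice to strip a leading `www.` are assumptions made for illustration only.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_frequency(urls):
    """Count how often each domain appears in an iterable of URLs.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname
        if not host:
            # Skip anything that does not parse to a host (e.g. bare strings).
            continue
        if host.startswith("www."):
            # Assumption: treat "www.example.com" and "example.com" as one domain.
            host = host[4:]
        counts[host] += 1
    # Most frequent domains first.
    return counts.most_common()
```

Calling it on a handful of page URLs returns `(domain, count)` pairs sorted by count, which is roughly the shape of output one would expect from a domain frequency job.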

ruebot commented 4 years ago

DomainGraphExtractor is the network graph job without WriteGraph.asGraphml.
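
To make "the network graph job without the GraphML step" concrete, here is a minimal sketch in plain Python. It is not the toolkit's code: `domain_graph`, `_domain`, and the `(crawl_date, src, dest)` input shape are all hypothetical, and the sketch simply returns aggregated edge rows rather than writing any graph file.

```python
from collections import Counter
from urllib.parse import urlparse


def _domain(url):
    """Return the host of a URL without a leading "www.", or "" if none."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def domain_graph(links):
    """Aggregate (crawl_date, src_url, dest_url) links into domain-level edges.

    Hypothetical helper for illustration; returns rows of
    (crawl_date, src_domain, dest_domain, count).
    """
    edges = Counter()
    for crawl_date, src, dest in links:
        s, d = _domain(src), _domain(dest)
        if s and d:
            # Only keep links where both ends resolve to a host.
            edges[(crawl_date, s, d)] += 1
    return [(date, s, d, n) for (date, s, d), n in edges.items()]
```

The rows it returns could be written out as CSV or a DataFrame; serializing to a graph format would be a separate step layered on top.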

ruebot commented 4 years ago

:man_facepalming: PlainTextExtractor is already there. It just needs some tweaks.