WebCuratorTool / webcurator

The root of the webcurator tool project, containing all modules needed to run a fully functional webcurator tool.
Apache License 2.0

Remove generation of sorted crawl logs #52

Closed: hannakoppelaar closed this issue 2 years ago

hannakoppelaar commented 2 years ago

For historical reasons, WCT generates sorted crawl logs at index time. This step creates temporary files and adds processing time without being particularly useful, so it's best to remove this feature. (As far as we know, nobody is using these files, and they can easily be generated on the command line when needed.)

The file strippedcrawl.log should also be removed, since it is used only to generate the sorted crawl log.
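
For anyone who still needs a sorted log after this change, here is a rough sketch of how one could be produced on demand. It is not WCT's own implementation. It assumes a Heritrix-style crawl.log with whitespace-separated fields and the URI in the fourth field (index 3). The names `sort_crawl_log` and `sort_file` are hypothetical, and the sort key WCT actually used may differ.

```python
def sort_crawl_log(lines, field=3):
    """Return non-blank log lines sorted by one whitespace-separated field.

    field=3 assumes the URI is the fourth column (an assumption about
    the log layout, not something stated in this issue).
    """
    def key(line):
        parts = line.split()
        # Lines with too few fields sort first instead of raising.
        return parts[field] if len(parts) > field else ""

    return sorted((line for line in lines if line.strip()), key=key)


def sort_file(src, dst, field=3):
    """Read src, write the sorted lines to dst (both paths are caller-chosen)."""
    with open(src, encoding="utf-8", errors="replace") as fh:
        ordered = sort_crawl_log(fh, field)
    with open(dst, "w", encoding="utf-8") as out:
        out.writelines(ordered)
```

Calling `sort_file("crawl.log", "sorted.log")` would write a sorted copy without WCT having to create it at index time. The output file name here is only an example.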