Open b5 opened 7 years ago
We should run heritrix against a set of urls, copy the output, and write a set of tests that check sentry's output against the heretrix output as a form of ground truth
As a note, this should only apply to files deemed relevant. I don't think we're concerned about things like directory structure or configuration files
We should run heritrix against a set of urls, copy the output, and write a set of tests that check sentry's output against the heretrix output as a form of ground truth