A test suite of archiving cases would be useful. The idea would be to collect a set of examples of websites to crawl, with different features and levels of complexity, to test crawler/archiving software tools. The cases would range from easy to hard. Test suites such as this are well-known, and employed in other efforts to demonstrate software compliance. One can also build a lot of tooling around test cases, including drivers and even controlled vocabularies to describe the different features being tested by different cases. (C.f. this test suite in an unrelated domain.)
Test suites for archivers is something that other groups have done to some extent, so an important question to address is how this effort would be situated in the broader space and how would it interact with other people's efforts.
A test suite of archiving cases would be useful. The idea would be to collect a set of examples of websites to crawl, with different features and levels of complexity, to test crawler/archiving software tools. The cases would range from easy to hard. Test suites such as this are well-known, and employed in other efforts to demonstrate software compliance. One can also build a lot of tooling around test cases, including drivers and even controlled vocabularies to describe the different features being tested by different cases. (C.f. this test suite in an unrelated domain.)
Test suites for archivers is something that other groups have done to some extent, so an important question to address is how this effort would be situated in the broader space and how would it interact with other people's efforts.