When creating derivatives with the Derivative Generator notebook, selecting "text content without boilerplate" removes all of the text content from the archives in the sample datasets ("ARCHIVEIT-227-QUARTERLY-XUGECV-20091218231727-00039-crawling06.us.archive.org-8091.warc" and "ARCHIVEIT-227-UOFTORONTO-CANPOLPINT-20060622205612-00009-crawling025.archive.org.arc").
We need to determine whether this is caused by how the boilerplate detection methods behave on these particular datasets or by a bug in the AOYTK code.
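
One possible way to narrow this down is to compare, record by record, the text before and after boilerplate removal and count how many records that had text end up empty. The snippet below is a rough sketch only and is not AOYTK code: the function name `empty_text_report` and the field names `raw_content` and `content` are hypothetical placeholders, and the rows would need to be built from whatever the two derivatives actually contain.

```python
def empty_text_report(rows, raw_field="raw_content", clean_field="content"):
    """Count records whose text is empty after boilerplate removal.

    `rows` is any iterable of dicts. The field names are hypothetical
    and must be adjusted to match the real derivative output.
    """
    total = 0
    had_raw = 0   # records that had some text before boilerplate removal
    emptied = 0   # of those, records with no text left afterwards
    for row in rows:
        total += 1
        raw = (row.get(raw_field) or "").strip()
        clean = (row.get(clean_field) or "").strip()
        if raw:
            had_raw += 1
            if not clean:
                emptied += 1
    return {"total": total, "had_raw_text": had_raw, "emptied": emptied}


# Made-up example rows, for illustration only.
sample_rows = [
    {"raw_content": "<html><body>Hello</body></html>", "content": ""},
    {"raw_content": "Plain text page", "content": "Plain text page"},
    {"raw_content": "", "content": ""},
]
report = empty_text_report(sample_rows)
print(report)
```

If every record that had text comes back empty, a bug in how the text is passed to or read back from the boilerplate step seems more likely than the detection method itself; if only some records are emptied, the detection method on these particular pages is the more plausible cause.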