transmogrify.webcrawler
crawls HTML to extract pages and files, which serve as the source for your transmogrifier pipeline.
transmogrify.webcrawler.typerecognitor
helps set '_type' on each item based on its crawled MIME type.
transmogrify.webcrawler.cache
speeds up crawling and reduces memory usage by storing crawled items locally.
These blueprints are designed to work with the funnelweb
pipeline but can be used independently.
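
To illustrate what setting '_type' from a MIME type looks like, here is a minimal standalone sketch. It is not the package's implementation: the '_mimetype' key, the TYPE_MAP table, the "File" fallback, and the recognise_types function are all assumptions made for this example. Only the '_type' key comes from the description above.

```python
# Hypothetical mapping from MIME type to content type (an assumption,
# not the table used by transmogrify.webcrawler.typerecognitor).
TYPE_MAP = {
    "text/html": "Document",
    "application/xhtml+xml": "Document",
}


def recognise_types(items, default="File"):
    """Yield each item, adding '_type' when a MIME type is available.

    Items are plain dicts, as in a transmogrifier pipeline. The
    '_mimetype' key is assumed here purely for illustration.
    """
    for item in items:
        mimetype = item.get("_mimetype")
        if mimetype is not None and "_type" not in item:
            # Drop parameters such as "; charset=utf-8" before matching.
            main = mimetype.split(";")[0].strip().lower()
            if main.startswith("image/"):
                item["_type"] = "Image"
            else:
                item["_type"] = TYPE_MAP.get(main, default)
        yield item


crawled = [
    {"_path": "index.html", "_mimetype": "text/html; charset=utf-8"},
    {"_path": "logo.png", "_mimetype": "image/png"},
    {"_path": "report.pdf", "_mimetype": "application/pdf"},
    {"_path": "unknown"},
]
typed = list(recognise_types(crawled))
```

Items without a MIME type pass through unchanged, so a later section in the pipeline can still decide how to handle them.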