collective / transmogrify.webcrawler

transmogrifier source blueprints for crawling html

A few fixes + 1 change to webcrawler #3

davidjb closed 13 years ago

davidjb commented 13 years ago

Essentially, I reformatted the "reformat" function on LXMLPage and provided a logger in the class (as 'log' wasn't available before).

The one feature added is the ability to substitute empty strings in the reformat process, so that text matching a regex can be removed from pages. Because buildout removes empty lines from options, an empty replacement can't simply be left blank, so I thought this was a way of handling that.

Change accordingly, though :)
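To illustrate the empty-substitution idea described above, here is a minimal Python sketch. It is not the code from this pull request: the `<EMPTY>` marker, the `parse_rules` and `apply_rules` helpers, and the "pattern line followed by replacement line" option layout are all assumptions made for illustration. It only shows one way to express "replace with nothing" when blank lines are dropped from an option value.

```python
import re

# Hypothetical marker standing in for an empty replacement string.
# This name is an assumption for the sketch, not taken from the pull request.
EMPTY_MARKER = "<EMPTY>"


def parse_rules(option_value):
    """Turn alternating pattern/replacement lines into compiled rules.

    Blank lines are skipped, mimicking an option whose empty lines
    have already been removed before the value reaches the code.
    """
    lines = [line.strip() for line in option_value.splitlines() if line.strip()]
    if len(lines) % 2:
        raise ValueError("expected pattern/replacement pairs")
    rules = []
    for pattern, replacement in zip(lines[0::2], lines[1::2]):
        if replacement == EMPTY_MARKER:
            replacement = ""  # the marker means "delete whatever matched"
        rules.append((re.compile(pattern), replacement))
    return rules


def apply_rules(html, rules):
    """Apply each (pattern, replacement) rule to the page text in order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


option = "\n".join([
    r"<!--.*?-->",   # pattern: HTML comments
    EMPTY_MARKER,    # replacement: nothing, so comments are removed
    r"<(/?)b>",      # pattern: opening or closing <b> tags
    r"<\1strong>",   # replacement: the same tag as <strong>
])
cleaned = apply_rules("<p>a<!-- note --><b>c</b></p>", parse_rules(option))
```

A marker keeps the option value free of blank lines, at the cost of reserving one string that can no longer be used as a literal replacement.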

djay commented 13 years ago

Looks good. The code has moved to the GitHub collective. I'll merge it in there.