Open simahawk opened 13 years ago
The standard way to drop something in htmlcontentextractor is to create a rule for a dummy field that you'll never use. Since every rule cuts out the content it selects, this effectively removes that part of the HTML. More details in the docs: https://github.com/djay/transmogrify.htmlcontentextractor/blob/master/transmogrify/htmlcontentextractor/templatefinder.txt
I've never tried it on attributes however. If it doesn't work then there should be a way to make it work :)
Failing that, there is also a regex find-and-replace feature in transmogrify.webcrawler... but regex on HTML is a pain.
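For illustration, here is roughly what a regex-based removal of `style` attributes could look like. This is a minimal sketch using plain Python `re`, not the transmogrify.webcrawler option syntax; the pattern and the function name `strip_style_attributes` are assumptions made up for this example.

```python
import re

# Hypothetical pattern: matches a quoted style="..." or style='...' attribute
# together with the whitespace in front of it.
STYLE_ATTR = re.compile(r"""\s+style\s*=\s*(?:"[^"]*"|'[^']*')""", re.IGNORECASE)


def strip_style_attributes(html):
    """Remove quoted style attributes from an HTML string."""
    return STYLE_ATTR.sub("", html)
```

It also shows why regex on HTML is a pain: the pattern misses unquoted values and would also match text that merely looks like an attribute outside of a tag.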
I solved it by using http://lxml.de/lxmlhtml.html#cleaning-up-html in a custom blueprint in a custom package. I think it is probably worth including such a blueprint in transmogrify.htmlcontentextractor and making it configurable with these parameters: http://lxml.de/api/lxml.html.clean.Cleaner-class.html. What do you think?
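A rough sketch of what such a section could look like, in case it helps the discussion. It is not the actual custom blueprint: it uses plain `lxml.html` with an XPath instead of `Cleaner`, it does not import collective.transmogrifier, and the names `drop_attributes`, `DropAttributesSection`, `key`, `xpath` and `attributes` are all hypothetical.

```python
import lxml.html


def drop_attributes(html, xpath="descendant-or-self::*[@style]", attributes=("style",)):
    """Return html with the given attributes removed from elements matching xpath."""
    root = lxml.html.fromstring(html)
    for element in root.xpath(xpath):
        for name in attributes:
            # pop() with a default does nothing if the attribute is absent.
            element.attrib.pop(name, None)
    return lxml.html.tostring(root, encoding="unicode")


class DropAttributesSection(object):
    """Pipeline-style section (assumed shape): cleans item[key] and passes items on."""

    def __init__(self, previous, key="text",
                 xpath="descendant-or-self::*[@style]", attributes=("style",)):
        self.previous = previous
        self.key = key
        self.xpath = xpath
        self.attributes = attributes

    def __iter__(self):
        for item in self.previous:
            # Items without the key (or with an empty value) pass through untouched.
            if item.get(self.key):
                item[self.key] = drop_attributes(
                    item[self.key], self.xpath, self.attributes)
            yield item
```

A real blueprint would take the XPath and the attribute names from the pipeline options, or hand the options straight to `Cleaner` as suggested above.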
Hi, I need to drop a lot of hard-coded "style" attributes from my HTML source: is there a parameter that takes an XPath (or whatever) and drops specific attributes before the import?
Thanks