Closed (est closed this issue 8 years ago)
A workaround for lxml, in case anyone needs it:
' '.join(x.text for x in elem.xpath('.//*[not(self::script or self::style)]') if x.text)
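Note that the one-liner above reads only each element's .text, so text that follows a child element (its .tail) is skipped. A minimal sketch that selects text nodes directly instead, assuming lxml is installed; the helper name visible_text and the sample HTML are made up for illustration:

```python
import lxml.html


def visible_text(elem):
    """Join all text nodes under elem, skipping those inside <script> or <style>."""
    # Selecting text() nodes (rather than elements) also picks up tail text.
    nodes = elem.xpath('.//text()[not(ancestor::script or ancestor::style)]')
    # Drop whitespace-only nodes and join the rest with single spaces.
    return ' '.join(t.strip() for t in nodes if t.strip())


# Hypothetical sample document, not taken from the thread.
doc = lxml.html.fromstring(
    '<div><p>Hello <b>big</b> world</p>'
    '<script>var x = 1;</script><style>p { color: red; }</style></div>'
)
print(visible_text(doc))
```

Here " world" is the tail of the b element, which the element-based version would miss.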
Why should it exclude scripts? What if I want script tag content? I guess it's better to remove scripts before getting text() in your case:
pq('script').remove()
pq.text()
@neumond well your reply made me speechless. Hope you enjoy the scripts and style declarations in your text.
Thanks for the solution though, works well.
You probably don't know that jQuery's text() method returns scripts and styles along with normal text.
As far as I know jQuery does exclude scripts, but in a different case: when you assign innerHTML. That is intended to guarantee some degree of safety, since re-executing scripts by assigning innerHTML causes hard-to-catch bugs.
I agree that being able to retrieve scripts can be useful, for example to extract some JSON included in the HTML (who uses APIs?)
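For the JSON case, a rough sketch using only the Python standard library rather than pyquery. The class name ScriptJSONExtractor, the type="application/json" attribute, and the sample page are assumptions for illustration:

```python
import json
from html.parser import HTMLParser


class ScriptJSONExtractor(HTMLParser):
    """Collect and decode the body of every <script type="application/json">."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._chunks = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        # Only track scripts that are explicitly marked as JSON.
        if tag == 'script' and dict(attrs).get('type') == 'application/json':
            self._in_json_script = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_json_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == 'script' and self._in_json_script:
            self.payloads.append(json.loads(''.join(self._chunks)))
            self._in_json_script = False


# Hypothetical page, not taken from the thread.
page = ('<html><body><p>Prices</p>'
        '<script type="application/json">{"items": [1, 2, 3]}</script>'
        '</body></html>')
parser = ScriptJSONExtractor()
parser.feed(page)
parser.close()
print(parser.payloads)
```

This is the kind of use that would break if text extraction always threw script content away, which is why removing scripts explicitly beforehand keeps both options open.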
Is this by design? If so, is there a way to get rid of <script> tags during text()?