divio / aldryn-search

Haystack 2.0 search index for django CMS

be aware of "hidden linebreaks/spaces" #57

Closed benzkji closed 8 years ago

benzkji commented 8 years ago

Having this rendered by my plugin:

<h1>The Title</h1>What

The words "The" and "TitleWhat" end up in my index, because the text on either side of the closing tag is joined without a space. Maybe think about using strip_tags from django?

http://stackoverflow.com/questions/12824899/strip-tags-replace-tags-by-space-rather-than-deleting-them Everybody suggests regexes, but maybe lxml can do it?

czpython commented 8 years ago

@benzkji Thanks. We used to use a version of Django's strip_tags that took care of this, but we had to switch to lxml because Django's version could not strip JS tags correctly. So this may be a regression from that switch. Will investigate further.
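
For illustration only, here is a minimal sketch of the two requirements discussed above: text on either side of a tag must stay separated, and the contents of script tags must not reach the index. It is not the code from aldryn-search or from the fixing commit. It uses the standard library's html.parser instead of lxml so that it runs without extra dependencies, and the names _TextExtractor and strip_tags_with_spaces are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join text nodes with a space, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_tags_with_spaces(html):
    """Return the visible text of `html`, with every tag acting as a word break."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

With this sketch, strip_tags_with_spaces("<h1>The Title</h1>What") gives "The Title What". The trade-off is that inline tags also become word breaks, so "foo<b>bar</b>" is indexed as "foo bar"; for a search index that is usually acceptable, but it is an assumption rather than something confirmed in this thread.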

czpython commented 8 years ago

Fixed by ef4479f3cc532523168e61dca5cac95d9854cf80