SoylentNews / slashcode

The slashcode repository for SoylentNews. The initial code base was uploaded as it appeared on SourceForge as of the last commit in September 2009.
http://soylentnews.org
GNU General Public License v2.0

convert entity diacritics to literal ones #375

TheMightyBuzzard closed 9 years ago

TheMightyBuzzard commented 10 years ago

This needs to be done to prevent alternating entities and literals from being chained to bypass our limit on them. Also removed strip_literal from the text input fields on comment previews. There's no point in it as they can edit that to make it say whatever they want anyway. I'll likely be doing the same to journals and subs soon as well.
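To illustrate the bypass described above, here is a minimal Python sketch (not the slashcode Perl implementation). It assumes the limit is a cap on consecutive combining marks; `MAX_COMBINING_RUN` and both function names are hypothetical. A check that only counts literal characters sees each entity as a break in the run, so alternating entities and literals slips under the cap unless the entities are decoded first.

```python
import html
import unicodedata

MAX_COMBINING_RUN = 2  # hypothetical cap, not the value slashcode uses


def longest_combining_run(text: str) -> int:
    """Return the length of the longest run of consecutive combining marks."""
    longest = run = 0
    for ch in text:
        if unicodedata.combining(ch):
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


def exceeds_limit(raw: str) -> bool:
    """Decode entities to literal characters first, then apply the cap."""
    literal = html.unescape(raw)
    return longest_combining_run(literal) > MAX_COMBINING_RUN


# Literal U+0301 alternating with entity forms of the same mark.
mixed = "e" + "\u0301" + "&#x301;" + "\u0301" + "&#769;"
```

Counting on `mixed` directly finds runs of only one mark, because the entity text interrupts each run. After decoding, the same string is a run of four marks and is caught by the cap.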

paulej72 commented 10 years ago

Not sure if I am on board with removing the filters on the comments template. Here is the issue. As of right now the db is dirty, and one must assume it will remain so. Any data from a form can be dirty as well. Therefore all data must be run through the filters so that we are not allowing bad things to be parsed as raw HTML. This would include " in any input var, or other such stuff that might break the form (i.e. we should strip off < and > in HTML input). I was able to confirm that really bad things happen to the pages when the data is not filtered. There are a lot of places where db values are used and assumed to be clean when they are not.
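As a rough illustration of the concern about a stray " breaking a form, here is a minimal Python sketch (slashcode itself is Perl and uses its own filters). It escapes a value before placing it in an input's `value` attribute. The helper name `render_text_input` is hypothetical; `html.escape` is from the Python standard library.

```python
import html


def render_text_input(name: str, value: str) -> str:
    """Build a text input, escaping &, <, >, and quotes in both attributes."""
    safe_name = html.escape(name, quote=True)
    safe_value = html.escape(value, quote=True)
    return f'<input type="text" name="{safe_name}" value="{safe_value}">'


# A dirty value that would otherwise close the attribute and inject a new one.
dirty = 'a" onfocus="x'
rendered = render_text_input("subj", dirty)
```

Without the escaping step, the quote in `dirty` would end the `value` attribute early and the rest would be parsed as markup. This is the kind of breakage that can happen whether the value came from the form or from the db.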

I think we need to fix the broken filters to stop doing shit that is not needed when we are using utf8. Maybe we should just remove all entities and put in real utf8 for anything below the 4 byte limit. Excepting " of course and < > as needed and possibly &. You know there is a list of chars that have to be watched out for when doing html, I should post that here if I find it again.
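One way the idea above could look, as a hedged Python sketch rather than the actual slashcode change: decode every entity to a literal character, but leave entities for ", <, > and & alone, and leave anything outside the Basic Multilingual Plane (which needs 4 bytes in UTF-8) as an entity. Reading "4 byte limit" that way is an assumption, and the names `RESERVED`, `ENTITY_RE` and `entities_to_utf8` are hypothetical.

```python
import html
import re

RESERVED = set('"<>&')  # characters that must stay as entities in HTML
ENTITY_RE = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def _decode_entity(match: re.Match) -> str:
    entity = match.group(0)
    decoded = html.unescape(entity)
    if decoded == entity:
        return entity  # unknown entity name, leave it untouched
    # Keep the entity if it stands for a reserved character or for a
    # code point above U+FFFF (4 bytes when encoded as UTF-8).
    if any(ch in RESERVED or ord(ch) > 0xFFFF for ch in decoded):
        return entity
    return decoded


def entities_to_utf8(text: str) -> str:
    """Replace safe entities with their literal characters."""
    return ENTITY_RE.sub(_decode_entity, text)


sample = "caf&eacute; &lt;b&gt; &quot;x&quot; &amp; &#x1F600; &#233;"
converted = entities_to_utf8(sample)
```

Here `&eacute;` and `&#233;` both become a literal é, while the entities for the reserved characters and the 4-byte emoji pass through unchanged.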