collective / collective.taxonomy

Create, edit and use hierarchical taxonomies in Plone!
https://pypi.org/project/collective.taxonomy/

lxml issue when editing taxonomy previously edited in 1.4.2 #38

Open vincentfretin opened 7 years ago

vincentfretin commented 7 years ago

I had a big issue with lxml in version 1.4.4 several weeks ago, before the holidays, but I haven't had time to look into a fix yet; I just reverted to 1.4.2, which doesn't use lxml. I'm opening this issue so that maybe someone can look into it. @tomgross? I'm surprised no one else has had this problem. Here is the traceback:

  File "/home/zope/webpro/eggs/collective.taxonomy-1.4.4-py2.7.egg/collective/taxonomy/vdex.py", line 149, in buildTree
    for termnode in self.makeSubtree(index, table):
  File "/home/zope/webpro/eggs/collective.taxonomy-1.4.4-py2.7.egg/collective/taxonomy/vdex.py", line 114, in makeSubtree
    langstringnode.text = langstring
  File "src/lxml/lxml.etree.pyx", line 1031, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:53218)
  File "src/lxml/apihelpers.pxi", line 715, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:24413)
  File "src/lxml/apihelpers.pxi", line 703, in lxml.etree._createTextNode (src/lxml/lxml.etree.c:24276)
  File "src/lxml/apihelpers.pxi", line 1443, in lxml.etree._utf8 (src/lxml/lxml.etree.c:31495)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

I get this traceback with taxonomies that I edited TTW (through the web) with version 1.4.2. Somehow it stored UTF-8 byte strings instead of unicode (I think that's the issue we need to fix). After upgrading to 1.4.4, I get the traceback.
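The error message spells out lxml's rule for element text: it must be unicode or pure ASCII, with no NULL bytes or control characters. Under Python 2, a label that was stored as a UTF-8 byte string and contains any non-ASCII character (an accented letter, say) therefore fails, while the same label as unicode passes. Below is a minimal pure-Python sketch approximating that rule with the XML 1.0 character ranges; `is_xml_compatible` is a hypothetical helper, not part of collective.taxonomy or lxml, and it is written for Python 3, where `bytes` plays the role of Python 2's `str`.

```python
import re

# Anything outside the XML 1.0 "Char" ranges: tab, LF, CR, and the
# printable ranges (surrogates and U+FFFE/U+FFFF are excluded).
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_xml_compatible(value):
    """Hypothetical approximation of lxml's check for element text.

    Byte strings are accepted only if they are pure ASCII; text is
    accepted if it contains only XML 1.0 characters.
    """
    if isinstance(value, bytes):
        try:
            value = value.decode("ascii")
        except UnicodeDecodeError:
            # e.g. a UTF-8 encoded "é" is b"\xc3\xa9", which is not ASCII
            return False
    return _INVALID_XML_CHARS.search(value) is None


label = "Catégorie"
print(is_xml_compatible(label))                  # True: unicode text
print(is_xml_compatible(label.encode("utf-8")))  # False: non-ASCII bytes
print(is_xml_compatible("null\x00byte"))         # False: NULL byte
```

This is why the same taxonomy can work in 1.4.2 and fail in 1.4.4: the stored bytes only become a problem once they are handed to lxml.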

vincentfretin commented 7 years ago

Ah, actually, to see the traceback you need to remove the except ValueError clause in jsonimpl.py:53; otherwise you just see an empty taxonomy. That was great fun when I saw empty taxonomies in prod! :-S
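Catching a broad ValueError and falling back to an empty result is what hid the error here. A minimal sketch of the alternative, assuming the empty fallback should stay but the failure must be recorded; `render_taxonomy` and `build_tree` are hypothetical names, not the actual code in jsonimpl.py:

```python
import logging

# Hypothetical logger name; any module-level logger works the same way.
logger = logging.getLogger("collective.taxonomy")


def render_taxonomy(build_tree):
    """Call build_tree() and log failures instead of silently hiding them.

    build_tree stands in for the code path that raised the ValueError;
    only the error handling matters here.
    """
    try:
        return build_tree()
    except ValueError:
        # logger.exception logs at ERROR level and attaches the full
        # traceback, so a broken taxonomy shows up in the logs instead
        # of only appearing as an empty tree.
        logger.exception("Could not build taxonomy tree")
        return []
```

The page still renders, but the traceback ends up in the instance log without having to edit the except clause.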

tomgross commented 7 years ago

I don't have any issues with the lxml version, but I didn't migrate any taxonomies from the old version.

vincentfretin commented 7 years ago

OK, I'll figure it out and probably write an upgrade step to fix the existing taxonomies.
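An upgrade step like that mostly has to turn every stored byte string back into unicode. Below is a minimal sketch of the conversion itself, assuming the labels are UTF-8 encoded and live somewhere inside nested dicts, lists or tuples; the storage layout in the example and the name `fix_byte_strings` are hypothetical, not collective.taxonomy's actual data structures.

```python
def fix_byte_strings(value, encoding="utf-8"):
    """Return a copy of value with every byte string decoded to text.

    Recurses into dict keys and values, lists and tuples; anything else
    is returned unchanged. Raises UnicodeDecodeError if some stored
    bytes are not valid in the given encoding.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {
            fix_byte_strings(key, encoding): fix_byte_strings(item, encoding)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [fix_byte_strings(item, encoding) for item in value]
    if isinstance(value, tuple):
        return tuple(fix_byte_strings(item, encoding) for item in value)
    return value


# Hypothetical stored data: language -> {term path: identifier}.
stored = {"fr": {"/Catégorie".encode("utf-8"): "1", "/Autre": "2"}}
print(fix_byte_strings(stored))  # {'fr': {'/Catégorie': '1', '/Autre': '2'}}
```

In a real Plone upgrade step the data would sit in persistent containers (and under Python 2 the byte type is `str`), so the converted values would have to be written back into those containers; that part is left out of this sketch.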

petschki commented 6 years ago

I fought with Unicode control characters (https://en.wikipedia.org/wiki/Unicode_control_characters) once before and was pretty much alone with my problem; see the issue here: https://github.com/plone/plone.app.widgets/issues/127. Our problem was that users pasted text containing hidden Unicode control characters into TinyMCE, which broke the widget. I had to quick-patch the widget and wrote an upgrade step that cleaned the raw data of the IRichText fields.
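Cleaning stored text of this kind can be sketched in a few lines. The version below assumes it is acceptable to drop every character in Unicode category "Cc" (the C0 and C1 control characters and DEL) except tab, line feed and carriage return; `strip_control_chars` is a hypothetical helper, not the patch from plone.app.widgets. Invisible format characters such as U+200B ZERO WIDTH SPACE are category "Cf" and pass through untouched.

```python
import unicodedata

# Control characters that are legitimate in text and allowed in XML.
_KEEP = {"\t", "\n", "\r"}


def strip_control_chars(text):
    """Remove Unicode control characters (category "Cc") except tab/LF/CR."""
    return "".join(
        ch for ch in text
        if ch in _KEEP or unicodedata.category(ch) != "Cc"
    )


print(repr(strip_control_chars("Hello\x00 \x1bworld\n")))  # 'Hello world\n'
```

This is stricter than XML 1.0 requires (C1 characters such as U+0085 are legal XML but get removed too), which is usually what you want for pasted user input.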