brokkr / poca

A fast, multithreaded and highly customizable command line podcast client, written in Python 3
GNU General Public License v3.0

AttributeErrors when unpickling non-ascii LXML objects #134

Closed brokkr closed 3 years ago

brokkr commented 3 years ago

How to replicate:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/mads/.virtualenvs/poca11/lib/python3.6/site-packages/poca/subupdate.py", line 34, in run
    subdata = self.target(*self.args)
  File "/home/mads/.virtualenvs/poca11/lib/python3.6/site-packages/poca/subupdate.py", line 67, in __init__
    self.sub)
  File "/home/mads/.virtualenvs/poca11/lib/python3.6/site-packages/poca/history.py", line 39, in get_subjar
    jar, outcome = open_jar(db_filename)
  File "/home/mads/.virtualenvs/poca11/lib/python3.6/site-packages/poca/history.py", line 21, in open_jar
    jar = pickle.load(f)
  File "src/lxml/objectify.pyx", line 1808, in lxml.objectify.fromstring
  File "src/lxml/etree.pyx", line 3237, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1896, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1784, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1141, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 2
lxml.etree.XMLSyntaxError: error parsing attribute name, line 2, column 232

brokkr commented 3 years ago

One way to sidestep the issue would be to store only a hash of the entire sub. If the only purpose of the stored copy is to catch changes, that should be sufficient.
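A minimal sketch of the hashing idea, not poca's actual code. It assumes the sub is available as an XML element that can be serialized to bytes. The standard library's `xml.etree.ElementTree` stands in for LXML here, and `sub_fingerprint` and the element names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def sub_fingerprint(sub_element):
    """Return a SHA-256 hex digest of the serialized subscription element."""
    # Serialize to UTF-8 bytes so non-ASCII names and values hash consistently.
    xml_bytes = ET.tostring(sub_element, encoding="utf-8")
    return hashlib.sha256(xml_bytes).hexdigest()


# Hypothetical subscriptions; the second differs only in one setting.
old_sub = ET.fromstring("<subscription><título>x</título><max_number>3</max_number></subscription>")
new_sub = ET.fromstring("<subscription><título>x</título><max_number>5</max_number></subscription>")
same_sub = ET.fromstring("<subscription><título>x</título><max_number>3</max_number></subscription>")

changed = sub_fingerprint(old_sub) != sub_fingerprint(new_sub)
unchanged = sub_fingerprint(old_sub) == sub_fingerprint(same_sub)
```

Storing only the digest string would mean nothing has to be unpickled back into an XML object, at the cost of no longer knowing which setting changed.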

brokkr commented 3 years ago

Of course, if Unicode characters in element names are banned, that should be caught at the XML parsing stage. But I don't think they are, cf. sanitization efforts.

On the other hand: Are there any good reasons to allow them?

We can't match against a list of allowed names because Vorbis comment tags can be anything (in ASCII), so how would we do this? Loop over the names and try to encode('ascii') each one? Seems primitive...
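The loop described above could look roughly like this. It is only a sketch: `non_ascii_names` is a hypothetical helper, and the input is assumed to be a plain list of tag name strings.

```python
def non_ascii_names(names):
    """Return the names that cannot be encoded as ASCII."""
    rejected = []
    for name in names:
        try:
            # encode() raises UnicodeEncodeError on any non-ASCII character.
            name.encode("ascii")
        except UnicodeEncodeError:
            rejected.append(name)
    return rejected


# Hypothetical tag names; only the second contains a non-ASCII character.
bad = non_ascii_names(["artist", "título", "album"])
```

On Python 3.7 and later, `str.isascii()` would do the same check without the try/except, but the traceback above is from Python 3.6.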

brokkr commented 3 years ago

Maybe do some testing, see if it's an LXML or a pickle issue. Maybe the wrong character set is assumed? If it is an upstream bug, we should report it.

brokkr commented 3 years ago

Note that if #139 is implemented, this becomes irrelevant.

brokkr commented 3 years ago

Neither Pickle nor LXML is used in v2. Closing.