django-haystack / pysolr

Pysolr — Python Solr client
BSD 3-Clause "New" or "Revised" License

Delete errors when IDs are numeric #489

Open olenchuk opened 4 weeks ago

olenchuk commented 4 weeks ago

pysolr 3.10.0 running under Python 3.11.7, installed on macOS via Homebrew.

Deleting fails when the IDs are integers:

id_list = [1, 2, 3, 4]
solr.delete(id=id_list)

This worked in 3.9.0 but breaks in 3.10.0. After some experimenting, I noticed that 3.10.0 switched from hand-generating the XML document to building it with xml.etree.ElementTree. I pulled the relevant logic out of 3.10.0 and ran it as a standalone snippet:

from xml.etree import ElementTree
et = ElementTree.Element("delete")
id_list = [1,2,3,4]
for one_doc_id in id_list:
    subelem = ElementTree.SubElement(et, "id")
    subelem.text = one_doc_id
m = ElementTree.tostring(et)

This results in:

Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/xml/etree/ElementTree.py", line 1028, in _escape_cdata
    if "&" in text:
       ^^^^^^^^^^^
TypeError: argument of type 'int' is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/a3lrxzz/PYTHON/test.py", line 8, in <module>
    m = ElementTree.tostring(et)
        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/xml/etree/ElementTree.py", line 1098, in tostring
    ElementTree(element).write(stream, encoding,
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/xml/etree/ElementTree.py", line 743, in write
    serialize(write, self._root, qnames, namespaces,
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/xml/etree/ElementTree.py", line 906, in _serialize_xml
    _serialize_xml(write, e, qnames, None,
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/xml/etree/ElementTree.py", line 904, in _serialize_xml
    write(_escape_cdata(text))
          ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/xml/etree/ElementTree.py", line 1036, in _escape_cdata
    _raise_serialization_error(text)
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/xml/etree/ElementTree.py", line 1018, in _raise_serialization_error
    raise TypeError(
TypeError: cannot serialize 1 (type int)
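The regression comes down to how the two approaches treat non-string values. String interpolation with %s calls str() on each value implicitly, while ElementTree stores whatever object is assigned to .text and only fails once tostring() tries to escape it. The comparison below is a minimal sketch: the hand-built half assumes the general shape of the pre-3.10.0 code (%-formatting IDs into an XML string) rather than copying pysolr's actual implementation.

```python
from xml.etree import ElementTree

id_list = [1, 2, 3, 4]

# Hand-built XML (assumed shape of the pre-3.10.0 approach): %s calls str()
# on each int implicitly, so integer IDs serialize without complaint.
# This sketch does no XML escaping, so it is only safe for plain IDs.
hand_built = "<delete>%s</delete>" % "".join("<id>%s</id>" % doc_id for doc_id in id_list)

# ElementTree: assigning an int to .text is accepted silently...
root = ElementTree.Element("delete")
for doc_id in id_list:
    ElementTree.SubElement(root, "id").text = doc_id

# ...and the TypeError only surfaces when the tree is serialized.
try:
    ElementTree.tostring(root)
    etree_error = None
except TypeError as exc:
    etree_error = str(exc)

print(hand_built)   # <delete><id>1</id><id>2</id><id>3</id><id>4</id></delete>
print(etree_error)  # cannot serialize 1 (type int)
```

Because the error is deferred to serialization, the traceback points into ElementTree's escaping code rather than at the line that assigned the int.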

If I first convert the ints to strs with map(), it works:

from xml.etree import ElementTree
et = ElementTree.Element("delete")
id_list = [1,2,3,4]
id_list = list(map(str, id_list))  # convert the integer IDs to strings
for one_doc_id in id_list:
    subelem = ElementTree.SubElement(et, "id")
    subelem.text = one_doc_id
m = ElementTree.tostring(et)

m then holds the expected document (as bytes, since tostring() returns bytes by default):

<delete><id>1</id><id>2</id><id>3</id><id>4</id></delete>

acdha commented 4 weeks ago

If you send a pull request, that'd be a good addition. There's a strong convention of using strings for document IDs, but integers should work too.
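A pull request along these lines could coerce each ID with str() before assigning it to .text. The sketch below uses a standalone, hypothetical helper named build_delete_xml (it is not part of pysolr's API) and only covers ID-based deletes, not query-based ones; a real patch would need to fit into pysolr's own delete() code.

```python
from xml.etree import ElementTree


def build_delete_xml(doc_ids):
    """Hypothetical helper: build a Solr <delete> body from one ID or a list of IDs.

    Each ID is passed through str() so integers (and other non-string IDs)
    serialize instead of raising TypeError inside ElementTree.tostring().
    """
    # Accept a single ID as well as a list/tuple/set of IDs.
    if isinstance(doc_ids, (list, tuple, set)):
        ids = list(doc_ids)
    else:
        ids = [doc_ids]

    root = ElementTree.Element("delete")
    for doc_id in ids:
        ElementTree.SubElement(root, "id").text = str(doc_id)
    # ElementTree escapes the text content (e.g. & -> &amp;) during serialization.
    return ElementTree.tostring(root, encoding="unicode")


print(build_delete_xml([1, 2, 3, 4]))  # <delete><id>1</id><id>2</id><id>3</id><id>4</id></delete>
print(build_delete_xml("a&b"))         # <delete><id>a&amp;b</id></delete>
```

Calling str() at the point where each element is built keeps string IDs unchanged, lets integer IDs work as they did in 3.9.0, and still leaves escaping to ElementTree.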