Open vbwagner opened 9 years ago
Thank you very much for the bug report. I'll take care of this asap.
I've found quite simple solution:
Replace line
string = b''.join(xml_string)
with
if len(xml_string)==1:
string = xml_string[0][:]
else:
string=b''.join(xml_string)
slicing with [:] converts mmap to string and doesn't harm string.
that's a hack -- it'll work for twisted but as per the contract, the ctx.in_string
is just a sequence -- not necessarily one whose length can be known in advance
not no mention it's reading the whole file to memory (quite possibly blocking the reactor while doing that) which completely defeats the purpose of putting the request data in a temp. file in the first place
Really, I see no harm in reading entire request in memory. Request is going to be parsed by lxml into DOM anyway and libxml2 DOM typically consumes order of magnitude more memory than serialized XML. Now spyne supports only requests which are completely in memory. It just breaks on requests which underlying transport haven't placed into memory, and don't even return some meaningful fault such as Client.RequestSizeToBig to the caller.
May be _parse_xml_string is not a good place, to read the request into string. As name suggest, it should get string (not an arbitrary sequence). Really, I don't think that "just a sequence" is a good protocol at this point. Later lxml is used to parse request. And lxml doens't have method to parse 'arbitrary sequence'. It has function which can parse only string (and it is used now) and function which can parse file-like object (parseid). which can cope with mmap objects, but not with lists of them. There is fromstringlist function, which can parse sequence of string, but it is not able to deal with mmaps as well as fromstring.
I don't think that there is a good way to do non-blocking parsing here. Request is already completely read from the network and even mmaped into process address space. There is no natural point where to give control to event loop during parsing. Of course one can feed xml to lxml in arbitrary-sized chunks, and use reactor.callLater(0,....) to give control to reactor between chunks.
But this is quite twisted-specific. And as you've reminded, spyne.protocol.soap.soap11 is transport-agnostic.
Iv'e discovered that attempt to send SOAP request about 300Kb in size caused following error: exceptions.TypeError: sequence item 0: expected string, mmap.mmap found
(spyne 2.11.0, twisted 13.2.0, python 2.7.3)
Offending code seems to be in the spyne.protocol.soap.soap11._parse_xml_string First line of this function performs b''.join(xml_string) which doesn't work if list xml_string contains mmap object instead of plain string. This behavoir of str.join is not changed in python 2.7.8
spyne.server.twisted.http.TwistedWebResource.handle_rpc -f it receives from twisted request.context with file_no attribute, it sets initial_ctx.in_string to mmap object, rather just a string.
Twisted seems to have hardcoded limit in the twisted.web.http.Request.gotLength - if Content-Length is greater than 100000 bytes, use temporary file (which spyne subsequently mmaps) if less, use StringIO
Stack trace of error follows: