arskom / spyne

A transport agnostic sync/async RPC library that focuses on exposing services with a well-defined API using popular protocols.
http://spyne.io
Other
1.13k stars 312 forks source link

Big soap request breaks under twisted #407

Open vbwagner opened 9 years ago

vbwagner commented 9 years ago

Iv'e discovered that attempt to send SOAP request about 300Kb in size caused following error: exceptions.TypeError: sequence item 0: expected string, mmap.mmap found

(spyne 2.11.0, twisted 13.2.0, python 2.7.3)

Offending code seems to be in the spyne.protocol.soap.soap11._parse_xml_string First line of this function performs b''.join(xml_string) which doesn't work if list xml_string contains mmap object instead of plain string. This behavoir of str.join is not changed in python 2.7.8

spyne.server.twisted.http.TwistedWebResource.handle_rpc -f it receives from twisted request.context with file_no attribute, it sets initial_ctx.in_string to mmap object, rather just a string.

Twisted seems to have hardcoded limit in the twisted.web.http.Request.gotLength - if Content-Length is greater than 100000 bytes, use temporary file (which spyne subsequently mmaps) if less, use StringIO

Stack trace of error follows:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 1438, in dataReceived
    finishCallback(data[contentLength:])
  File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 1667, in _finishRequestBody
    self.allContentReceived()
  File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 1730, in allContentReceived
    req.requestReceived(command, path, version)
  File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 826, in requestReceived
    self.process()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 189, in process
    self.render(resrc)
  File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 238, in render
    body = resrc.render(self)
  File "/usr/lib/python2.7/dist-packages/spyne/server/twisted/http.py", line 329, in render
    return self.handle_rpc(request)
  File "/usr/lib/python2.7/dist-packages/spyne/server/twisted/http.py", line 373, in handle_rpc
    contexts = self.http_transport.generate_contexts(initial_ctx)
  File "/usr/lib/python2.7/dist-packages/spyne/server/_base.py", line 61, in generate_contexts
    self.app.in_protocol.create_in_document(ctx, in_string_charset)
  File "/usr/lib/python2.7/dist-packages/spyne/protocol/soap/soap11.py", line 197, in create_in_document
    charset)
  File "/usr/lib/python2.7/dist-packages/spyne/protocol/soap/soap11.py", line 94, in   _parse_xml_string
    string = b''.join(xml_string)
  exceptions.TypeError: sequence item 0: expected string, mmap.mmap found
plq commented 9 years ago

Thank you very much for the bug report. I'll take care of this asap.

vbwagner commented 9 years ago

I've found quite simple solution:

Replace line

string = b''.join(xml_string)

with

if len(xml_string)==1:
   string = xml_string[0][:]
else:
   string=b''.join(xml_string)

slicing with [:] converts mmap to string and doesn't harm string.

plq commented 9 years ago

that's a hack -- it'll work for twisted but as per the contract, the ctx.in_string is just a sequence -- not necessarily one whose length can be known in advance

plq commented 9 years ago

not no mention it's reading the whole file to memory (quite possibly blocking the reactor while doing that) which completely defeats the purpose of putting the request data in a temp. file in the first place

vbwagner commented 9 years ago

Really, I see no harm in reading entire request in memory. Request is going to be parsed by lxml into DOM anyway and libxml2 DOM typically consumes order of magnitude more memory than serialized XML. Now spyne supports only requests which are completely in memory. It just breaks on requests which underlying transport haven't placed into memory, and don't even return some meaningful fault such as Client.RequestSizeToBig to the caller.

May be _parse_xml_string is not a good place, to read the request into string. As name suggest, it should get string (not an arbitrary sequence). Really, I don't think that "just a sequence" is a good protocol at this point. Later lxml is used to parse request. And lxml doens't have method to parse 'arbitrary sequence'. It has function which can parse only string (and it is used now) and function which can parse file-like object (parseid). which can cope with mmap objects, but not with lists of them. There is fromstringlist function, which can parse sequence of string, but it is not able to deal with mmaps as well as fromstring.

I don't think that there is a good way to do non-blocking parsing here. Request is already completely read from the network and even mmaped into process address space. There is no natural point where to give control to event loop during parsing. Of course one can feed xml to lxml in arbitrary-sized chunks, and use reactor.callLater(0,....) to give control to reactor between chunks.

But this is quite twisted-specific. And as you've reminded, spyne.protocol.soap.soap11 is transport-agnostic.