ippusbxd returns invalid data to TCP clients

alexpevzner commented 4 years ago

HTTP request, sent to one TCP connection, may eventually be replied to another unrelated connection.

The scenario is following:

Client connects to TCP port and sends HTTP request. It causes an USB interface to be allocated for the client connection
Client disconnects before reply is fully received or even before reply is created. USB interface becomes "free" and returned to the pool of free interfaces
HTTP reply is not discarded by device at this case, but remains buffered on per-USB-interface buffer.
Client establishes another TCP connection, which receives previously USB interface with reply data pending. This data will be sent to this new TCP connection, surprising a client

This problem comes from the fact, that USB is not TCP, and "closing USB connection" doesn't have a semantics of closing TCP connection, where all data is rejected.

Note, even ippusbxd restart doesn't drain buffered USB data

To fix this problem, ippusbxd must always read an entire HTTP reply, even if client already disconnected and even if it takes half an hour. There are 2 methods how HTTP/1.1 server may indicate end of data stream: it either announces stream size in a reply Content-Length header, or uses HTTP chunked encoding; server is free to choose any of these methods on a per-request basics, and ippusbxd MUST understand both. Effectively it means, ippusbxd MUST implement a HTTP reverse proxy rather that just a simple TCP relay.

BTW, #14 most likely a related issue

tillkamppeter commented 4 years ago

@DavieV, any ideas?

tillkamppeter commented 4 years ago

@alexpevzner, I have now tested everything currently possible (IPP print with CUPS, AirScan/eSCL with both SANE backends, web admin interface including the built-in web scan) on my HP OfficeJet Pro 8730 with your ipp-usb daemon and it works all perfectly and reliably. So what we need is to transfer this into ippusbxd.

DavieV commented 4 years ago

This is great to hear! Please let me know if there's anything you would like me to assist with. Once this newer system has been integrated into ippusbxd I will work on integrating it with Chrome OS.

tillkamppeter commented 4 years ago

@DavieV, what you could cooperate with us now, is testing @alexpevzner's ipp-usb daemon, especially whether it works under Chrome OS and also with all the hardware devices you have access to.

DavieV commented 4 years ago

Unfortunately it turns out that Go projects aren't allowed in Chrome OS due to their large binary sizes.

In the meantime, I think we'll maintain a fork of the current ippusbxd and work towards implementing the changes that exist in ipp-usb.

tillkamppeter commented 4 years ago

If you are working on taking over the changes of ipp-usb into ippusbxd you are very welcome with this and I would suggest you to submit them here to make ippusbxd better. If you would do it in rather short term (up to mid-February 2020) I would stay with ippusbxd in Ubuntu and not consider a switchover to ipp-usb, only if this is a long-term project, which you will finish only in a year or longer, I will probably go the ipp-usb way in Ubuntu.

alexpevzner commented 4 years ago

Heh, programming language from Google cannot be used on operating system from Google.

Executable size is large, I agree.

However, ipp-usb itself uses memory very carefully. For example, images are streamed between host and device, ipp-usb doesn't try to load entire image into its memory before forwarding it. And it doesn't have any external dependencies, except Avahi and libusb. So final memory footprint may be comparable with pure-C version.

fletcherw commented 4 years ago

Hi @tillkamppeter, I'm interested in trying to solve this issue. I'm still reading and understanding RFC 8010 and the ipp-over-usb specification, so please let me know if I am misunderstanding anything.

I think, as Alex said, that we have to support HTTP response parsing. If we don't, then even if we attempt to drain the Bulk IN descriptor of the USB connection when we start a new connection with a client, we have no way of knowing when that drain has finished.

My proposed solution is to integrate basic HTTP request parsing into ippusbxd, preferably via an external package. One candidate is https://github.com/nodejs/llhttp package, though I am not certain how we can cleanly add that as a dependency. This package is a streaming parser for HTTP request/responses, so every time we receive data from the printer or client, we just run the parser on that chunk, and that library handles tracking internal state.

As I understand it, ippusbxd allows for multiple HTTP requests to be sent over one connection, so we will need to track the number of requests that have been sent from the client and ensure that we've received that many responses from the server before allowing the usb connection to be reused.

This may not solve the case where ippusbxd crashes, leaving unread data in the usb buffers, so for a best-effort solution there, we could maybe also add logic to submit an initial synchronous read and discard any data before opening a new USB request.

tillkamppeter commented 4 years ago

Generally, this is a good idea, but one important goal of the continuation of ippusbxd development is to get an IPP-over-USB implementation which is pure C, for a wide acceptance by the different operating systems. We currently already have a reliable free software implementation of the IPP-over-USB standard, ipp-usb but this is written in Go and therefore no accepted by Chrome OS. So to get any advancement with a fix we need to keep ippusbxd completely in C and not pulling any non-C dependencies. Ideally one would take a part of the code of an HTTP parser to do the minimum to only find out where an HTTP stream ends.

fletcherw commented 4 years ago

I'm glad the high-level approach makes sense. I see the appeal in having a streaming HTTP request/response parser which is scoped only to our use case of determining when a message is finished, but I'm concerned about all of the corner cases that would be better handled in a public library. I like your idea of stripping down an HTTP parser to just support tracking header length.

If your only concern with an external library is if it is C or not, some other (pure C) candidates are:

https://github.com/nodejs/http-parser (the previous version of my initial suggestion)
https://github.com/h2o/picohttpparser

Also, just to clarify about Go in Chrome OS, there's no blanket ban as far as I'm aware. We just don't want executables with sizes in the megabytes.

alexpevzner commented 4 years ago

@fletcherw, @tillkamppeter,

I've already briefly explained in the ipp-usb's README what should be done here (and what ipp-usb actually does).

In short:

both request and response headers needs to be parsed; request and response body borders needs to be recognized
remove/rebuild HTTP hop-by-hop headers, as HTTP proxy should do
decouple client-side connections from USB connections, so if client keeps its connection open for a long time, it will not block corresponding USB connection for a long time (there are only 2 or 3 USB connections available)
convert bodies of outgoing POST requests to chunked-encoding, so if client drops TCP connection in the middle of sending, request forwarded to USB side still will be correctly formed
construct correct Host: header on outgoing requests. Note, if device returns URLs as response to some request, host part of returned URLs is taken from the Host: header of the request
reject any attempt to perform HTTP protocol upgrade
Set protocol version of outgoing requests to 1.1, as required by IPP-over-USB standard
To avoid large memory usage, request and response bodies needs to be transferred in the sreaming mode rather that fetch-and-forward
if termination signal is received (daemon restart/system shutdown) and there are HTTP requests in progress, wait some reasonable time for completion, to leave device in the consistent state

fletcherw commented 4 years ago

Thanks for the clear outline of what needs to be done @alexpevzner.

It would be really nice to have all of these things implemented in ippusbxd. The fact that a lot of that of will require understanding HTTP more deeply than just request/response length makes me think that using a mature HTTP library here would be a good move. I think I could work piece-by-piece to implement these improvements.

Re point 3, to clarify, once we've received data from a client, we have to open a USB connection and keep it open until we've finished sending the complete request? The improvement is just that we don't block connections while waiting for the initial data from a client, and we release the USB connection in between HTTP requests?

Apart from Chrome OS, are there others who will continue to rely on ippusbxd over ipp-usb? I'm trying to get a sense of how much effort bringing ippusbxd to feature parity with ipp-usb will be, and if the investment will benefit others.

alexpevzner commented 4 years ago

HTTP level of abstraction is request, not connection. This is important to understand.

Once all request headers are received from a client, you put request to the queue, waiting for idle USB connection. Once USB connection is available, you send there request header, then request body, if any, then receive response header and response body, then return USB connection to the pool of idle connections, so it becomes ready for the next request.

Please note, client can drop its connection in any time. If client disconnects before the first byte of its request was sent to USB, there is no need to do anything, just drop pending request from the queue. But once the first byte of the request was sent to USB, you must perform the complete HTTP transaction regardless of what client is doing. If client disconnects in the middle of sending, you should correctly terminate outgoing stream (chunked-encoding allows it), and in any case retrieve the entire response, forwarding it to the client, if clieny is OK, or just dropping received data, if client disconnected.

When choosing HTTP library, please note, you don't need such a high-level things (and actually need to disable them, if they are present), as following HTTP redirects, HTTP authorization, cookies management, cache control and so on. You need just a basic low-level HTTP transport functionality.

Please note also, that Go not only takes care on many HTTP details, but also guarantees a very high level of safety. IPP over USB daemon runs with a root privileges, so it is a potential attack vector.

It is hard for me to guess who will really benefit from reimplementing all this functionality in the ippusbxd. Real memory footprint of ipp-usb is actually slightly less that of ippusbxd, on-disk size is only important for the very low-level devices, like WiFi routers.

fletcherw commented 4 years ago

Thanks for clarifying Alex, I understand now what you meant about not blocking the USB connections.

For an HTTP library, I'm really only looking for something that parses request and response headers.

For Chrome OS, we've mitigated the safety concerns via sandboxing and syscall filtering, though this doesn't help others.

The memory footprint really isn't the big deal here, though low memory usage is certainly useful. I disagree with your assertion about disk size though. Conserving disk space is necessary for Chrome OS devices. I'd actually prefer to use ipp-usb as well, but the executable size is large enough that using it would be extremely difficult.

@tillkamppeter Would love to hear your opinion on either of the parsing libraries I mentioned in my previous comment. Since those libraries don't appear to be available in package managers, should I add them as subrepos, or something else?

alexpevzner commented 4 years ago

@fletcherw, just few words more. IPP over USB daemon has legitimate direct access to the raw USB, which gives it a lot of power. I bet, there are even some devices in existence, where system disk is wired to the USB and can be altered this way.

tillkamppeter commented 4 years ago

For me it seems that https://github.com/h2o/picohttpparser is the better choice. The C source code is smaller, and the README tells that it is state-less and does not allocate memory, also the Makefile is much simpler. So if it provides what is needed for ippusbxd and is acceptable by operating systems, especially Chrome OS, let us go this way.

tillkamppeter commented 4 years ago

Ira McDonald suggested these ones:

https://www.libhttp.org/ - from MIT
https://github.com/markusfisch/libhttp - lightweight?
http://www.gnu.org/software/libmicrohttpd/? - GNU - very small
https://www.freertos.org/https/index.html - Free RTOS
https://github.com/HISONA/https_client - small HTTP Client
https://github.com/reagent/http - simple HTTP Client
https://os.mbed.com/cookbook/HTTP-Client - in ARM Mbed library - high quality
https://en.cppreference.com/w/cpp/links/libs - index of open source C/C++ libraries

But I did not look into them, so do not know whether there is perhaps a more suitable one under them.

alexpevzner commented 4 years ago

Hi @fletcherw,

if you are only concerned about ChromeOS and supporting printer's web console is not very important to you, you may consider a different, much simpler approach.

In this case, there are actually only 2 clients that works with the printer via ippusbxd: eSCL driver and CUPS.

eSCL driver never drops HTTP request in the middle and doesn't use HTTP connection keep-alive, CUPS, I believe, can be easily modified to do so. As ChromeOS (AFAIK) doesn't allow arbitrary programs to access IPP-over-USB device, taking care of these 2 clients should be enough,

tillkamppeter commented 4 years ago

I think there is something which could leas to a solution here. @alexpevzner has issued a the new version 0.99.12 of sane-airscan and one of the changes is:

libsoup/libglib removal

This release doesn't use libsoub and libglib anymore. libsoup HTTP client was replaced with homemade one, HTTP parsing is handled by nodejs http-parser. Sources of the http-parser were directly included into the sane-airscan source tree, to ensure consistent behavior across different linux distros.

So he has removed the use of HTTP libraries in favor of his own HTTP client and using the nodejs http-parser. @DavieV, @fletcherw, @Ordissimo, you could perhaps check whether this code could be also used for ippusbxd to make it working correctly (with same reliability as ipp-usb.

ThierryHFR commented 4 years ago

Hi @tillkamppeter, I agree with @fletcherw, I'm in favour of using shared libraries. The binary is smaller and the maintenance lighter. @alexpevzner analysis shows that my choice to use libcups for request is a mistake. This is easily resolved. The initial problem remains to be solved.

OpenPrinting / ippusbxd

ippusbxd returns invalid data to TCP clients #15