emailjs / emailjs-imap-client

Low-level JS IMAP client for all your IMAP needs.
MIT License

Add support for sending Blobs (in particular, for APPEND). #22

Open mcav opened 10 years ago

mcav commented 10 years ago

Our tracking issue for this is https://bugzilla.mozilla.org/show_bug.cgi?id=1060564. Blob support is really the only option in terms of supporting large files on low-memory devices.

andris9 commented 10 years ago

Any more info on this? What exactly, and how? So far I haven't even considered doing anything with blobs.

asutherland commented 10 years ago

Context

The way we do mail composition with attachments right now is that when the user tells us to attach something, we do chunked reads of the file, manually converting them to a base64-encoded representation, with one output Blob in IndexedDB for each input chunk that we read. In the future, once Gecko has platform support for streaming writes to disk for IndexedDB (coming soon, not yet available), we'll optimize this memory path too. We chose to pre-base64-encode things because we needed to make a copy of the data anyway, since we couldn't trust the original data to hang around, and because for ActiveSync, which uses XHRs, Blobs were the only way we could get the platform to stream for us.
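The chunked pre-encoding step can be sketched like this (the helper name is hypothetical; the real implementation streams file reads and stores one Blob per chunk in IndexedDB):

```javascript
// Sketch of the pre-base64-encoding pass described above. The key
// invariant: the chunk size must be a multiple of 3 bytes so each
// chunk encodes without '=' padding and the concatenation of all
// chunks is itself valid base64.
function base64Chunks(bytes, chunkBytes) {
  if (chunkBytes % 3 !== 0) {
    throw new Error('chunk size must be a multiple of 3');
  }
  const chunks = [];
  for (let offset = 0; offset < bytes.length; offset += chunkBytes) {
    const slice = bytes.subarray(offset, offset + chunkBytes);
    // Build a binary string for btoa; production code would do this
    // in smaller pieces to avoid large intermediate strings.
    let bin = '';
    for (const b of slice) bin += String.fromCharCode(b);
    chunks.push(btoa(bin));
  }
  return chunks; // in the real flow, each entry is wrapped in a Blob
}
```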

Later on, when we go to send the message, we use mailbuilder to format the message, but rather than giving it the true attachment payload, we give it a marker that we're able to locate. We then slice up the built message, insert our actual disk-backed Blobs into the list, and turn that into a super-Blob. This builds the RFC 2822 message with the really huge parts still on disk. See https://github.com/mozilla-b2g/gaia-email-libs-and-more/blob/master/js/drafts/composer.js#L208
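The marker-splicing step can be sketched as follows (names are illustrative, not the actual composer.js API; it assumes each marker occurs exactly once in the built text):

```javascript
// mailbuilder emits the RFC 2822 text with a unique marker string in
// place of each attachment's base64 payload; we then split on the
// markers and splice in the disk-backed Blobs to form one super-Blob.
function spliceAttachments(builtText, markers) {
  const parts = [];
  let rest = builtText;
  for (const { marker, blob } of markers) {
    const idx = rest.indexOf(marker);
    // Text before the marker stays as a string part...
    parts.push(rest.slice(0, idx));
    // ...and the pre-encoded attachment Blob takes the marker's place.
    parts.push(blob);
    rest = rest.slice(idx + marker.length);
  }
  parts.push(rest);
  // The Blob parts are not read into memory here; the result just
  // references the original disk-backed Blobs.
  return new Blob(parts, { type: 'message/rfc822' });
}
```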

Our TCPSocket wrapper (which still exists because TCPSocket does not work on workers yet, and all our code runs on the worker, so we need to proxy it to the main thread) knows how to deal with Blobs. It does overlapped chunked reads of 96k at a time and uses ondrain notifications to avoid causing TCPSocket to bloat the parent process's memory usage. See the main-thread code at https://github.com/mozilla-b2g/gaia-email-libs-and-more/blob/master/js/worker-support/net-main.js#L201 and the worker code that sends it across the wire at https://github.com/mozilla-b2g/gaia-email-libs-and-more/blob/master/js/ext/tcp-socket.js#L131
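A minimal sketch of the chunking and backpressure idea (simplified; the real worker/main-thread split and pre-fetching live in the files linked above, and the socket API shape is an assumption):

```javascript
// slice() does not read the Blob, so only the chunk currently being
// sent needs to be in memory at any one time.
const CHUNK_SIZE = 96 * 1024;

function* blobChunks(blob) {
  for (let offset = 0; offset < blob.size; offset += CHUNK_SIZE) {
    yield blob.slice(offset, offset + CHUNK_SIZE);
  }
}

// Assumed mozTCPSocket-like API: send() returns false when the buffer
// is backed up, and ondrain fires once it has emptied again.
async function sendBlob(socket, blob) {
  for (const chunk of blobChunks(blob)) {
    const buffer = await chunk.arrayBuffer();
    if (!socket.send(buffer)) {
      await new Promise(resolve => { socket.ondrain = resolve; });
    }
  }
}
```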

When we were using our forked version of mscdex's imap.js, we just made sure that it could accept a Blob as a payload for APPEND. All that meant was that it needed to call write/send with the Blob without trying to process it.

Proposal

The simplest thing is just to make sure that browserbox can accept a data payload of an opaque Blob and pass it through to a call to write.
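As a rough sketch of that pass-through (`client.send` and the command shape here are illustrative, not the actual browserbox API; a real APPEND also involves a command tag and waiting for the server's continuation response before sending the literal):

```javascript
// The Blob's size is announced as the IMAP literal's octet count, but
// the payload itself is handed to the socket layer untouched.
function appendBlob(client, mailbox, blob) {
  client.send('APPEND "' + mailbox + '" {' + blob.size + '}\r\n');
  client.send(blob);     // opaque Blob, passed straight through to write
  client.send('\r\n');   // CRLF terminating the command after the literal
}
```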

There are fancier things that could be done, but I think that comes under the same heading of making more of the APIs support streaming.

Impact of upcoming Standards

Per the 3rd note at the top of http://www.w3.org/2012/sysapps/tcp-udp-sockets/, although TCPSocket originally supported sending Blobs, this has apparently been changed so that it will no longer do this transparently. If we stick with Blobs like this, Gaia email will still need to interpose/wrap like we currently do.

It is my understanding that XHR will also support streams (https://whatwg.github.io/streams/).

We can probably eliminate the use of Blobs in the future once the other APIs support streaming. In that case we can stop pre-base64-encoding everything and just do that on the fly. We will still need to duplicate our attachment Blobs, but raw binary form is obviously more compact, and it nicely lets the user examine the files they have attached and lets us avoid re-downloading the Blobs for the sent folder (if we are clever enough to match the message up).

andris9 commented 10 years ago

Thanks for the insight, I still need to digest all this. Basically this is a temporary solution, since the TCPSocket spec no longer includes Blobs, and in the future we are going to have Streams, which will make these memory issues obsolete?

felixhammerl commented 10 years ago

let's see if i understood this correctly...

you want to be able to use blobs because this way the actual binary data is not held in memory unless it is actually read, which happens when the blob is written to the socket? this is opposed to the current state of affairs, which requires you to hold a duplicate of an arraybuffer in memory that you already have on disk, and then transform the whole thing into a binary representation which is then fed into the tcp socket.

i assume blobs also make the creation of messages in mailbuild somewhat easier, instead of handling a string... does your tcp socket wrapper slice() up the blob to support sending in chunks? this sounds like a good idea in terms of handling backpressure without stuffing it all into memory. is there any particular reason for choosing 96k, or is this just based on empirical evidence?

asutherland commented 10 years ago

Yes, it's really just about memory usage, and yes, I think proper streams support in TCPSocket and XMLHttpRequest will obsolete the need to use Blobs. This functionality was most important for memory-limited devices such as the 128MB ZRAM-backed Tarako device (https://developer.mozilla.org/en-US/Firefox_OS/Developer_phone_guide/Phone_specs#Device_specifications).

Use of Blobs was also important because when we were using the node.js-based libraries and shims, it was fairly common for code to transition back and forth between JS Strings and Node Buffers in a way that would generate additional garbage.

The choice of 96k was made experimentally by me. The rationale is at the top of https://github.com/mozilla-b2g/gaia-email-libs-and-more/blob/master/js/worker-support/net-main.js, excerpting here:

/** Important implementation notes that affect us:
 *
 * - mozTCPSocket generates ondrain notifications when the send buffer is
 *   completely empty, not when we go below the target buffer level.
 *
 * - bufferedAmount in the child process mozTCPSocket implementation only gets
 *   updated when the parent process relays a message to the child process.
 *   When we are performing bulk sends, this means we will only see
 *   bufferedAmount go down when we receive an 'ondrain' notification and the
 *   buffer has hit zero. As such, trying to do anything clever involving
 *   bufferedAmount other than seeing if it's at zero is not going to do
 *   anything useful.
 *
 * Leading to our strategy:
 *
 * - Always have a pre-fetched block of disk I/O to hand to the socket when we
 *   get a drain event so that disk I/O does not stall our pipeline.
 *   (Obviously, if the network is faster than our disk, there is very little
 *   we can do.)
 *
 * - Pick a page-size so that in the case where the network is extremely fast
 *   we are able to maintain good throughput even when our IPC overhead
 *   dominates. We just pick one page-size; we intentionally avoid any
 *   excessively clever buffering regimes because those could back-fire and
 *   such effort is better spent on enhancing TCPSocket.
 */

It's fine if you'd prefer that we just monkeypatch this one case to avoid complicating the browserbox codebase needlessly. If you are targeting higher-end phone devices, you may want to avoid functionality like this. We need to continue to target devices with only 256MB of RAM, where we're also now expected to send stuff in the background, so we do need to try and keep our memory footprint as small as possible.

asutherland commented 10 years ago

Er, and by experimentally, I mean that I adjusted the numbers so that the device was able to send an attachment over my local wi-fi quickly and the buffers did seem to be pre-fetched from Flash fast enough. And most importantly, our memory usage was bounded and the app didn't die, even when I was sending a >100 MB file. (I raised limits on my test server.)

andris9 commented 10 years ago

We build IMAP commands by joining different string arguments into one complete, tagged command string ending with <CR><LF>. Obviously we cannot append Blobs to a string, but I guess we could do it the other way around: if an argument is not a string but a Blob or an ArrayBuffer, we could fall back to blob mode by converting the already-compiled string to a Blob and then appending any additional arguments to that Blob instead of the string. The result is a command Blob, which is either sent to the TCP socket directly or converted to an ArrayBuffer and then sent. This way we could keep the current flow without changing much.
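A quick sketch of that fallback (hypothetical helper, not the actual browserbox compiler; IMAP literal details like the `{n}` octet count announcement are glossed over here):

```javascript
// Arguments are joined as a string until a Blob or ArrayBuffer shows
// up; from then on everything is accumulated as Blob parts, and the
// final command becomes one Blob.
function compileCommand(tag, args) {
  let text = tag;     // plain string mode until a binary argument appears
  let parts = null;   // blob-mode parts list, null while in string mode
  for (const arg of args) {
    if (typeof arg === 'string') {
      if (parts) parts.push(' ' + arg);
      else text += ' ' + arg;
    } else {
      // First binary argument flips us into blob mode: the string
      // compiled so far becomes the first part, and the Blob (or
      // ArrayBuffer) is appended without being read into memory.
      if (!parts) parts = [text];
      parts.push(' ', arg);
    }
  }
  if (parts) {
    parts.push('\r\n');
    return new Blob(parts);  // handed to the socket, or converted there
  }
  return text + '\r\n';      // the unchanged all-string path
}
```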