Not properly buffering newlines

orf commented 8 years ago

I've run into an annoying bug that I'm not sure how to fix. Sometimes the data fed to _handle_responses is like so: ...QBo+puGrxMuda6Cz23F6J7p\r\n\r\n)\r\nP. The stray P at the end of the data is the start of the TAG for the next command, in this case self.tagpre is PHMD.

The problem is that this trailing byte is buffered and sent to _handle_line all by itself, which causes it to be rejected. Then when the next chunk of data from the server is received it is rejected because the tag is found to start with HMD, not PHMD.

Perhaps using StreamReaderProtocol rather than Protocol would be a better fit?

bamthomas commented 8 years ago

Interesting. In your case, the FETCH command before PHMD should have a length that should stop at the end of parenthesis ')'. Then _handle_responses function should be called with PHMD.... BTW I don't know this imap command 'PHMD'. Do you have a RFC for it ?

orf commented 8 years ago

Sorry, I didn't make myself clear. The PHMD prefix is the tag that's sent back from the server with a response, but for some reason the first character is removed. If you have GMail then this function triggers the issue fairly reliably:

async def tally_inbox(host, port, user, password, loop):
    client = aioimaplib.IMAP4_SSL(host, port, loop=loop)
    await client.wait_hello_from_server()

    response = await client.login(user, password)

    res, data = await client.select()
    total_messages = int(data[0])
    print("Selected INBOX: {0} messages".format(total_messages))

    chunks = [(number, number + 10) for number in range(1, total_messages + 1, 10)]
    for start, end in chunks:
        res, data = await client.fetch('{0}:{1}'.format(start, end), '(BODY.PEEK[HEADER])')

This attempts to download all your messages in chunks of 10, but after a while it fails with some invalid tagged responses. I'm not familiar with the IMAP protocol, but if you're right about the length part then perhaps you have an off-by-one issue somewhere in your length calculations?

bamthomas commented 8 years ago

perhaps you have an off-by-one issue somewhere in your length calculations?

Of course that is possible (even if it is unit tested, maybe there is a forgotten use case). I tried your code on my gmail account turning debug logs on.

I fixed the pending_async_command issue cf https://github.com/bamthomas/aioimaplib/commit/f6bf34e6a17a3eaedbe858cceaf4bb5a3d2f64af

When I check mails with a small range, it is ok. But when I check the 10634 messages in my inbox, it seems as if there was a desynchronization between fetch commands.

bamthomas commented 8 years ago

Okay I think I understand the issue : the offset problem comes from the strip() here : https://github.com/bamthomas/aioimaplib/blob/master/aioimaplib/aioimaplib.py#L206

But if we remove it, it is still possible to have half of the next command (it happened when I checked the 10000 mails from gmail). I'm looking after a way to treat this case.

orf commented 8 years ago

Yeah, half-sent commands is a hard problem to solve. I think using a StreamReaderProtocol would help though, as it handles buffering for you so your protocol only receives entire lines. I'm not sure how this would work with the current code though.

bamthomas commented 8 years ago

Thanks for the idea I'm going to see if StreamReaderProtocol could help.

bamthomas commented 8 years ago

@orf tell me if the last commit fixed your issue. It did for me (with my 10634 messages). If it is the case I will release and close this issue.

Actually, I did not used StreamReaderProtocol because :

IMAP protocol is complex and the end of a command is terminated by a tagged line with OK/KO/BAD
IMAP RFC tells that the server could send you data at any-time (so you need to implement a main loop with read/write buffer)
there is two modes of reading data : until the next CRLF or using the litteral length of a FETCH (so you have to implement a minimal state machine to know what to do)
as it is monothreaded, it is not that hard to buffer the previous incomplete line
the change seemed easier keeping Protocol as a base class (but maybe I'm missing something)

orf commented 8 years ago

This fixed it for me! Thank you!

orf commented 8 years ago

There is one more thing, I tried to modify my code to make the fetches in parallel:

futures = [client.fetch('{0}:{1}'.format(start, end), '(BODY.PEEK[HEADER])') for start, end in chunks]
results = await asyncio.gather(*futures)`

This makes the library throw aioimaplib.aioimaplib.Abort: unexpected tagged (CHFE20) response: OK Success for each future, where CHFE20 is the tag (which increments sequentially, CHFE1, CHFE2 etc). Is the fetch command available to run in parallel?

If not it's no big deal, and thanks a bunch for this awesome library :thumbsup:

bamthomas commented 8 years ago

FETCH is one of the IMAP commands that could be run asynchronously. But you cannot run same commands at the same time. As indicated in the README (IMAP commands concurrency) :

if an async command is running, same async commands (or with the same untagged response type) must wait

It is not truly asynchronous execution as the whole is monothreaded and on the same socket. It is asynchronous on IO. I don't see why you would run these fetches in parallel. This would have an interest if the fetch could take a while on the server side.

If you really want to run the fetches in parallel, run your fetch command in different threads/processes but each imap client instance needs to be used in only one thread/process.

BTW :

thx for the bugifx
google extended the IMAP protocol for their needs and don't follow all the IMAP RFC's

EDIT :

that said, there is a lock to synchronize the fetch request so there should not be this unexpected tagged (CHFE20) response. You can open a new issue for that.
to explain the lock, if you run several requests with the same responses type, you cannot identify in the server answers witch request should be fed with the content

bamthomas / aioimaplib

Not properly buffering newlines #12