Closed orf closed 8 years ago
Interesting. In your case, the FETCH command before PHMD should have a length that should stop at the end of parenthesis ')'. Then _handle_responses function should be called with PHMD...
. BTW I don't know this imap command 'PHMD'. Do you have a RFC for it ?
Sorry, I didn't make myself clear. The PHMD
prefix is the tag that's sent back from the server with a response, but for some reason the first character is removed. If you have GMail then this function triggers the issue fairly reliably:
async def tally_inbox(host, port, user, password, loop):
client = aioimaplib.IMAP4_SSL(host, port, loop=loop)
await client.wait_hello_from_server()
response = await client.login(user, password)
res, data = await client.select()
total_messages = int(data[0])
print("Selected INBOX: {0} messages".format(total_messages))
chunks = [(number, number + 10) for number in range(1, total_messages + 1, 10)]
for start, end in chunks:
res, data = await client.fetch('{0}:{1}'.format(start, end), '(BODY.PEEK[HEADER])')
This attempts to download all your messages in chunks of 10, but after a while it fails with some invalid tagged responses. I'm not familiar with the IMAP protocol, but if you're right about the length part then perhaps you have an off-by-one issue somewhere in your length calculations?
perhaps you have an off-by-one issue somewhere in your length calculations?
Of course that is possible (even if it is unit tested, maybe there is a forgotten use case). I tried your code on my gmail account turning debug logs on.
I fixed the pending_async_command issue cf https://github.com/bamthomas/aioimaplib/commit/f6bf34e6a17a3eaedbe858cceaf4bb5a3d2f64af
When I check mails with a small range, it is ok. But when I check the 10634 messages in my inbox, it seems as if there was a desynchronization between fetch commands.
Okay I think I understand the issue : the offset problem comes from the strip()
here : https://github.com/bamthomas/aioimaplib/blob/master/aioimaplib/aioimaplib.py#L206
But if we remove it, it is still possible to have half of the next command (it happened when I checked the 10000 mails from gmail). I'm looking after a way to treat this case.
Yeah, half-sent commands is a hard problem to solve. I think using a StreamReaderProtocol
would help though, as it handles buffering for you so your protocol only receives entire lines. I'm not sure how this would work with the current code though.
Thanks for the idea I'm going to see if StreamReaderProtocol
could help.
@orf tell me if the last commit fixed your issue. It did for me (with my 10634 messages). If it is the case I will release and close this issue.
Actually, I did not used StreamReaderProtocol
because :
This fixed it for me! Thank you!
There is one more thing, I tried to modify my code to make the fetches in parallel:
futures = [client.fetch('{0}:{1}'.format(start, end), '(BODY.PEEK[HEADER])') for start, end in chunks]
results = await asyncio.gather(*futures)`
This makes the library throw aioimaplib.aioimaplib.Abort: unexpected tagged (CHFE20) response: OK Success
for each future, where CHFE20 is the tag (which increments sequentially, CHFE1, CHFE2 etc). Is the fetch
command available to run in parallel?
If not it's no big deal, and thanks a bunch for this awesome library :thumbsup:
FETCH is one of the IMAP commands that could be run asynchronously. But you cannot run same commands at the same time. As indicated in the README (IMAP commands concurrency) :
if an async command is running, same async commands (or with the same untagged response type) must wait
It is not truly asynchronous execution as the whole is monothreaded and on the same socket. It is asynchronous on IO. I don't see why you would run these fetches in parallel. This would have an interest if the fetch could take a while on the server side.
If you really want to run the fetches in parallel, run your fetch command in different threads/processes but each imap client instance needs to be used in only one thread/process.
BTW :
EDIT :
unexpected tagged (CHFE20) response
. You can open a new issue for that.
I've run into an annoying bug that I'm not sure how to fix. Sometimes the data fed to
_handle_responses
is like so:...QBo+puGrxMuda6Cz23F6J7p\r\n\r\n)\r\nP
. The strayP
at the end of the data is the start of the TAG for the next command, in this caseself.tagpre
isPHMD
.The problem is that this trailing byte is buffered and sent to
_handle_line
all by itself, which causes it to be rejected. Then when the next chunk of data from the server is received it is rejected because the tag is found to start withHMD
, notPHMD
.Perhaps using
StreamReaderProtocol
rather thanProtocol
would be a better fit?