ciur / papermerge

Open Source Document Management System for Digital Archives (Scanned Documents)
https://papermerge.com
Apache License 2.0

Mail import fails with TypeError #302

Closed napcae closed 3 years ago

napcae commented 3 years ago

In case you experience issues with docker image provided by linuxserver.io/papermerge, please open bug report in their repository.

Description: I've tried to set up a basic IMAP importer. Running on the supplied docker images, I've set the following in

config/worker.production.py

PAPERMERGE_IMPORT_MAIL_HOST="imap.fastmail.com"
PAPERMERGE_IMPORT_MAIL_USER="*redacted*"
PAPERMERGE_IMPORT_MAIL_PASS="*redacted*" 

(This may be another bug: if I set the vars without the PAPERMERGE_ prefix, the app complains in app.log that the vars are not set, and therefore no IMAP importer is configured.)

RAW Mail ``` Return-Path: <*redacated*@me.com> Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by sloti35d2t02 (Cyrus 3.5.0-alpha0-141-gf094924a34-fm-20210210.001-gf094924a) with LMTPA; Thu, 11 Feb 2021 11:51:27 -0500 X-Cyrus-Session-Id: sloti35d2t02-1613062287-2734191-2-12295162176795636088 X-Sieve: CMU Sieve 3.0 X-Spam-known-sender: no X-Spam-sender-reputation: 500 (none) X-Spam-score: 0.0 X-Spam-hits: BAYES_05 -0.5, FREEMAIL_FROM 0.001, ME_SENDERREP_NEUTRAL 0.001, RCVD_IN_DNSWL_LOW -0.7, RCVD_IN_MSPIKE_H3 0.001, RCVD_IN_MSPIKE_WL 0.001, SPF_HELO_NONE 0.001, SPF_PASS -0.001, T_FREEMAIL_DOC_PDF 0.01, LANGUAGES unknown, BAYES_USED user, SA_VERSION 3.4.2 X-Spam-source: IP='17.58.63.179', Host='st43p00im-ztfb10063301.me.com', Country='US', FromHeader='com', MailFrom='com' X-Spam-charsets: plain='us-ascii', plain='us-ascii' X-Attached: Scanned Document 3.pdf X-Resolved-to: napcae+papermerge@fastmail.fm X-Delivered-to: papermerge@napcae.fastmail.fm X-Mail-from: *redacated*@me.com Received: from mx4 ([10.202.2.203]) by compute3.internal (LMTPProxy); Thu, 11 Feb 2021 11:51:27 -0500 Received: from mx4.messagingengine.com (localhost [127.0.0.1]) by mailmx.nyi.internal (Postfix) with ESMTP id 2D9BB7C00D7 for ; Thu, 11 Feb 2021 11:51:26 -0500 (EST) Received: from mx4.messagingengine.com (localhost [127.0.0.1]) by mx4.messagingengine.com (Authentication Milter) with ESMTP id 0B62FC20CDD; Thu, 11 Feb 2021 11:51:26 -0500 ARC-Seal: i=1; a=rsa-sha256; cv=none; d=messagingengine.com; s=fm2; t= 1613062286; b=shzwh81wIsi/NX5a7xEt7B5KJO71Kodo0EOvRYLpYNEH4vIyfa +9d4Qe+qOGHs1+/lQV12GgbdOvmxJDX1e8u4khqC5x2MWNF1PH+WPir/bunCsY1v YQVokp9nJk6Ff5MV+/A3tboiD7lpnDjr7kWCiIKqFGCSr/ScDeYEL5KdiWWjZesT RMul3/5U2kJeiZcvegcE7ZBqM+gEONkHZ1Da+ZaQR/x7AG1bcZ84GAmcOyjbQSCy 2hHrAP2KAenk8GnUpMiJ5tzaHkMnAVCT8EGiVXM+SlI2qJCNXAFoyvN05QlcBir3 QUYGiGCCMxi9gS3PhaOuojbmb2j7oFbsjCSg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; 
h=content-transfer-encoding:content-type :from:mime-version:date:subject:message-id:to; s=fm2; t= 1613062286; bh=6yBXxzju8JV/zhb925dqcBDq5+fhKuLQL0KRfCRhv1k=; b=Z XVgeZrF4cYQdVkHzXUyiSHESPsMKdqHMWvC5tPB93l3SGxMfOvXgVdhxu9uCFSEa uO+pd+st7Dki1Ch0QrIXGMbVv0uPQvO0a3Hd0s77prebmjKyIMZKjwqNgnxLPC9q fEhsal9P+XL+UMI826vJKgg1xXk2TDEFTvTvRxuWwmF2daLL2nxJLTq7u/l8pU3g XnUCPP2TDyamlnH4rJlnAcP6/m31If6taikxq+rlLSCQrE3knsMoUkpkTU67K/DR xK9Fod1IP7/9xMwc6EiVv3R5HQof//4G9t1i/msNKX9HOSzL9YOH1IdaGQ2k/3K7 dJHdY3SCElPES3cu/9YyQ== ARC-Authentication-Results: i=1; mx4.messagingengine.com; arc=none (no signatures found); bimi=none (No BIMI records found); dkim=pass (2048-bit rsa key sha256) header.d=me.com header.i=@me.com header.b=0oiEPE77 header.a=rsa-sha256 header.s=1a1hai x-bits=2048; dmarc=pass policy.published-domain-policy=quarantine policy.applied-disposition=none policy.evaluated-disposition=none (p=quarantine,d=none,d.eval=none) policy.policy-from=p header.from=me.com; iprev=pass smtp.remote-ip=17.58.63.179 (st43p00im-ztfb10063301.me.com); spf=pass smtp.mailfrom=*redacated*@me.com smtp.helo=st43p00im-ztfb10063301.me.com; x-aligned-from=pass (Address match); x-csa=none; x-ptr=pass smtp.helo=st43p00im-ztfb10063301.me.com policy.ptr=st43p00im-ztfb10063301.me.com; x-return-mx=pass header.domain=me.com policy.is_org=yes (MX Records found: mx01.mail.icloud.com,mx02.mail.icloud.com); x-return-mx=pass smtp.domain=me.com policy.is_org=yes (MX Records found: mx01.mail.icloud.com,mx02.mail.icloud.com); x-tls=pass smtp.version=TLSv1.2 smtp.cipher=ECDHE-RSA-AES256-GCM-SHA384 smtp.bits=256/256; x-vs=clean score=0 state=0 Authentication-Results: mx4.messagingengine.com; arc=none (no signatures found); bimi=none (No BIMI records found); dkim=pass (2048-bit rsa key sha256) header.d=me.com header.i=@me.com header.b=0oiEPE77 header.a=rsa-sha256 header.s=1a1hai x-bits=2048; dmarc=pass policy.published-domain-policy=quarantine policy.applied-disposition=none policy.evaluated-disposition=none 
(p=quarantine,d=none,d.eval=none) policy.policy-from=p header.from=me.com; iprev=pass smtp.remote-ip=17.58.63.179 (st43p00im-ztfb10063301.me.com); spf=pass smtp.mailfrom=*redacated*@me.com smtp.helo=st43p00im-ztfb10063301.me.com; x-aligned-from=pass (Address match); x-csa=none; x-ptr=pass smtp.helo=st43p00im-ztfb10063301.me.com policy.ptr=st43p00im-ztfb10063301.me.com; x-return-mx=pass header.domain=me.com policy.is_org=yes (MX Records found: mx01.mail.icloud.com,mx02.mail.icloud.com); x-return-mx=pass smtp.domain=me.com policy.is_org=yes (MX Records found: mx01.mail.icloud.com,mx02.mail.icloud.com); x-tls=pass smtp.version=TLSv1.2 smtp.cipher=ECDHE-RSA-AES256-GCM-SHA384 smtp.bits=256/256; x-vs=clean score=0 state=0 X-ME-VSCause: gggruggvucftvghtrhhoucdtuddrgeduledrheelgdelvdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpggftfghnshhusghstghrihgsvgdpuffr tefokffrpgfnqfghnecuuegrihhlohhuthemuceftddtnecunecujfgurhepgfgthfggff fukffvofesmhejmherhhdtvdenucfhrhhomhepvfhruhhnghcupfhguhihvghnucevhhhi uceotghhihhtrhhunhhgnhhguhihvghnsehmvgdrtghomheqnecuggftrfgrthhtvghrnh epteeiuddvudevudfgueehheevffevlefhffeugffhudetgeeuudelueejgeduhfehnecu kfhppedujedrheekrdeifedrudejledpkeegrdejiedrvdegrddukeelnecuvehluhhsth gvrhfuihiivgeptdenucfrrghrrghmpehinhgvthepudejrdehkedrieefrddujeelpdhh vghlohepshhtgeefphdttdhimhdqiihtfhgsuddttdeifeeftddurdhmvgdrtghomhdpmh grihhlfhhrohhmpeeotghhihhtrhhunhhgnhhguhihvghnsehmvgdrtghomheq X-ME-VSScore: 0 X-ME-VSCategory: clean X-ME-CSA: none Received-SPF: pass (me.com: 17.58.63.179 is authorized to use '*redacated*@me.com' in 'mfrom' identity (mechanism 'ip4:17.58.0.0/16' matched)) receiver=mx4.messagingengine.com; identity=mailfrom; envelope-from="*redacated*@me.com"; helo=st43p00im-ztfb10063301.me.com; client-ip=17.58.63.179 Received: from st43p00im-ztfb10063301.me.com (st43p00im-ztfb10063301.me.com [17.58.63.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
mx4.messagingengine.com (Postfix) with ESMTPS for ; Thu, 11 Feb 2021 11:51:24 -0500 (EST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=me.com; s=1a1hai; t=1613062265; bh=6yBXxzju8JV/zhb925dqcBDq5+fhKuLQL0KRfCRhv1k=; h=Content-Type:From:Mime-Version:Date:Subject:Message-Id:To; b=0oiEPE77pqQgNlV8s5Ga8CBvErcRjulb5MViDkhM5QnQPY59aLUJ/1Cp6HUWOQjWU lWMH2hsIx0L9/4FaUG8apiObUMD885CBNvVv3aYom089ZIf4TvZIjZPhmVZ/O29ZxS I6PMbrT7wvyhQp3ZbtFE7e+vUMouVvRDiBMW5lMcNxCOy62+yLw83lu8UkzrEs36kL Qyhlxh/K6AZVP/fAmLct976h4ok1mcVpbt0mqUEcVUF5thM4wQQJzDDEYGkqaawy7L pFZcEgQEaj5fuB5K+zN9ffjK1NGPHP4KUBrKzsGrtefr03zzSUVyMVglhHtwHUmuDD FQZMv/yc9GBmg== Received: from [192.168.1.139] (unknown [84.76.24.189]) by st43p00im-ztfb10063301.me.com (Postfix) with ESMTPSA id 656D1A4043B for ; Thu, 11 Feb 2021 16:51:01 +0000 (UTC) Content-Transfer-Encoding: 7bit Content-Type: multipart/mixed; boundary=Apple-Mail-96B2EF01-D8EB-4CB6-A478-9F1B1EC8AEBC From: *redacated* <*redacated*@me.com> Mime-Version: 1.0 (1.0) Date: Thu, 11 Feb 2021 16:50:57 +0000 Subject: Scanned Document 3.pdf Message-Id: To: papermerge@napcae.fastmail.fm X-Mailer: iPhone Mail (18C66) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.369,18.0.737 definitions=2021-02-11_07:2021-02-11,2021-02-11 signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 clxscore=1011 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=parse_limit adjust=0 reason=mlx scancount=1 engine=8.0.1-2006250000 definitions=main-2102110140 --Apple-Mail-96B2EF01-D8EB-4CB6-A478-9F1B1EC8AEBC Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit --Apple-Mail-96B2EF01-D8EB-4CB6-A478-9F1B1EC8AEBC Content-Type: application/pdf; name="Scanned Document 3.pdf"; x-apple-part-url=A31F4C50-77DC-46BC-B1A4-2377900800BF Content-Disposition: attachment; filename="Scanned Document 3.pdf" Content-Transfer-Encoding: base64 
JVBERi0xLjMKJcTl8uXrp/Og0MTGCjMgMCBvYmoKPDwgL0ZpbHRlciAvRmxhdGVEZWNvZGUgL0xl bmd0aCA1NiA+PgpzdHJlYW0KeAErVAhUKFTQD0gtSk4tKClNzFEoygQKGJobGCkYAKGRibEJmJGc q6DvmWuo4JIP1BEIAK+tDlYKZW5kc3RyZWFtCmVuZG9iagoxIDAgb2JqCjw8IC9UeXBlIC9QYW8KXRf [[ truncated ]] 4o293X1Q48V1Pzr0oHGjRq1etoytI5g7OvvWbaxfumLz1rp --Apple-Mail-96B2EF01-D8EB-4CB6-A478-9F1B1EC8AEBC Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sent from my iPhone --Apple-Mail-96B2EF01-D8EB-4CB6-A478-9F1B1EC8AEBC-- ```

Expected: An imported document via IMAP.

Actual: No documents are imported:

papermerge_worker | [2021-02-11 18:43:53,132: ERROR/ForkPoolWorker-2] Task papermerge.core.management.commands.worker.import_from_email[7b30acf5-5da8-4322-8495-d5f691ec2be4] raised unexpected: TypeError('sequence item 0: expected str instance, bytes found')
papermerge_worker | Traceback (most recent call last):
papermerge_worker |   File "/opt/app/.venv/lib/python3.8/site-packages/celery/app/trace.py", line 385, in trace_task
papermerge_worker |     R = retval = fun(*args, **kwargs)
papermerge_worker |   File "/opt/app/.venv/lib/python3.8/site-packages/celery/app/trace.py", line 650, in __protected_call__
papermerge_worker |     return self.run(*args, **kwargs)
papermerge_worker |   File "/opt/app/papermerge/core/management/commands/worker.py", line 44, in import_from_email
papermerge_worker |     import_attachment()
papermerge_worker |   File "/opt/app/papermerge/core/importers/imap.py", line 122, in import_attachment
papermerge_worker |     server.select_folder(settings.PAPERMERGE_IMPORT_MAIL_INBOX)
papermerge_worker |   File "/opt/app/.venv/lib/python3.8/site-packages/imapclient/imapclient.py", line 794, in select_folder
papermerge_worker |     self._command_and_check("select", self._normalise_folder(folder), readonly)
papermerge_worker |   File "/opt/app/.venv/lib/python3.8/site-packages/imapclient/imapclient.py", line 1707, in _command_and_check
papermerge_worker |     typ, data = meth(*args)
papermerge_worker |   File "/usr/lib/python3.8/imaplib.py", line 756, in select
papermerge_worker |     self._dump_ur(self.untagged_responses)
papermerge_worker |   File "/usr/lib/python3.8/imaplib.py", line 1235, in _dump_ur
papermerge_worker |     self._mesg('untagged responses dump:%s%s' % (t, t.join(l)))
papermerge_worker |   File "/usr/lib/python3.8/imaplib.py", line 1234, in <lambda>
papermerge_worker |     l = map(lambda x:'%s: "%s"' % (x[0], x[1][0] and '" "'.join(x[1]) or ''), l)
papermerge_worker | TypeError: sequence item 0: expected str instance, bytes found
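
The last frame of the traceback joins a list of values with a `str` separator. As a minimal sketch (the value `b"3"` is a made-up placeholder, not taken from the actual server response), the same message can be reproduced in plain Python whenever the joined items are `bytes`:

```python
# Hypothetical stand-in for one untagged response value; imaplib stores these as bytes.
values = [b"3"]

try:
    # Same separator as in the traceback's lambda: a str joining bytes items.
    '" "'.join(values)
except TypeError as exc:
    message = str(exc)

print(message)
```

This only illustrates where the message text comes from; it says nothing about why the server response reached that code path.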

Info:

ciur commented 3 years ago

@napcae, I think the problem here is that fastmail's IMAP inbox folder is named "Inbox" (i.e. capitalized) rather than Papermerge's default value of "INBOX" (all uppercase).

Try setting option PAPERMERGE_IMPORT_MAIL_INBOX to:

PAPERMERGE_IMPORT_MAIL_INBOX = "Inbox"

i.e. Capitalized.

It is a valid bug anyway, because the (Papermerge) application should provide proper feedback when it cannot select the IMAP inbox folder. Thank you for opening this ticket!

napcae commented 3 years ago

Hey @ciur, thanks for taking a look. I don't think that's the problem though. I started with an (IMAP) folder called "papermerge", i.e. PAPERMERGE_IMPORT_MAIL_INBOX = "papermerge", and still had the same error.

I've just tried PAPERMERGE_IMPORT_MAIL_INBOX = "Inbox" and the same error comes up:

papermerge_worker | [2021-02-13 10:29:18,251: ERROR/ForkPoolWorker-2] Task papermerge.core.management.commands.worker.import_from_email[b6313ea1-7c2f-4e11-970b-e615379e0b38] raised unexpected: TypeError('sequence item 0: expected str instance, bytes found')

Unless the TypeError is a red herring, I'd suspect something other than capitalization is the problem, no?

Edit: Listing imap folders shows that the default name is INBOX as well:

irb(main):012:0> pp imap.list('%', '%')
[#<struct Net::IMAP::MailboxList
  attr=[:Hasnochildren],
  delim="/",
  name="INBOX">,
 #<struct Net::IMAP::MailboxList
  attr=[:Hasnochildren, :Archive],
  delim="/",
  name="Archive">,
 #<struct Net::IMAP::MailboxList
  attr=[:Hasnochildren],
  delim="/",
  name="redacted*">,
 #<struct Net::IMAP::MailboxList
  attr=[:Hasnochildren, :Drafts],
  delim="/",
  name="Drafts">,
 #<struct Net::IMAP::MailboxList
  attr=[:Hasnochildren, :Junk],
  delim="/",
  name="Junk Mail">,
 #<struct Net::IMAP::MailboxList
  attr=[:Hasnochildren, :Xnotes],
  delim="/",
  name="Notes">,
 #<struct Net::IMAP::MailboxList
  attr=[:Hasnochildren],
  delim="/",
  name="redacted*">,
 #<struct Net::IMAP::MailboxList
  attr=[:Hasnochildren, :Sent],
  delim="/",
  name="Sent Items">,
 #<struct Net::IMAP::MailboxList
  attr=[:Haschildren],
  delim="/",
  name="redacted*">,
 #<struct Net::IMAP::MailboxList
  attr=[:Hasnochildren, :Trash],
  delim="/",
  name="Trash">,
 #<struct Net::IMAP::MailboxList
  attr=[:Hasnochildren],
  delim="/",
  name="papermerge">,
 #<struct Net::IMAP::MailboxList
  attr=[:Hasnochildren],
  delim="/",
  name="*redacted*">,
 #<struct Net::IMAP::MailboxList
  attr=[:Hasnochildren],
  delim="/",
  name="travel">]
napcae commented 3 years ago

So I dug a little bit more, and it turns out you need read-write (RW) IMAP access. For testing purposes I had created a read-only (RO) user, which leads to the failure.

According to RFC 3501, Section 6.4.5:

       The \Seen flag is implicitly set; if this causes the flags to
         change, they SHOULD be included as part of the FETCH responses.

Assuming fastmail implemented this per spec, that is the culprit.

I've verified my hypothesis by setting readonly=True:

        server.select_folder(settings.PAPERMERGE_IMPORT_MAIL_INBOX, readonly=True)

https://github.com/napcae/papermerge/blob/04f09dbb141c5afab8f40dfaf87bfa507e4f9a82/papermerge/core/importers/imap.py#L121-L124

The importer won't crash but will import mails indefinitely since it won't set the read flag on the (imap) server.

How do you want to fix this? I'm not familiar enough with Python to fix this upstream (in imapclient), but catching the exception might be enough, wdyt @ciur?

ciur commented 3 years ago

@napcae, I think that wrapping server.select_folder(...) in a try ... except block with logging.error(<error + explanation of what may be the cause and what the user can do to fix it>) will suffice.
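
A minimal sketch of that suggestion, not the code that was eventually merged: the helper name `select_import_folder`, the log wording, and the boolean return value are all assumptions made for illustration. `server` is expected to behave like an `imapclient.IMAPClient`, whose `select_folder(folder)` call is the one shown in the traceback.

```python
import logging

logger = logging.getLogger(__name__)


def select_import_folder(server, folder):
    """Select `folder` on `server`; log a hint and return False if that fails.

    Hypothetical helper: `server` only needs a `select_folder(folder)` method.
    """
    try:
        server.select_folder(folder)
    except Exception as exc:
        # Broad on purpose: in this thread the failure surfaced as a TypeError,
        # not as an IMAP-specific exception.
        logger.error(
            "IMAP import: could not select folder %r (%s: %s). "
            "Check that the folder exists and that the IMAP account "
            "has read-write access.",
            folder,
            type(exc).__name__,
            exc,
        )
        return False
    return True
```

The caller can then skip the import run when the function returns `False`, so the worker logs an actionable message instead of an unexplained traceback.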

Going with readonly=True IMHO defeats the purpose of fetching documents from IMAP account.

If you go ahead with the above fix, please don't forget to update the documentation with your findings: that the IMAP account requires RW access, and that there will be an XYZ warning otherwise.

napcae commented 3 years ago

Going with readonly=True IMHO defeats the purpose of fetching documents from IMAP account.

I'm aware; the explanation was for demonstration purposes only.

One more thing, which I mentioned in my first post: the worker won't recognize the mail configuration directives, i.e. IMPORT_MAIL_HOST, IMPORT_MAIL_USER etc., without the PAPERMERGE_ prefix. Is the documentation out of date, or what is it that I'm missing here?

ciur commented 3 years ago

@napcae

One more thing, which I mentioned in my first post: the worker won't recognize the mail configuration directives, i.e. IMPORT_MAIL_HOST, IMPORT_MAIL_USER etc., without the PAPERMERGE_ prefix. Is the documentation out of date, or what is it that I'm missing here?

Yes. It makes a difference where you place those settings. Settings can be in either:

  1. papermerge.conf.py file
  2. django settings file (the one referenced by DJANGO_SETTINGS_MODULE)

In the papermerge.conf.py file, configuration settings are written without the PAPERMERGE_ prefix, because all (well, 90%) of them are Papermerge-specific. The django settings file, however, holds all sorts of settings: for celery (prefixed with CELERY), for allauth (prefixed with ACCOUNT), and so on. Accordingly, the Papermerge-specific settings there are prefixed as well.
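
As an illustration of that distinction, here is the same pair of settings written for each location. The host and user values are placeholders, and showing both files in a single block is only for comparison; in practice each assignment lives in its own file.

```python
# --- papermerge.conf.py: Papermerge-specific file, so no prefix ---
IMPORT_MAIL_HOST = "imap.example.com"  # placeholder value
IMPORT_MAIL_USER = "user@example.com"  # placeholder value

# --- django settings file (e.g. config/worker.production.py): PAPERMERGE_ prefix ---
PAPERMERGE_IMPORT_MAIL_HOST = "imap.example.com"  # placeholder value
PAPERMERGE_IMPORT_MAIL_USER = "user@example.com"  # placeholder value
```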

You are right that the above information would be a great addition to the documentation. In fact, I just updated and deployed the documentation based on the above explanation :)

napcae commented 3 years ago

Alright, thanks for the clarification and for updating the documentation. I've opened a PR, as you can see above this comment. It won't fix the issue, but at least it gives the user/administrator some useful information.