eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
https://docspell.org
GNU Affero General Public License v3.0

Bug: Joex reads only the first 50 messages from an IMAP inbox sub-folder and repeats this 10 times #1397

Open arittner opened 2 years ago

arittner commented 2 years ago

Hello!

I configured a scan of the INBOX.docspell folder on my posteo.de account. But the import processes only the first 50 messages, and does so 10 times. It never moves on to the "next" page of 50 messages. The job finishes successfully after 10 rounds, because the overall limit is 500 messages.

My first attempt was a horrible import: every message ended up in Docspell 10 times, i.e. 10 copies of each mail.

After that (in the following jobs), Joex excluded the mails because they had already been imported.

An example from my log:

2022-02-21T9:55:00: === Start importing mails for user *********
2022-02-21T9:55:00: Settings: {"account":"**********","imapConnection":"**********","folders":["INBOX.Docspell"],"receivedSince":null,"targetFolder":null,"deleteMail":false,"direction":null,"itemFolder":null,"fileFilter":null,"tags":null,"subjectFilter":null,"language":null,"postHandleAll":false,"attachmentsOnly":true}
2022-02-21T9:55:00: Reading mails for user arittner from ***********/INBOX.Docspell
2022-02-21T9:55:00: Processing folder INBOX.Docspell
2022-02-21T9:55:00: Searching next 50 mails in Docspell.
2022-02-21T9:55:06: Found 249 mails in folder. Reading first 50
2022-02-21T9:55:06: Not matching on subjects. No filter given
2022-02-21T9:55:06: Excluding mail 'Scan:**************.pdf' it has been imported in the past.'
2022-02-21T9:55:06: Excluding mail 'Ihre ************* (Adresse: **********)' it has been imported in the past.'
...
2022-02-21T9:55:07: Post handling mail: Scan:***********.pdf - no handling defined!
2022-02-21T9:55:07: Post handling mail: Ihre ************ (Adresse: *********) - no handling defined!
...
2022-02-21T9:55:46: Processing folder INBOX.Docspell
2022-02-21T9:55:46: Searching next 50 mails in Docspell.
2022-02-21T9:55:53: Found 249 mails in folder. Reading first 50
2022-02-21T9:55:53: Not matching on subjects. No filter given
2022-02-21T9:55:53: Excluding mail 'Scan:*************.pdf' it has been imported in the past.'
2022-02-21T9:55:53: Excluding mail 'Ihre *********** (Adresse: **********)' it has been imported in the past.'

... and so on, and so on
...
2022-02-21T10:04:04: Reached server maximum of 500 processed mails. Processed 500 mails.
2022-02-21T10:04:04: Job execution successful

I use Docspell 0.32.0: Docker installation, Postgres, Posteo.de IMAP account, importing from a subfolder.

eikek commented 2 years ago

Hi @arittner ,

Thank you very much for all your input! I'll go through it in the next few days; it all makes sense.

Regarding this issue, it could be that you need to take steps to move the processed mails out of the way. Did you specify a setting to move or delete mails that have been processed? If not, the periodicity and "received since" settings need to be adjusted.
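
To illustrate why moving processed mails matters, here is a minimal sketch in Python (not Docspell's own code, which is written in Scala). It assumes a reader that always takes the first 50 mails of a folder; the folder is modelled as a plain list, and `process_and_move` is a hypothetical helper, not a Docspell function. Once each batch is moved to a target folder, the next pass sees new mails.

```python
from typing import List

BATCH_SIZE = 50  # batch size that appears in the log


def process_and_move(inbox: List[str], target: List[str],
                     batch_size: int = BATCH_SIZE) -> int:
    """Take the first batch from `inbox`, move it to `target`, return its size.

    Hypothetical stand-in for "move to folder" post-handling.
    """
    batch = inbox[:batch_size]
    target.extend(batch)       # "move" the processed mails ...
    del inbox[:len(batch)]     # ... so they no longer sit at the front of the inbox
    return len(batch)


inbox = [f"mail-{i}" for i in range(249)]  # 249 mails, as in the log
archive: List[str] = []
sizes = []
while inbox:
    sizes.append(process_and_move(inbox, archive))
```

With 249 mails, five passes empty the folder, and no mail is seen twice.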

arittner commented 2 years ago

Did you specify a setting to move or delete mails that have been processed?

Ah no, I'd better try it with moving. I was not 100% sure whether "move" means a local action or something in the IMAP folder structure.

If not, the periodicity and "received since" settings need to be adjusted.

I'll try "move to folder"; maybe it fits my workflow better. So this is possibly a Layer-8 problem. But anyway, it would be nice to check why Joex imported the messages 10 times, because I would expect the "duplicate check" to also work immediately within the same job.
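
The expectation about the duplicate check can be sketched as follows, in Python rather than Docspell's Scala. This is an assumption-laden illustration: the thread does not say how Docspell recognises an already imported mail, so a set of message IDs stands in for that check, and `import_new_mails` is a hypothetical name.

```python
from typing import Iterable, List, Set


def import_new_mails(message_ids: Iterable[str],
                     already_imported: Set[str]) -> List[str]:
    """Return the IDs imported by this call, skipping known ones.

    Assumption: a mail is identified by a message ID. Each ID is recorded
    as soon as it is imported, so a later batch in the same job skips it.
    """
    imported_now: List[str] = []
    for mid in message_ids:
        if mid in already_imported:
            continue                   # excluded: imported in the past
        already_imported.add(mid)      # record immediately, not at job end
        imported_now.append(mid)
    return imported_now


seen: Set[str] = set()
batch = ["a", "b", "c"]
first = import_new_mails(batch, seen)   # imports all three
second = import_new_mails(batch, seen)  # same batch again within the same job
```

Because the set is updated inside the loop, re-reading the same batch within one job imports nothing the second time.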

eikek commented 2 years ago

But anyway, it would be nice to check why Joex imported the messages 10 times, because I would expect the "duplicate check" to also work immediately within the same job.

Yes, this is a bug. It should download in batches of 50, not redo the first batch 10 times, of course! I need to check this.
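
A minimal sketch of the intended paging next to the reported symptom, written in Python and not taken from the Joex source. The function names are hypothetical; the batch size of 50 and the maximum of 500 come from the log above. The only difference between the two functions is whether the offset advances after each batch.

```python
from typing import List

BATCH_SIZE = 50   # "Reading first 50" in the log
MAX_MAILS = 500   # "server maximum of 500 processed mails" in the log


def read_in_batches(folder: List[str], batch_size: int = BATCH_SIZE,
                    max_mails: int = MAX_MAILS) -> List[str]:
    """Intended behaviour: advance the offset after every batch."""
    processed: List[str] = []
    offset = 0
    while offset < len(folder) and len(processed) < max_mails:
        remaining = max_mails - len(processed)
        batch = folder[offset:offset + min(batch_size, remaining)]
        processed.extend(batch)
        offset += len(batch)  # move on to the next page
    return processed


def read_first_batch_repeatedly(folder: List[str], batch_size: int = BATCH_SIZE,
                                max_mails: int = MAX_MAILS) -> List[str]:
    """Reported symptom: the offset never advances, so page one repeats."""
    processed: List[str] = []
    while len(processed) < max_mails:
        batch = folder[:batch_size]  # always the first page
        if not batch:
            break
        processed.extend(batch)
    return processed


folder = [f"mail-{i}" for i in range(249)]  # 249 mails, as in the log
good = read_in_batches(folder)
bad = read_first_batch_repeatedly(folder)
```

For the 249-mail folder, the first function returns every mail exactly once, while the second returns 500 entries made up of the same 50 mails repeated 10 times, which matches the duplicates described in the report.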