dspiljar / mboxgrep

A tool to select email messages matching a pattern from a mailbox
https://mboxgrep.org/
GNU General Public License v2.0
\r\n (CRLF) line endings in Windows text files cause the end of a message to never be found #2

Open giachello opened 2 months ago

giachello commented 2 months ago

In mbox.c:302, the code fails on Windows text files that use \r\n as the line ending.

Adding a condition that tests for either \n or \r\n fixes the issue.
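
As a rough illustration of that condition (a minimal sketch only: the actual code at mbox.c:302 is not reproduced here, and `is_blank_line` is a hypothetical helper name, not an mboxgrep function), a blank-line test that accepts both endings could look like this:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not taken from mboxgrep: returns true when `line`
 * is an empty line terminated by either "\n" (Unix) or "\r\n" (Windows). */
bool is_blank_line(const char *line)
{
    return strcmp(line, "\n") == 0 || strcmp(line, "\r\n") == 0;
}
```

This assumes the line has been read into a NUL-terminated buffer with its terminator still attached, as `fgets` leaves it.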

dspiljar commented 2 months ago

Hi Giovanni,

Although adding transparent support for \r\n (CRLF) end-of-line sequences seems like a simple solution, the consequences could be non-trivial.

First, it would be a violation of RFC 4155, but this is the lesser concern.

Second, suppose we search mailbox A with Windows-style \r\n end-of-line sequences, and want to append the output to a non-empty mailbox B with Unix-style \n end-of-line sequences (or vice versa). This will corrupt mailbox B.

So I think this question requires a bit more careful consideration. We shouldn't make assumptions based only on the platform here.

May I ask which piece of software created those mailboxes you were testing with?

Best regards, Daniel

giachello commented 2 months ago

Hi there, this is interesting. These mbox files were created by Google Gmail's Takeout process. I checked by unzipping with unzip -b, and the files use \r\n at the origin! My main use for mboxgrep is to split Takeout files into categories.

Maybe we can turn this into an option, similar to unzip -a.

dspiljar commented 2 months ago

Hi Giovanni,

I think your proposal makes sense. We can add an option to tolerate \r\n in the input, but force the correct format in the output.
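
To make the idea concrete, here is a minimal sketch of the normalization step, assuming lines are read into a NUL-terminated buffer with the terminator attached. The name `normalize_eol` is hypothetical and does not exist in mboxgrep; it only illustrates accepting \r\n on input while always emitting \n:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of mboxgrep: if `line` ends in "\r\n",
 * rewrite the ending in place as "\n". Lines that already end in "\n",
 * or that have no terminator, are left unchanged.
 * Returns the length of the resulting string. */
size_t normalize_eol(char *line)
{
    size_t len = strlen(line);

    if (len >= 2 && line[len - 2] == '\r' && line[len - 1] == '\n') {
        line[len - 2] = '\n';   /* overwrite the CR with the LF */
        line[len - 1] = '\0';   /* drop the now-redundant final byte */
        return len - 1;
    }
    return len;
}
```

Applied to each input line only when the new option is given, this would keep the default behaviour unchanged and ensure that anything appended to an existing mailbox uses \n consistently.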

/Daniel