[Clarification] What is the intended envelope.maxHeaderChunk size?

danrubenstein commented 5 years ago

In the envelope method to parse headers, the function comment says:

// It assumes that at most 30kb of email data can be a header

And the associated comment a few lines later seems to suggest that a header can be up to 30kb.

However, the defined constant at the top of the same file is set to 4KB.

My question is for clarification -- I am right to understand that there is some confusion between these two allocations?

If there is, one possible solution is that it could be configurable (via the ParseHeaders processor), although I'd have to think about how that could be implemented without breaking the public contract of e.ParseHeaders.

Thanks!

flashmob commented 5 years ago

Thanks for spotting this! It seems like making it configurable could be some work. Perhaps an alternative would be to make a new function, something like parseHeadersLimit(int) ?

Btw, recently, there has been some work made in to creating a better headers parser (and mime scanner) that works directly on the stream, instead of using the textproto lib. The only reason why there was a limit was because textproto uses another buffer, and wanted to avoid re-buffering... Most of it done, should post that soon...

danrubenstein commented 5 years ago

Oh cool! I didn't know about the multiple buffer thing, that sounds like ultimately the better solution here.

In the interim, I got around this by basically ripping off the Header Parse Processor and envelope's ParseHeaders() method, but I think that it would be great if this were configurable at the package level.

Let me think for a bit about if this can be configurable in the processor without breaking any public contracts. I think maybe with an optional configuration method there on the processor, and an additional public method on mail.Envelope, this would be possible, but I'd have to get into the guts a bit.

Thanks for the reply.

flashmob commented 5 years ago

A suggestion: The Processors have a configuration feature. For a simple example, see in p_debugger.go, where Svc.AddInitializer is used to set a function that extracts two new settings from the config, in combination with the debuggerConfig struct that defines the two new settings that are in the json file.

Next, adjust the parseHeadersLimit function to add the limit int argument, and change HeadersParser processor to use the new function. Also be sure to use a default value if none is present in the config (the config setting should be optional, using the omitempty json hint).

Actually, this is what probably should have been done in the first place...

Btw, what is your suggested default value?

tormath1 commented 5 years ago

Hi there :wave: ,

what's new with parseHeadersLimit(int) ? I would like to be able to tweak the headers size limit. Maybe I could propose an implementation ?

flashmob commented 5 years ago

parseHeadersLimit sets the maximum amount of bytes that will be scanned & buffered to determine if the message contains an email header. The headers will be parsed if a header is detected (first sequence of \n\n). A limit is set just in case someone misbehaves and sends a long message without a header.

with the introduction of the new stream based backend, the parseHeadersLimit func will get deprecated. Instead, the message headers will be scanned while the stream of bytes comes in, in a single pass. Here you can track progress the PR https://github.com/flashmob/go-guerrilla/pull/135

On Sat, 19 Oct 2019 at 03:00, Mathieu Tortuyaux notifications@github.com wrote:

Hi there 👋 ,

what's new with parseHeadersLimit(int) ? I would like to be able to tweak the headers size limit. Maybe I could propose an implementation ?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/flashmob/go-guerrilla/issues/146?email_source=notifications&email_token=AAE6MP4VAPLT4BZTNML7GPDQPHMRLA5CNFSM4HDYFD72YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBU6GZA#issuecomment-543810404, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE6MP3YGMVX7HHSFJTIGA3QPHMRLANCNFSM4HDYFD7Q .

csotherden commented 4 years ago

While the stream-based headers parser does seem like it will be a huge improvement it might be worth aligning the header buffer to match the specified 30kb for the time being. I had issues with messages from Microsoft having more than 4kb of headers and thus failing to parse. Of course, if the stream-based headers parser is around the corner it doesn't make sense, but the 4kb vs 30kb mismatch mentioned here was first mentioned over a year ago. I can attest that limiting the buffer size to 4kb does cause issues with popular email services.

flashmob / go-guerrilla

[Clarification] What is the intended envelope.maxHeaderChunk size? #146