Closed lord-alfred closed 5 years ago
For parsing the messages, this library looks good. https://godoc.org/github.com/jhillyerd/enmime
or another one here: https://godoc.org/github.com/emersion/go-message#example-Read
Thanks. I think the best way is to parse mails in the API (and cache the result if needed), not in the guerrilla backend, because parsing there would create unnecessary load (I now receive more than 50k emails per night).
Yes, sounds like a good way to proceed.
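A minimal sketch of the "parse in the API and cache" idea, using only the Go standard library. The `parsedMail` and `parseCache` types, the integer mail ID, and the choice of fields are assumptions made for illustration; they are not part of go-guerrilla. For multipart messages, one of the libraries linked above would replace `net/mail`.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
	"sync"
)

// parsedMail holds the few fields a hypothetical API would return.
type parsedMail struct {
	From    string
	Subject string
}

// parseCache parses a raw message on first request and remembers the result.
// It never evicts entries; a real cache would drop them when the mail expires.
type parseCache struct {
	mu     sync.Mutex
	parsed map[int64]parsedMail
	parses int // number of real parses performed, kept for illustration
}

func newParseCache() *parseCache {
	return &parseCache{parsed: make(map[int64]parsedMail)}
}

// get returns the parsed form of the mail with the given ID, parsing raw
// only if this ID has not been requested before.
func (c *parseCache) get(id int64, raw string) (parsedMail, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if p, ok := c.parsed[id]; ok {
		return p, nil
	}
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return parsedMail{}, err
	}
	p := parsedMail{
		From:    msg.Header.Get("From"),
		Subject: msg.Header.Get("Subject"),
	}
	c.parsed[id] = p
	c.parses++
	return p, nil
}

func main() {
	raw := "From: news@example.com\r\nSubject: Weekly digest\r\n\r\nHello"
	cache := newParseCache()
	for i := 0; i < 2; i++ {
		p, err := cache.get(1, raw)
		if err != nil {
			fmt.Println("parse error:", err)
			return
		}
		fmt.Println(p.Subject)
	}
	fmt.Println("real parses:", cache.parses)
}
```

With this shape, mail that nobody opens is never parsed, which keeps the parsing cost out of the receiving path.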
50k? That's quite a lot!
Currently, in production on GuerrillaMail, this software is snapping up 150k per hour. Therefore, only the emails that are about to land in active inboxes are parsed, which is a small subset.
> 50k? That's quite a lot!
I have a personal server for receiving mail, not a public service like yours. And this is the night load; during the day it is several times lower. At night, we receive newsletters from several services where I have registered more than 50 thousand accounts. As I wrote somewhere in the issues, postfix could not cope with the load and crashed, which is why I am using this package now.
> Currently, in production on GuerrillaMail, this software is snapping up 150k per hour
Wow! This is a very large volume! If it's no secret, what server hardware is currently being used? I am currently using a cloud-based VPS with these specs: https://i.imgur.com/ZB7KhBQ.png. It costs about $10 per month.
Currently using a bare metal server from OVH with 128GB of RAM. So the cost is much higher than a small VPS.
Incoming emails are first placed in RAM (using the Redis & MySQL backend), and it is decided later whether they are persisted to SSD or not. The majority are not. Keeping them in RAM makes it super fast. In fact, the new_mail table uses the MEMORY engine. Of course, the mail is lost if power is lost, but that rarely happens, if ever, and a little loss can sometimes be tolerated. If the server does need to be rebooted, there is a script that saves the data and restores it on boot.
Awesome hardware!
I considered using Redis to store mail, but I abandoned the idea because all emails would then be held in memory. So I only use MySQL (InnoDB engine) for now, but I plan to move to PostgreSQL in the future. Earlier, the OOM killer often killed my mysql process. This happens because I constantly write messages to the database and delete them after half an hour, but indexes and other data are not cleaned up automatically. Because of this, I have to run a script from crontab every morning that optimizes the messages table. It helps, but it's not an ideal solution at all and I don't like it.
In general, MySQL (with the InnoDB engine) is not intended for this kind of use (constant inserts and deletes). I looked for databases suited to such workloads, but all of them keep data in memory (like Redis), which makes the server expensive. So for now I'm leaning towards PostgreSQL (with PgBouncer) as the best solution (but, of course, there may be other problems).
Understood. Unfortunately, once you start going to disk, you take a large performance hit, especially on mechanical disks.
Yes, the index rebuild is something to keep in mind. That's why the insert statements are batched and multiple rows are inserted in one query. You could try to experiment by having more rows per batch.
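To illustrate what batching means here, the following is a rough sketch of building one multi-row INSERT for a batch of mails. It is not the actual go-guerrilla backend code: the `mailRow` type, the column names, and the `buildBatchInsert` helper are hypothetical, and the table name is interpolated directly, so it must come from trusted configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// mailRow is a hypothetical, simplified row; a real table has more columns.
type mailRow struct {
	Recipient string
	Subject   string
	Body      string
}

// buildBatchInsert returns a single multi-row INSERT statement and its
// flattened arguments, so the whole batch costs one round trip and the
// indexes are updated once per statement instead of once per row.
func buildBatchInsert(table string, rows []mailRow) (string, []interface{}) {
	if len(rows) == 0 {
		return "", nil
	}
	placeholders := make([]string, 0, len(rows))
	args := make([]interface{}, 0, len(rows)*3)
	for _, r := range rows {
		placeholders = append(placeholders, "(?, ?, ?)")
		args = append(args, r.Recipient, r.Subject, r.Body)
	}
	query := fmt.Sprintf(
		"INSERT INTO %s (recipient, subject, body) VALUES %s",
		table, strings.Join(placeholders, ", "),
	)
	return query, args
}

func main() {
	rows := []mailRow{
		{"a@example.com", "hello", "first"},
		{"b@example.com", "news", "second"},
	}
	query, args := buildBatchInsert("new_mail", rows)
	fmt.Println(query)
	fmt.Println("args:", len(args))
}
```

The resulting query and argument slice can be passed to `db.Exec(query, args...)` from `database/sql`. Experimenting with more rows per batch then only means collecting a longer `rows` slice before each call.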
Does anybody know an easy way to detect emails in Quoted-Printable encoding and decode them to strings?
I deployed guerrilla to production and I get all emails in quoted-printable. Previously I used postfix with parsing in PHP.
PS: Or is the best way not to decode all emails in guerrilla and to leave this job to the API (decoding only requested emails)?
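One possible approach, sketched with the Go standard library only: read the `Content-Transfer-Encoding` header and wrap the body in a `mime/quotedprintable` reader when it says `quoted-printable`. The `decodeBody` helper is a hypothetical name, and the sketch assumes a single-part message; multipart mail needs the encoding checked per part, which the enmime and go-message libraries mentioned earlier in the thread handle.

```go
package main

import (
	"fmt"
	"io"
	"mime/quotedprintable"
	"net/mail"
	"strings"
)

// decodeBody parses a raw single-part message and returns its body,
// decoding it only when the Content-Transfer-Encoding header is
// quoted-printable. Other encodings are returned unchanged.
func decodeBody(raw string) (string, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return "", err
	}
	var r io.Reader = msg.Body
	enc := strings.ToLower(strings.TrimSpace(
		msg.Header.Get("Content-Transfer-Encoding")))
	if enc == "quoted-printable" {
		r = quotedprintable.NewReader(msg.Body)
	}
	b, err := io.ReadAll(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	raw := "Content-Type: text/plain; charset=utf-8\r\n" +
		"Content-Transfer-Encoding: quoted-printable\r\n" +
		"\r\n" +
		"Caf=C3=A9 costs 5=\r\n euros"
	body, err := decodeBody(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(body)
}
```

Calling this from the API for requested emails only, rather than in the backend for every message, matches the approach discussed above. The result is raw bytes in the charset named by `Content-Type`, so non-UTF-8 mail would still need a charset conversion step.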