Open ScottPeterJohnson opened 2 years ago
Comment by Scott on July 17, 2021: It looks like there isn't at the moment, which does somewhat limit the usefulness of SpamAssassin's Bayes classifier. At one point all mail added by a user to the inbox was learned as ham, but that tended to bloat the spam DB and cause issues with large imports.
At some point I plan to replace SpamAssassin's Bayes classifier with custom learned neural nets, but until then I can add a page for viewing SA status and manually adding mails.
(Sorry for the late response, on holiday of sorts)
Comment by stephan on July 17, 2021: (Sorry for closing the issue. I didn't mean to. I opened the issue to leave a comment a few hours ago, then left. When I returned to that tab, a full quote reply was submitted, which apparently also closed the issue. I was able to delete the full quote reply.)
> [...] but until then I can add a page for viewing SA status and manually adding mails.
I think that would be great.
Just a thought: I'm not sure how many users have (and use) an "Archive" folder. I would assume most do, as it's an option in the Roundcube webmail. Maybe, to keep it simple, one could just use those mails (or a sample thereof) as "good" to train SpamAssassin.
In any case, thank you very much and please enjoy your holiday, Scott :-)
Comment by rnkn on July 26, 2021:
As an aside, @Scott, I know a while back you had an instance failure due to SpamAssassin eating up memory; I happened to have read about someone switching to rspamd recently and so thought I might share: https://dataswamp.org/~solene/2021-07-13-smtpd-rspamd.html
Comment by seth on February 10, 2022: @Scott - I have 4 years' worth of known spam (about 20k messages) that you can have if you want to train a classifier. Just let me know.
(This issue was imported from Gitea) stephan on July 15, 2021: Hi, I’m new to Purelymail - and overall very happy with the service. Thanks, Scott :-)
I have a question/feature suggestion regarding the training of the spam filter.
Every day, a few spam mails get through to my Inbox, which I then mark as spam (move to the "Junk" folder). That should train SpamAssassin for "spam".
Per the spam filtering FAQ, for the per-user database to kick in, I need to train SpamAssassin with 200 non-spam messages as well. Since no non-spam messages ever get to the Junk folder (no false positives, good) I don’t have any “good” messages to move out of the "Junk" folder for training SpamAssassin.
So, my question is this: Is there another way to "show" "good" messages to SpamAssassin (e.g. all the messages in my "Archive" folder) for training?
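For anyone running their own SpamAssassin, the idea of feeding a sample of an "Archive" folder as ham could be sketched roughly as below. This is only an illustration under assumptions: it presumes a Maildir-style folder on disk and the stock `sa-learn` tool on the PATH, neither of which is known to match how Purelymail stores or trains mail. The function names (`sample_ham`, `sa_learn_ham_command`, `train`) are made up for this example.

```python
import random
import subprocess
from pathlib import Path


def sample_ham(archive_dir, k, seed=None):
    """Pick up to k message files from a Maildir-style folder (cur/ and new/)."""
    root = Path(archive_dir)
    files = []
    for sub in ("cur", "new"):
        folder = root / sub
        if folder.is_dir():
            files.extend(p for p in folder.iterdir() if p.is_file())
    files.sort()
    if len(files) <= k:
        return files
    # Sampling keeps the Bayes DB small instead of learning the whole archive.
    return random.Random(seed).sample(files, k)


def sa_learn_ham_command(paths):
    """Build the sa-learn invocation that learns the given files as ham."""
    return ["sa-learn", "--ham", *[str(p) for p in paths]]


def train(archive_dir, k=200, dry_run=True):
    """Sample k messages and (unless dry_run) hand them to sa-learn."""
    paths = sample_ham(archive_dir, k)
    cmd = sa_learn_ham_command(paths)
    if paths and not dry_run:
        subprocess.run(cmd, check=True)  # assumes sa-learn is installed
    return cmd
```

The default of 200 mirrors the ham threshold mentioned in the FAQ; with `dry_run=True` the function only returns the command it would run, which makes it easy to inspect before touching the Bayes database.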