Open tomsommer opened 6 years ago
Fixed in #1, but it is still a performance problem when $mailaddr is not empty
Did you do some analysis of your case and can share some more details?
db0:keys=9060555,expires=1887615,avg_ttl=0
Any KEYS * commands result in 100% CPU spike with this amount of data
Is the cpu spike in the spamassassin process or in the redis process?
The code tries to implement what the DB backend does:
Seems like there is also a mistake in the pattern. It should contain a '|':
my @keys = $self->{'redis'}->keys($self->{'prefix'}.$mailaddr.'|*');
Does it happen when scanning mail? I wondering where the remove_address is called.
The SQL List also implements this behaviour of deleting all entries of an address when the ip is 'none':
Leaves us with the question if the redis queries could be optimized.
It's the Redis server that goes 100%.
Maybe avoid the wildcard the fire multiple del() instead, hoping for the best?
I also wonder when remove_entry() is called, it happens during scan, but most be some cleanup function?
How many keys are typically returned by KEYS?
Maybe it could be replaced with a SCAN and an interation.
BTW: I just commited 7086453 which fixes the wildcard pattern and released 1.003.
The remove_entry is used by the AutoWhitelist when a record with an ip is updated. Then it'll look for an entry with 'ip=none' and tries to migrate the entry to a new entry with an ip. It removes the entries by removing both with the call to remove_entry() with 'ip=none' and then add_score() with the new migrated score.
When my understanding is correct:
I think it could be fixed in AutoWhitelist by changing it to in case of migration:
This line is a huge performance problem when having millions of keys: https://github.com/benningm/Mail-SpamAssassin-Plugin-RedisAWL/blob/0e42afaffc729a73955276b3aebaa40ae36e2029/lib/Mail/SpamAssassin/RedisAddrList.pm#L142
Can it not be avoided or made optional?