benningm / Mail-SpamAssassin-Plugin-RedisAWL

redis support for spamassassin AWL/TxRep
1 stars 1 forks source link

Using keys is a huge performance problem #4

Open tomsommer opened 6 years ago

tomsommer commented 6 years ago

This line is a huge performance problem when having millions of keys: https://github.com/benningm/Mail-SpamAssassin-Plugin-RedisAWL/blob/0e42afaffc729a73955276b3aebaa40ae36e2029/lib/Mail/SpamAssassin/RedisAddrList.pm#L142

Can it not be avoided or made optional?

tomsommer commented 6 years ago

Fixed in #1, but it is still a performance problem when $mailaddr is not empty

benningm commented 6 years ago

Did you do some analysis of your case and can share some more details?

tomsommer commented 6 years ago

db0:keys=9060555,expires=1887615,avg_ttl=0

Any KEYS * commands result in 100% CPU spike with this amount of data

benningm commented 6 years ago

Is the cpu spike in the spamassassin process or in the redis process?

The code tries to implement what the DB backend does:

http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/DBBasedAddrList.pm?view=markup#l162

Seems like there is also a mistake in the pattern. It should contain a '|':

my @keys = $self->{'redis'}->keys($self->{'prefix'}.$mailaddr.'|*'); 

Does it happen when scanning mail? I wondering where the remove_address is called.

benningm commented 6 years ago

The SQL List also implements this behaviour of deleting all entries of an address when the ip is 'none':

http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/SQLBasedAddrList.pm?view=markup#l387

Leaves us with the question if the redis queries could be optimized.

tomsommer commented 6 years ago

It's the Redis server that goes 100%.

Maybe avoid the wildcard the fire multiple del() instead, hoping for the best?

I also wonder when remove_entry() is called, it happens during scan, but most be some cleanup function?

benningm commented 6 years ago

How many keys are typically returned by KEYS?

Maybe it could be replaced with a SCAN and an interation.

benningm commented 5 years ago

BTW: I just commited 7086453 which fixes the wildcard pattern and released 1.003.

The remove_entry is used by the AutoWhitelist when a record with an ip is updated. Then it'll look for an entry with 'ip=none' and tries to migrate the entry to a new entry with an ip. It removes the entries by removing both with the call to remove_entry() with 'ip=none' and then add_score() with the new migrated score.

When my understanding is correct:

I think it could be fixed in AutoWhitelist by changing it to in case of migration: