Misc Questions

moseleymark commented 7 years ago

Lots of questions (but I didn't want to explode your "Issues" count in GitHub)

Does the usual advice to use 'local' for all locally-scoped variables apply here as usual? Or is there any complication with 'local'? There aren't any 'local' instances in the wforce.conf or wforce.conf.example, so I wondered if there was a reason.
Is there a way to dump a database? Either in lua or via the CLI
Assuming there's a way to dump the db in lua or the cli (and if in lua, presumably I'd write a custom function to dump the database), will that lock anything? I.e. will traversing the db cause twGet/twAdd to block until the traversal is done?
If I'm collecting a number of stats, is it better to use fewer databases (with correspondingly larger # of entries), or more databases (more sharded)? Just wondering about local contention.
Is the RBL lookup a blocking function? That is, should I use them sparingly or be liberal with them? My RBLs will be served local to the box via unbound, so relatively quick.
Is there any way to do non-blocking calls to redis, a la https://github.com/openresty/lua-resty-redis ? I'd love to be able to track whitelists there, but not if calling out to redis is a blocking operation and ties down an entire thread.
I want to be able to set some masks on lt.remote addresses (but not set the whole database to use that mask, though I'm blanking on what that directive is). Is there a function to apply a netmask to a copy of lt.remote, or do I just need to do the usual conversion to an integer, bit-shift, and convert back? The context is that I want to be able to add a stat to the DB for both the full lt.remote IP and the /24 version of lt.remote.

Sorry for so many questions. Thanks!

neilcook commented 7 years ago

On 8 Sep 2017, at 19:03, moseleymark notifications@github.com wrote:


Does the usual advice to use 'local' for all locally-scoped variables apply here as usual? Or is there any complication with 'local'? There aren't any 'local' instances in the wforce.conf or wforce.conf.example, so I wondered if there was a reason.

You don’t have to, but it increases performance. I actually made a change to wforce.conf this week that made all variables local, and I’ll do the same with wforce.conf.example at some point.

Is there a way to dump a database? Either in lua or via the CLI

Nope, not currently. I question the reason for doing so. One thing I’m considering is adding an event for expiry of time-window data; however, expiry is currently done with a “just-in-time” method, i.e. windows only expire when you look up data for that key, which makes expiry somewhat unpredictable.
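To make the “just-in-time” behaviour concrete, here is a minimal Python sketch of a sliding-window counter where stale events are only discarded when their key is read. It is not wforce's C++ implementation: `LazyWindowStore` and its methods are hypothetical, and it stores one timestamp per event for simplicity rather than using any particular bucketing scheme.

```python
from collections import defaultdict, deque


class LazyWindowStore:
    """Hypothetical sliding-window counter with just-in-time expiry.

    Events older than `window_secs` are only discarded when their key is
    looked up; nothing expires in the background.
    """

    def __init__(self, window_secs):
        self.window_secs = window_secs
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def add(self, key, now):
        # Record one event (assumes `now` never goes backwards).
        # No expiry work happens on the write path.
        self._events[key].append(now)

    def get(self, key, now):
        # Expire stale events for this key only, then count what remains.
        events = self._events.get(key)
        if events is None:
            return 0
        cutoff = now - self.window_secs
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events)

    def stored_events(self, key):
        # Raw count, including events that have not been expired yet.
        return len(self._events.get(key, ()))
```

Because expiry only runs inside `get`, a key nobody looks up keeps its stale events indefinitely, which is why an "expiry event" would fire at unpredictable times under this model.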

Assuming there's a way to dump the db in lua or the cli (and if in lua, presumably I'd write a custom function to dump the database), will that lock anything? I.e. will traversing the db cause twGet/twAdd to block until the traversal is done?

Currently the in-memory DB is locked with a single mutex. I’m considering moving to read/write locks. So if DB traversal were allowed (and I could potentially implement a DB traversal function), it would currently block other functions, but moving to R/W locks would not have that problem.
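As a generic illustration of the mutex vs. read/write lock distinction (not wforce's C++ code), the Python sketch below builds a reader-preferring readers-writer lock on `threading.Condition` and runs a hypothetical "top N" traversal under a read lock. `ReadWriteLock`, `top_n`, and the dict-shaped DB are all assumptions made for the example.

```python
import threading
from collections import Counter


class ReadWriteLock:
    """Minimal reader-preferring readers-writer lock built on a Condition.

    Any number of readers may hold the lock at once; a writer waits until
    there are no active readers and then excludes everyone else.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any waiting writer

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def top_n(db, lock, n):
    """Traverse a hypothetical key -> count dict under a read lock."""
    lock.acquire_read()
    try:
        return Counter(db).most_common(n)
    finally:
        lock.release_read()
```

In this sketch, lookups taking read locks can run alongside a long traversal, while writers still wait for active readers to finish. A reader-preferring lock can also starve writers under constant read load, which a production design would need to address.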

If I'm collecting a number of stats, is it better to use fewer databases (with correspondingly larger # of entries), or more databases (more sharded)? Just wondering about local contention.

The DB is not sharded. Every server has a full copy of the DB. It’s always better to have fewer DBs. The only reason to have multiple DBs is if you need different time windows.

Is the RBL lookup a blocking function? That is, should I use them sparingly or be liberal with them? My RBLs will be served local to the box via unbound, so relatively quick.

All functions in Lua which call back into C++ are blocking. Having said that, DNS lookups are very quick, as you say, and in practice, particularly for locally served RBLs, they don’t really slow it down at all. Any reason why you’re using unbound rather than rbldnsd for RBLs?
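For context on what each RBL check costs, a conventional DNSBL lookup is one synchronous DNS query for the address with its octets reversed, prepended to the list's zone. The Python sketch below is generic and not wforce's implementation; the zone name `rbl.example.org` is a placeholder, and `is_listed` treats any resolution failure (including timeouts) as "not listed".

```python
import ipaddress
import socket


def rbl_query_name(ip, zone):
    """Build the DNSBL query name for an IPv4 address (octets reversed)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Blocking check: the calling thread waits for the resolver."""
    try:
        socket.gethostbyname(rbl_query_name(ip, zone))
        return True  # an A record means the address is listed
    except socket.gaierror:
        return False  # NXDOMAIN (or any other failure) treated as not listed
```

With a local resolver, each call is typically a cache hit or a single round-trip on the loopback interface, which is consistent with the observation that locally served RBLs add little latency.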

Is there any way to do non-blocking calls to redis, a la https://github.com/openresty/lua-resty-redis ? I'd love to be able to track whitelists there, but not if calling out to redis is a blocking operation and ties down an entire thread.

There’s no equivalent of the openresty coroutine-based code. Given the number of calls back into C++, I doubt it would be possible to move to such a model. However, if your calls are a bit slow, you can just add more threads and Lua states. I already use Redis for the persistent blacklists, and I haven’t seen much of a performance problem. My recommendation for using Redis for whitelists would be to use the standard lua-redis module: load the whitelist on startup, and provide a reload function that is called from the console.
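The load-on-startup, reload-on-demand pattern can be sketched as follows. This is a language-neutral illustration in Python, not wforce or lua-redis code: `Whitelist` is hypothetical, and `loader` stands in for whatever actually fetches the entries (for example, reading a Redis set), which is not shown.

```python
import threading


class Whitelist:
    """Hypothetical in-memory whitelist, loaded once and reloadable.

    `loader` is any callable returning an iterable of entries. Per-request
    checks only touch local memory, so the backing store is never queried
    on the hot path.
    """

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._entries = frozenset(loader())  # load at startup

    def contains(self, item):
        # Reads the current snapshot; no network call, no lock.
        return item in self._entries

    def reload(self):
        # Serialize concurrent reloads, build the new set completely,
        # then swap the reference so readers never see a partial set.
        with self._lock:
            fresh = frozenset(self._loader())
            self._entries = fresh
        return len(fresh)
```

A console command would then just call `reload()`; the slow fetch happens once per reload instead of once per login attempt.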

I want to be able to set some masks on lt.remote addresses (but not set the whole database to use that mask, though I'm blanking on what that directive is).

You mean the twSetv4Prefix and twSetv6Prefix functions?

Is there a function to apply a netmask to a copy of lt.remote, or do I just need to do the usual conversion to an integer, bit-shift, and convert back? The context is that I want to be able to add a stat to the DB for both the full lt.remote IP and the /24 version of lt.remote.

There is a Netmask class which is exposed to Lua, but currently I don’t expose the “tostring” method of that class. If I did, you could use that to set individual prefixes. Feel free to add an enhancement issue.

You could also do it by having two DBs, and only set the /24 prefix on one of the DBs.
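The masking itself (clearing host bits to get the /24) is straightforward with a standard IP library. The Python sketch below only illustrates the computation; it does not use wforce's Netmask class, and `prefix_key`, `keys_for`, and the /64 default for IPv6 are assumptions made for the example.

```python
import ipaddress


def prefix_key(ip, v4_prefix=24, v6_prefix=64):
    """Return the network containing `ip` as a string, e.g. '192.0.2.0/24'."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False accepts a host address and clears its host bits.
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def keys_for(ip):
    """Hypothetical: the two keys one event would be recorded under."""
    return [str(ipaddress.ip_address(ip)), prefix_key(ip)]
```

For example, `keys_for("192.0.2.77")` yields the full address and `192.0.2.0/24`, matching the "full IP plus /24" stats described in the question.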



moseleymark commented 7 years ago

Nope, not currently. I question the reason for doing so. One thing I’m considering is adding an event for expiry of time-window data; however, expiry is currently done with a “just-in-time” method, i.e. windows only expire when you look up data for that key, which makes expiry somewhat unpredictable.

Partially for debugging purposes ("are things actually getting updated in there?") but also for reporting. It'd be nice (but not a must-have, especially not if it comes at the cost of blocking all clients while doing so) to be able to dump things like "who are the top XX failed IPs" or "who are the top XX failed usernames", etc.

The DB is not sharded. Every server has a full copy of the DB. It’s always better to have fewer DBs. The only reason to have multiple DBs is if you need different time windows.

Cool, good to know. I had been thinking along the lines of a 'failure' DB and a 'success' DB (to be able to track successful logins from a suspicious number of different IPs), but in light of this, I'll just use different key prefixes to indicate that.
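A rough Python sketch of the single-store, prefixed-key idea: the outcome becomes part of the key, so "success" and "failure" stats share one store instead of two. `PrefixedStore` is hypothetical and deliberately ignores time windows; in wforce the equivalent would live in one time-window DB.

```python
from collections import defaultdict


class PrefixedStore:
    """Hypothetical single store where key prefixes stand in for separate DBs."""

    def __init__(self):
        self._distinct = defaultdict(set)  # "outcome:user" -> set of IPs

    def record(self, outcome, user, ip):
        # outcome is e.g. "success" or "failure"; it becomes the key prefix.
        self._distinct[f"{outcome}:{user}"].add(ip)

    def distinct_ips(self, outcome, user):
        # Number of different IPs seen for this user with this outcome.
        return len(self._distinct.get(f"{outcome}:{user}", ()))
```

A user whose `distinct_ips("success", user)` count climbs unusually high is the "successful logins from many different IPs" case described above.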

All functions in Lua which call back into C++ are blocking. Having said that, DNS lookups are very quick, as you say, and in practice, particularly for locally served RBLs, they don’t really slow it down at all. Any reason why you’re using unbound rather than rbldnsd for RBLs?

I am using rbldnsd, actually. I've got unbound fronting rbldnsd (to be able to forward queries elsewhere and get caching), though I might not end up needing it and just use rbldnsd directly.

There’s no equivalent of the openresty coroutine-based code. Given the number of calls back into C++, I doubt it would be possible to move to such a model. However, if your calls are a bit slow, you can just add more threads and Lua states. I already use Redis for the persistent blacklists, and I haven’t seen much of a performance problem. My recommendation for using Redis for whitelists would be to use the standard lua-redis module: load the whitelist on startup, and provide a reload function that is called from the console.

Good to know. A periodic reload of a whitelist dump might be the best path. If DNS lookups are that cheap, I might try to integrate it into rbldnsd too (i.e. rsync out my own custom zones as whitelists).

You mean the twSetv4Prefix and twSetv6Prefix functions?

Yup, that was it.

You could also do it by having two DBs, and only set the /24 prefix on one of the DBs.

Sounds like the best route.

Thanks again for answering all these questions. weakforced really fills a missing niche. It's something I've always wanted to build myself, but every time I came to the realization that I'd never be able to make it remotely performant enough.

PowerDNS / weakforced

Misc Questions #167