iruzanov opened 5 years ago
Hi! Have you tried using the Python module? You could write the filtering in Python, where regular expressions are simple to use, and you could also replace answers with other ones from the Python module. That could be an easy way to get what you want. Best regards, Wouter
Yes, I have. I tried the Python module; this is the script loaded at Unbound start:
from ctypes import *

cregexp = cdll.LoadLibrary("/srvs/i.ruzanov/install/unbound-1.9.2/pythonmod/fastregexp.so")

class pcre_extra(Structure):
    _fields_ = [ ("flags", c_long),
                 ("data", c_void_p),
                 ("callout", c_void_p),
                 ("tables", c_char_p),
                 ("match_limit_recursion", c_ulong) ]
pcre_extra_p = POINTER(pcre_extra)
pcre_p = c_void_p

class my_regex(Structure):
    _fields_ = [ ("my_reCompiled", pcre_p),
                 ("my_pcreExtra", pcre_extra_p),
                 ("my_jit_stack", POINTER(c_void_p)),
                 ("next", POINTER(c_void_p)) ]
my_regex_p = POINTER(my_regex)

# ctypes only reads "argtypes"; a misspelled "argstype" is silently ignored
pcre_compile = cregexp.compile_fast_regexp
pcre_compile.restype = my_regex_p
pcre_compile.argtypes = []
re = pcre_compile()

pcre_study = cregexp.study_fast_regexp
pcre_study.restype = my_regex_p
pcre_study.argtypes = [my_regex_p]
s = pcre_study(re)

pcre_exec = cregexp.do_fast_regexp
pcre_exec.restype = c_int
pcre_exec.argtypes = [my_regex_p, c_char_p]

def init(id, cfg):
    log_info("my-pythonmod: init called, module id is %d port: %d script: %s" % (id, cfg.port, cfg.python_script))
    return True

def deinit(id):
    log_info("my-pythonmod: deinit called, module id is %d" % id)
    return True

def inform_super(id, qstate, superqstate, qdata):
    return True

def operate(id, event, qstate, qdata):
    log_info("my-pythonmod: operate called, id: %d, event:%s" % (id, strmodulevent(event)))

    if qstate.return_msg:
        if pcre_exec(s, qstate.qinfo.qname_str) == 2:
            invalidateQueryInCache(qstate, qstate.return_msg.qinfo)
            log_info("my-pythonmod: ok, i've done: %s is filtered" % qstate.qinfo.qname_str)
            qstate.return_rcode = RCODE_NXDOMAIN
            qstate.ext_state[id] = MODULE_ERROR
            return False
        #log_info("my-pythonmod: done with %s" % qstate.qinfo.qname_str)

    if event == MODULE_EVENT_NEW:
        qstate.ext_state[id] = MODULE_WAIT_MODULE
        return True

    if event == MODULE_EVENT_MODDONE:
        log_info("my-pythonmod: previous module done")
        qstate.ext_state[id] = MODULE_FINISHED
        return True

    if event == MODULE_EVENT_PASS:
        log_info("my-pythonmod: event_pass")
        qstate.ext_state[id] = MODULE_WAIT_MODULE
        return True

    return True

log_info("my-pythonmod: script loaded.")
I compiled the fastregexp.so library with the same calls as in fastregexp.c and loaded the library via Python ctypes. But the performance was only 15000 rps, which is far too low for me.
Is that performance limited by log_info() or by Python itself? If you comment out the log_info() calls in the operate() function, it might be a lot faster; logging can be too slow at, e.g., 100k qps. Also, if you make the responses in the operate callback, Unbound caches them, and after that they are served at normal Unbound response speed. But nice to hear you tried it and that the Python module worked for this!
Yes, Wouter, I commented out all of the log_info() calls in the operate function, even those within the "if qstate.return_msg:" block, but it did not improve the performance. I even commented out the condition check "if pcre_exec() == 1:" so that invalidateQueryInCache() is called in every case, and it gave me the same 15000 responses per second. PS: in my tests, every test query matches some pattern from the regexp set.
Anyway, yes, the Python module works fine. At the very least I'm planning to use it for my other purposes: getting parameters from an SQL frontend and loading them into the redis cachedb ;) I would also like to modify the unbound.conf configuration using the Python module.
And two more things about compiling Unbound with libpcre:
if(cfg->verbosity < 0)
fatal_exit("verbosity value < 0");
if(cfg->num_threads <= 0 || cfg->num_threads > 10000)
Hello, Wouter!
I remember my patch related to the calc_hash() function, but now I am focused on regexps in the best and fastest resolver in the world ;) So, here is what I need:
This feature might be achievable with Python (using the Python module in Unbound), but the performance in that case is too poor (15000 replies per second), and the bottleneck is the invalidateQueryInCache() call from the Python script.
And here is what I have done so far:
#include <...>

struct my_regex {
    pcre *my_reCompiled;
    pcre_extra *my_pcreExtra;
    pcre_jit_stack *my_jit_stack;
    struct my_regex *next;
};

void cleanup_fast_regexp(struct my_regex *my_regex);
int do_fast_regexp(struct my_regex *my_regex, char *testString);
struct my_regex *study_fast_regexp(struct my_regex *my_regex);
struct my_regex *compile_fast_regexp(struct my_regex *my_regex, char *aRegexStrV[], int num_aRegexStrV);
#include "config.h"
#include "util/log.h"
#include "fastregexp/fastregexp.h"
#include <...>
#include <...>
#include <...>
#include <...>

void cleanup_fast_regexp(struct my_regex *my_regex)
{
    struct my_regex *my_regex_next;
}

int do_fast_regexp(struct my_regex *my_regex, char *testString)
{
    int subStrVec[30];

    while(my_regex != NULL) {
        int pcreExecRet = pcre_jit_exec(my_regex->my_reCompiled, my_regex->my_pcreExtra,
                                        testString, strlen(testString), 0, 0,
                                        subStrVec, 30, my_regex->my_jit_stack);
    } /* end of while */

    return 0;
}

struct my_regex *study_fast_regexp(struct my_regex *my_regex)
{
    pcre_extra *pcreExtra;
    const char *pcreErrorStr;
    struct my_regex *my_regex_start = my_regex;
    pcre_jit_stack *jit_stack;

    while(my_regex != NULL) {
        pcreExtra = pcre_study(my_regex->my_reCompiled, PCRE_STUDY_JIT_COMPILE, &pcreErrorStr);

        /* pcre_study() returns NULL for both errors and when it can not optimize
           the regex. The last argument is how one checks for errors (it is NULL
           if everything works, and points to an error string otherwise). */
        if(pcreErrorStr != NULL) {
            log_err("fastregexp: JIT optimization error: %s. Cleaning up all regex structures", pcreErrorStr);
            cleanup_fast_regexp(my_regex_start);
            return NULL;
        }
    } /* end of while */

    return my_regex_start;
}

struct my_regex *compile_fast_regexp(struct my_regex *my_regex, char *aRegexStrV[], int num_aRegexStrV)
{
    pcre *reCompiled;
    const char *pcreErrorStr;
    int pcreErrorOffset;
    char **aStrRegex;

    struct my_regex *my_regex_prev = NULL;
    struct my_regex *my_regex_start = NULL;

    for(int i=0; i<num_aRegexStrV; i++) {
        //log_err("the regex is: %s", aRegexStrV[i]);
        if((my_regex = (struct my_regex*) malloc(sizeof(struct my_regex))) == NULL) {
            log_err("fastregexp: general memory allocation error");
            return NULL;
        }
    } /* end of for */

    return my_regex_start;

    //pcre_free_substring(psubStrMatchStr); pcre_free(reCompiled);

    // Free up the EXTRA PCRE value (may be NULL at this point)
    // if(pcreExtra != NULL) {
    //#ifdef PCRE_CONFIG_JIT
    //   pcre_free_study(pcreExtra);
    //#else
    //   pcre_free(pcreExtra);
    //#endif
    // }
}
Next, I patched the following source files of yours:

--- unbound-1.9.2.orig/util/module.h 2019-06-17 11:50:16.000000000 +0300
+++ unbound-1.9.2/util/module.h 2019-09-16 11:54:20.302813000 +0300
@@ -156,6 +156,8 @@
#include "util/storage/lruhash.h"
#include "util/data/msgreply.h"
#include "util/data/msgparse.h"
+//igorr
+#include "fastregexp/fastregexp.h"
struct sldns_buffer;
struct alloc_cache;
struct rrset_cache;
@@ -512,6 +514,10 @@
+
struct my_regex *my_fast_regexp;
};
/**
--- unbound-1.9.2.orig/daemon/worker.c 2019-06-17 11:50:16.000000000 +0300
+++ unbound-1.9.2/daemon/worker.c 2019-09-17 13:00:20.176700000 +0300
@@ -1892,6 +1892,11 @@
worker->env.cfg->stat_interval);
worker_restart_timer(worker);
}
+
}
@@ -1933,6 +1938,8 @@
alloc_clear(&worker->alloc);
regional_destroy(worker->env.scratch);
regional_destroy(worker->scratchpad);
--- unbound-1.9.2.orig/iterator/iterator.c 2019-06-17 11:50:16.000000000 +0300
+++ unbound-1.9.2/iterator/iterator.c 2019-09-16 12:34:32.062665000 +0300
@@ -160,6 +160,7 @@
outbound_list_init(&iq->outlist);
iq->minimise_count = 0;
iq->minimise_timeout_count = 0;
+
if (qstate->env->cfg->qname_minimisation)
iq->minimisation_state = INIT_MINIMISE_STATE;
else
@@ -2576,6 +2577,23 @@
enum response_type type;
iq->num_current_queries--;
--- unbound-1.9.2.orig/util/config_file.h 2019-06-17 11:50:16.000000000 +0300
+++ unbound-1.9.2/util/config_file.h 2019-09-16 13:07:10.312655000 +0300
@@ -575,6 +575,10 @@
int redis_timeout;
#endif
#endif
char **regexstrv;
};
/** from cfg username, after daemonize setup performed */
--- unbound-1.9.2.orig/util/config_file.c 2019-06-17 11:50:16.000000000 +0300
+++ unbound-1.9.2/util/config_file.c 2019-09-16 17:28:00.678244000 +0300
@@ -327,6 +327,9 @@
cfg->cachedb_backend = NULL;
cfg->cachedb_secret = NULL;
#endif
#endif
#endif
--- unbound-1.9.2.orig/util/configparser.y 2019-06-17 11:50:16.000000000 +0300
+++ unbound-1.9.2/util/configparser.y 2019-09-16 17:27:35.678485000 +0300
@@ -158,6 +158,7 @@
%token VAR_IPSECMOD_MAX_TTL VAR_IPSECMOD_WHITELIST VAR_IPSECMOD_STRICT
%token VAR_CACHEDB VAR_CACHEDB_BACKEND VAR_CACHEDB_SECRETSEED
%token VAR_CACHEDB_REDISHOST VAR_CACHEDB_REDISPORT VAR_CACHEDB_REDISTIMEOUT
+%token VAR_REGEXP VAR_REGEXP_PATTERN
%token VAR_UDP_UPSTREAM_WITHOUT_DOWNSTREAM VAR_FOR_UPSTREAM
%token VAR_AUTH_ZONE VAR_ZONEFILE VAR_MASTER VAR_URL VAR_FOR_DOWNSTREAM
%token VAR_FALLBACK_ENABLED VAR_TLS_ADDITIONAL_PORT VAR_LOW_RTT VAR_LOW_RTT_PERMIL
@@ -174,7 +175,7 @@
forwardstart contents_forward | pythonstart contents_py |
rcstart contents_rc | dtstart contents_dt | viewstart contents_view |
dnscstart contents_dnsc | cachedbstart contents_cachedb |
authstart contents_auth | regexpstart contents_regexp
;
/* server: declaration */
@@ -2959,6 +2960,28 @@
}
}
;
+regexpstart: VAR_REGEXP
;
%%
/* parse helper routines could be here */
--- unbound-1.9.2.orig/util/configlexer.lex 2019-06-17 11:50:16.000000000 +0300
+++ unbound-1.9.2/util/configlexer.lex 2019-09-16 15:04:30.764354000 +0300
@@ -483,6 +483,8 @@
redis-server-host{COLON} { YDVAR(1, VAR_CACHEDB_REDISHOST) }
redis-server-port{COLON} { YDVAR(1, VAR_CACHEDB_REDISPORT) }
redis-timeout{COLON} { YDVAR(1, VAR_CACHEDB_REDISTIMEOUT) }
+regexp{COLON} { YDVAR(0, VAR_REGEXP) }
+pattern{COLON} { YDVAR(1, VAR_REGEXP_PATTERN) }
udp-upstream-without-downstream{COLON} { YDVAR(1, VAR_UDP_UPSTREAM_WITHOUT_DOWNSTREAM) }
tcp-connection-limit{COLON} { YDVAR(2, VAR_TCP_CONNECTION_LIMIT) }
<INITIAL,val>{NEWLINE} { LEXOUT(("NL\n")); cfg_parser->line++; }
--- unbound-1.9.2.orig/Makefile 2019-09-17 13:38:35.414726000 +0300
+++ unbound-1.9.2/Makefile 2019-09-16 12:31:51.334154000 +0300
@@ -59,14 +59,14 @@
PYTHON_CPPFLAGS=-I. -I/usr/local/include/python2.7
CFLAGS=-DSRCDIR=$(srcdir) -g -O2 -D_THREAD_SAFE -pthread
LDFLAGS=-L/usr/local/lib -L/usr/local/lib -L/usr/local/lib
-LIBS=-lutil -levent -L/usr/local/lib -L/usr/local/lib/python2.7 -L. -lpython2.7 -lcrypto -lhiredis
+LIBS=-lutil -levent -L/usr/local/lib -L/usr/local/lib/python2.7 -L. -lpython2.7 -lcrypto -lhiredis -lpcre
LIBOBJS= ${LIBOBJDIR}explicit_bzero$U.o ${LIBOBJDIR}reallocarray$U.o
# filter out ctime_r from compat obj.
LIBOBJ_WITHOUT_CTIME= explicit_bzero.o reallocarray.o
LIBOBJ_WITHOUT_CTIMEARC4= explicit_bzero.o
RUNTIME_PATH= -R/usr/local/lib
DEPFLAG=-MM
-DATE=20190917
+DATE=20190912
LIBTOOL=$(libtool)
BUILD=build/
UBSYMS=-export-symbols $(srcdir)/libunbound/ubsyms.def
@@ -126,7 +126,8 @@
edns-subnet/edns-subnet.c edns-subnet/subnetmod.c \
edns-subnet/addrtree.c edns-subnet/subnet-whitelist.c \
cachedb/cachedb.c cachedb/redis.c respip/respip.c $(CHECKLOCK_SRC) \
-$(DNSTAP_SRC) $(DNSCRYPT_SRC) $(IPSECMOD_SRC)
+$(DNSTAP_SRC) $(DNSCRYPT_SRC) $(IPSECMOD_SRC) \
+fastregexp/fastregexp.c
COMMON_OBJ_WITHOUT_NETCALL=dns.lo infra.lo rrset.lo dname.lo msgencode.lo \
as112.lo msgparse.lo msgreply.lo packed_rrset.lo iterator.lo iter_delegpt.lo \
iter_donotq.lo iter_fwd.lo iter_hints.lo iter_priv.lo iter_resptype.lo \
@@ -139,7 +140,7 @@
validator.lo val_kcache.lo val_kentry.lo val_neg.lo val_nsec3.lo val_nsec.lo \
val_secalgo.lo val_sigcrypt.lo val_utils.lo dns64.lo cachedb.lo redis.lo authzone.lo \
$(SUBNET_OBJ) $(PYTHONMOD_OBJ) $(CHECKLOCK_OBJ) $(DNSTAP_OBJ) $(DNSCRYPT_OBJ) \
-$(IPSECMOD_OBJ) respip.lo
+$(IPSECMOD_OBJ) respip.lo fastregexp.lo
COMMON_OBJ_WITHOUT_UB_EVENT=$(COMMON_OBJ_WITHOUT_NETCALL) netevent.lo listen_dnsport.lo \
outside_network.lo
COMMON_OBJ=$(COMMON_OBJ_WITHOUT_UB_EVENT) ub_event.lo
@@ -692,7 +693,7 @@
$(srcdir)/services/modstack.h $(srcdir)/util/net_help.h $(srcdir)/util/regional.h $(srcdir)/util/data/dname.h \
$(srcdir)/util/data/msgencode.h $(srcdir)/util/fptr_wlist.h $(srcdir)/util/tube.h $(srcdir)/util/config_file.h \
$(srcdir)/util/random.h $(srcdir)/sldns/wire2str.h $(srcdir)/sldns/str2wire.h $(srcdir)/sldns/parseutil.h \
That's all, if I didn't forget anything. About the Makefile: I know that the right way is to patch Makefile.in, but for now I'm interested in the final result in terms of stability and performance. As for the yacc/lex sources, I tried to add my two options (regexp: and pattern:) following the existing declarations of config options, and that was too hard for me ;)
Here is what I have now:
But I have several issues:
What I would like now is your authoritative opinion on whether everything I did is right, or whether (and this is most likely) I am wrong somewhere in my code. Could you please review my patches and tell me what else I need to do?
Big thank you in advance!