chaos / genders

Genders is a static cluster configuration database used for cluster configuration management. It is used by a variety of tools and scripts for management of large clusters.
GNU General Public License v2.0

Querying for UTF-8 attribute values #48

Open noebi opened 3 years ago


Hi, we recently discovered that nodeattr does not allow queries involving UTF-8 encoded attribute values. We have quite a lot of such values but apparently never needed to use them in queries until now. Also, I haven't seen any documentation that says attributes must be ASCII.

The problem seems to be the lex tokenizer, which matches only ASCII characters. While it's not entirely straightforward to teach flex about UTF-8, a relatively simple patch seems to do most of the work:

--- genders-1.28.1/src/libgenders/genders_query_parse.l 2021-02-28 11:13:08.580111309 +0100
+++ genders-1.28.1.utf8/src/libgenders/genders_query_parse.l    2021-02-28 11:13:45.383330719 +0100
@@ -41,8 +41,19 @@

 %}

+ASC     [a-zA-Z0-9]
+ASCC    [a-zA-Z0-9_\.\=:%\\\/\+]
+
+U       [\x80-\xbf]
+U2      [\xc2-\xdf]
+U3      [\xe0-\xef]
+U4      [\xf0-\xf4]
+
+UT     {ASC}|{U2}{U}|{U3}{U}{U}|{U4}{U}{U}{U}
+UTC    {ASCC}|{U2}{U}|{U3}{U}{U}|{U4}{U}{U}{U}
+
 %%
-[a-zA-Z0-9][a-zA-Z0-9_\.\=:%\\\/\+]*([\-\|&]?[a-zA-Z0-9_\.\=:%\\\/\+]+)* yylval.attr = strdup(yytext); return ATTRTOK;
+{UT}{UTC}*([\-\|&]?{UTC}+)* yylval.attr = strdup(yytext); return ATTRTOK;
 \(                                                                       return LPARENTOK;
 \)                                                                       return RPARENTOK;
 \|\|                                                                     return UNIONTOK;
diff -r -u genders-1.28.1/src/libgenders/Makefile.am genders-1.28.1.utf8/src/libgenders/Makefile.am
--- genders-1.28.1/src/libgenders/Makefile.am   2020-05-15 21:52:08.000000000 +0200
+++ genders-1.28.1.utf8/src/libgenders/Makefile.am      2021-02-28 11:18:00.873911772 +0100
@@ -31,7 +31,7 @@

 # achu: -o option in lex/flex is not portable, use -t and write to stdout
 genders_query_parse.c: genders_query.c $(srcdir)/genders_query_parse.l
-       $(LEX) -t $(srcdir)/genders_query_parse.l > $(srcdir)/genders_query_parse.c
+       $(LEX) -8 -t $(srcdir)/genders_query_parse.l > $(srcdir)/genders_query_parse.c

 # achu: -o option in yacc/bison is not portable, use -b instead
 genders_query.c: $(srcdir)/genders_query.y
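
To get a rough feel for which byte sequences the patched ATTRTOK rule would accept, the flex definitions above can be mirrored as a Python bytes regex. This is only a sketch and not part of genders: the names `ATTR_TOKEN`, `matches_bytes` and `is_attr_token` are made up for illustration, and `fullmatch` checks that a whole string is a single token, whereas flex itself picks the longest matching prefix.

```python
import re

# Byte classes copied from the definitions in the patch.
ASC = rb"[a-zA-Z0-9]"
ASCC = rb"[a-zA-Z0-9_.=:%\\/+]"
U = rb"[\x80-\xbf]"    # continuation byte
U2 = rb"[\xc2-\xdf]"   # lead byte of a 2-byte sequence
U3 = rb"[\xe0-\xef]"   # lead byte of a 3-byte sequence
U4 = rb"[\xf0-\xf4]"   # lead byte of a 4-byte sequence

# Multi-byte alternatives shared by UT and UTC.
MB = b"|".join([U2 + U, U3 + U + U, U4 + U + U + U])
UT = b"(?:" + ASC + b"|" + MB + b")"
UTC = b"(?:" + ASCC + b"|" + MB + b")"

# Equivalent of: {UT}{UTC}*([\-\|&]?{UTC}+)*
ATTR_TOKEN = re.compile(UT + UTC + b"*(?:[-|&]?" + UTC + b"+)*")


def matches_bytes(data: bytes) -> bool:
    """Return True if the raw bytes form exactly one attribute token."""
    return ATTR_TOKEN.fullmatch(data) is not None


def is_attr_token(text: str) -> bool:
    """Encode text as UTF-8 and test it against the token pattern."""
    return matches_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    for sample in ["site=zürich", "名前=値", "a-b", "-foo", "a||b"]:
        print(repr(sample), is_attr_token(sample))
```

Under these assumptions, values such as `site=zürich` match as a single token, while a leading `-` or an embedded `||` still does not, as with the original ASCII-only rule. The byte classes are somewhat looser than strict UTF-8: for example, the overlong sequence `\xe0\x80\x80` is accepted, which may or may not matter for this use case.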

Any chance of seeing something like this in an upcoming release?