gfredericks / quinedb

QuineDB is a quine that is also a key-value store.
Eclipse Public License 1.0
580 stars 23 forks source link

Use `printf %q` for simplicity and binary safety #9

Open Artoria2e5 opened 8 years ago

Artoria2e5 commented 8 years ago

Minor performance improvement from 11.331s to 10.937s.

Now it's able to hold arbitrary null-terminated bytes. (You may want to write some tests)

for ((i=0; i<512; i++)); do j+=($(head -c 16 /dev/urandom)); done
declare -p j > testdata
e=({a,A}{b,B}{c,C}{d,D}{e,E}{f,F}{g,G}{H,h}{i,I})
declare -p e >> testdata

for impl in ./quinedb_*; do
  TIMEFORMAT="$impl:"$'\nreal\t%R\nuser\t%U\nsys\t%S\n'
  time for ((i=0; i<32; i++)); do
    bash "$impl" set "${e[i]}" "${j[i]}" > qdb.out || break
    mv qdb.out "$impl"
  done
done

testdata.txt

./quinedb_+=:
real    11.331
user    0.578
sys     10.609
DB IS LOSSY

./quinedb_pf:
real    10.937
user    0.328
sys     10.703

An example of byte-safety:

/tmp$ bash ./quinedb_pf get abcdefghi >/dev/null
$'\3257\327\354I\206\374\364\306\\\253\216,\234\005\370'
/tmp$ bash ./quinedb_+= get abcdefghi >/dev/null
$'7I\\'

Warning on newlines and grey-area undocumented behavior:

printf %q only gives reusable strings, and there is no guarantee that the string will not contain newlines. The current (4.4) implementation does not produce newlines though:

/* bash-4.4/builtins/printf.def:575 */
if (p && *p == 0)        /* XXX - getstr never returns null */
  xp = savestring ("''");
else if (ansic_shouldquote (p))
  xp = ansic_quote (p, 0, (int *)0); // isprint('\n') == 0, so it always hits this branch
else
  xp = sh_backslash_quote (p, 0, 1); // even this branch has NL covered in lib/sh/shquote.c:bstab
Artoria2e5 commented 8 years ago

Hmm, should kill b597667...

gfredericks commented 8 years ago

Would it be easy to post-process the results to replace newlines with \n?

Artoria2e5 commented 8 years ago

Would it be easy to post-process the results to replace newlines with \n?

Doing so correctly requires more understanding of the escapes by re-parsing it... Somehow worse than sticking to the current assumption that newlines always get escaped.