baiwfg2 / awesome-readings

记录看各种文章、论文的心得
2 stars 0 forks source link

adaptive radix tree (ART) , 2013 #42

Open baiwfg2 opened 2 years ago

baiwfg2 commented 2 years ago

[1] https://github.com/armon/libart , C implementation [2] https://www.the-paper-trail.org/post/art-paper-notes/ [3] https://lwn.net/Articles/175432/ , linux radix tree use case [4] The ART of Practical Synchronization , 2016 [5] A Comparison of Adaptive Radix Trees and Hash Tables [6] TLB misses — the Missing Issue of Adaptive Radix Tree , 2015 [7] https://ivanzz1001.github.io/records/post/data-structure/2018/11/18/ds-radix-tree 很形象地描述了 radix tree [8] https://readpaper.com/pdf-annotate/note?noteId=668066120997765120&pdfId=4506447654598369282 我的个人笔记

baiwfg2 commented 2 years ago

[2] 对ART 论文做了解析

浪费点内存空间可能不是大问题,但却严重影响cache hit

Finally, by not early-exiting from the loop, we can hint to the compiler that it need not use a full branch, but can just use a conditional cmov predicated instruction


// Find the child in `node` that matches `c` by examining all child nodes, in parallel.
Node* find_child(char c, Node16* node) {
// key_vec is 16 repeated copies of the searched-for byte, one for every possible position
// in child_keys that needs to be searched.
__mm128i key_vec = _mm_set1_epi8(c);
    // Compare all child_keys to 'c' in parallel. Don't worry if some of the keys aren't valid,
    // we'll mask the results to only consider the valid ones below.
    __mm128i results = _mm_cmpeq_epi8(key_vec, node->child_keys);

    // Build a mask to select only the first node->num_children values from the comparison
    // (because the other values are meaningless)
    int mask = (1 << node->num_children) - 1;

    // Change the results of the comparison into a bitfield, masking off any invalid comparisons.
    int bitfield = _mm_movemask_epi8(results) & mask;

    // No match if there are no '1's in the bitfield.
    if (bitfield == 0) return NULL;

    // Find the index of the first '1' in the bitfield by counting the leading zeros.
    int idx = ctz(bitfield);

    return node->child_pointers[idx];
}

以上写法看着挺NB

> when occupancy is at least 49 children the wasted space is less significant (although not 0 by any stretch of the imagination).

当Node256 节点只有49个孩子时,有些浪费,但不是那么重要(我想它很快就会有更多孩子)

>  That is, once you get to the end of a key that’s in the trie, you can’t go any further - but what about keys that are prefixes of some other key? What about http://google.com and http://google.com/chrome?

这是啥意思

> The paper claims these are “well-known”, and therefore not novel, but they reduce the impact of having either strings with a unique suffix (lazy expansion) or a common prefix (path compression)

lazy expansion 不就是 radix tree 的特点吗

> Of course, when the key set is dense, most nodes will be Node256 instances, and so ART isn’t too different from an ordinary trie.

当数据key 的skewness 很大时,ART比起HT ,优势更加明显
baiwfg2 commented 2 years ago

[5] 提出了异议

  1. 第一次实现动态adaptive radix tree的并非ART,而是Judy Array,而ART 论文并未比较与Judy 的性能
  2. ART 论文比较与hash table性能时,用的是chained hash table,它不适合用于性能比较,较大的指针开销
  3. 近年来,cuckoo hashing 也不断被研究,所以原论文与hashing 之间的比较是不完备的
  4. hash 性能很大程度依赖hash function 的选择,原论文选择 Murmur ,不一定是好的,太强的robustness 反而是 overkill

Our experiments strongly indicate that neither ART nor Judy are competitive in terms of performance to well-engineered hash tables, and in the case of ART, sometimes, not even in terms of space.

可见论文里的evaluation ,真的只能是"看看",别太当真。也许作者就是抓住了一个局部的、可供检验其理论的结果,就夸大了结果的一般性。