Open baiwfg2 opened 2 years ago
浪费点内存空间可能不是大问题,但却严重影响cache hit
Finally, by not early-exiting from the loop, we can hint to the compiler that it need not use a full branch, but can just use a conditional cmov predicated instruction
// Find the child in `node` that matches `c` by examining all child nodes, in parallel. Node* find_child(char c, Node16* node) { // key_vec is 16 repeated copies of the searched-for byte, one for every possible position // in child_keys that needs to be searched. __mm128i key_vec = _mm_set1_epi8(c);
// Compare all child_keys to 'c' in parallel. Don't worry if some of the keys aren't valid,
// we'll mask the results to only consider the valid ones below.
__mm128i results = _mm_cmpeq_epi8(key_vec, node->child_keys);
// Build a mask to select only the first node->num_children values from the comparison
// (because the other values are meaningless)
int mask = (1 << node->num_children) - 1;
// Change the results of the comparison into a bitfield, masking off any invalid comparisons.
int bitfield = _mm_movemask_epi8(results) & mask;
// No match if there are no '1's in the bitfield.
if (bitfield == 0) return NULL;
// Find the index of the first '1' in the bitfield by counting the leading zeros.
int idx = ctz(bitfield);
return node->child_pointers[idx];
}
以上写法看着挺NB
> when occupancy is at least 49 children the wasted space is less significant (although not 0 by any stretch of the imagination).
当Node256 节点只有49个孩子时,有些浪费,但不是那么重要(我想它很快就会有更多孩子)
> That is, once you get to the end of a key that’s in the trie, you can’t go any further - but what about keys that are prefixes of some other key? What about http://google.com and http://google.com/chrome?
这是啥意思
> The paper claims these are “well-known”, and therefore not novel, but they reduce the impact of having either strings with a unique suffix (lazy expansion) or a common prefix (path compression)
lazy expansion 不就是 radix tree 的特点吗
> Of course, when the key set is dense, most nodes will be Node256 instances, and so ART isn’t too different from an ordinary trie.
当数据key 的skewness 很大时,ART比起HT ,优势更加明显
[5] 提出了异议
Our experiments strongly indicate that neither ART nor Judy are competitive in terms of performance to well-engineered hash tables, and in the case of ART, sometimes, not even in terms of space.
可见论文里的evaluation ,真的只能是"看看",别太当真。也许作者就是抓住了一个局部的、可供检验其理论的结果,就夸大了结果的一般性。
[1] https://github.com/armon/libart , C implementation [2] https://www.the-paper-trail.org/post/art-paper-notes/ [3] https://lwn.net/Articles/175432/ , linux radix tree use case [4] The ART of Practical Synchronization , 2016 [5] A Comparison of Adaptive Radix Trees and Hash Tables [6] TLB misses — the Missing Issue of Adaptive Radix Tree , 2015 [7] https://ivanzz1001.github.io/records/post/data-structure/2018/11/18/ds-radix-tree 很形象地描述了 radix tree [8] https://readpaper.com/pdf-annotate/note?noteId=668066120997765120&pdfId=4506447654598369282 我的个人笔记