derekparker / trie

Data structure and relevant algorithms for extremely fast prefix/fuzzy string searching.
MIT License
748 stars 114 forks source link

High RAM usage #28

Open awnumar opened 5 years ago

awnumar commented 5 years ago

This package is pulled in as a dependency of one of my dependencies, and it's causing unusually high RAM usage. Related issue: https://github.com/prologic/bitcask/issues/67

This looks like a memory leak to me since it's unlikely the garbage collector ran in the short time my program was running, and during that time it managed to slurp up 13 GB of memory.

jsign commented 5 years ago

The Node struct isn't very lean, and considering multiple Node are created for a single .Add depending on the key length, this result may be reasonable.

type Node struct {
    val       rune
    path      string
    term      bool
    depth     int
    meta      interface{}
    mask      uint64
    parent    *Node
    children  map[rune]*Node
    termCount int
}

Doing a quick analysis in the NewChild method allocs (from a bench of Add):

  788.57MB   788.57MB (flat, cum) 97.83% of Total
         .          .    154:           path:     path,
         .          .    155:           mask:     bitmask,
         .          .    156:           term:     term,
         .          .    157:           meta:     meta,
         .          .    158:           parent:   parent,
  170.01MB   170.01MB    159:           children: make(map[rune]*Node),
  333.03MB   333.03MB    160:           depth:    parent.depth + 1,
         .          .    161:   }
  285.53MB   285.53MB    162:   parent.children[node.val] = node
         .          .    163:   parent.mask |= bitmask
         .          .    164:   return node
         .          .    165:}

(Raw numbers doesn't matter, but ratios)

Maybe modeling the children as a simple slice would be more memory efficient, even if getting elements to involve iterating (mem cache might even transform it into a faster solution).