Closed GoogleCodeExporter closed 9 years ago
thanks!
It would be really, really (really) helpful if we could collect, here, a good
variety
of use cases for this trie.
Original comment by kevin...@gmail.com
on 1 Oct 2007 at 7:51
We use it internally for a few cases.
1) An IP list. We have a KeyAnalyzer that analyzes ip addresses to store them
(and/or ranges of them) as compactly as possible. See:
https://www.limewire.org/fisheye/browse/~raw,r=1.18/limecvs/core/com/limegroup/g
nutel
la/filters/IPList.java .
2) Storing a Kademlia DHT's internal structure. This is probably a very specific
use-case, although it works very well. :)
Other use cases:
3) A very efficient (both in memory & CPU) dictionary. It could be used for
something like a cellphone's phonebook. You start typing and it immediately
finds
all strings that began with your string. At the same time the structure acts
like a
Map preventing you from adding the same String twice. It's kind of like a
super-
SortedMap.
I'm sure there's a ton of other cases, but those are the ones we use (and have
plans
to use). The really awesome things about Patricia is the way it does lookups.
Consider the IPList -- IPv4 addresses have a fixed size of 32 bits. So no
matter
how many items you have stored in the Trie, it will always do at most 32 bit
comparisons. And it will often do less (because it intelligently traverses the
tree), and it will keep things sorted and tell you all addresses that begin
with
X.Y, etc...
Original comment by sberlin
on 1 Oct 2007 at 8:10
Great stuff, folks. Thanks so much.
With the sheer mountain of work involved in getting our existing stuff
polished, I'm
afraid it may be some time before I can look at breaking into new territory.
But, I
do think that this trie and tries in general are a very promising addition, so
don't
lose hope!
Original comment by kevin...@gmail.com
on 16 Oct 2007 at 3:37
Disclaimer: I don't know much about Tries.
I can imagine that many use cases for a Trie break into at least two categories
1. The SortedMap/NavigableMap the right API abstraction that the caller wants,
but
due to the nature of the data, a Trie can be much faster than existing
implementations of these interfaces.
2. What the caller wants to see doesn't look like a Map at all, but something
much
simpler that deals only with prefix-matching like (rough guess):
public interface Trie<N, V> {
boolean containsMatch(List<? extends N> sequence);
V get(List<? extends N> sequence); // doesn't require exact match
void put(List<? extends N> sequencePrefix, V value);
}
A separate issue: should we consider providing an alternate interface that is
tailored to character-based tries, so this alternate API could use CharSequence
in
place of List<Character>? Then there would be two simple methods
Tries.asCharacterTrie() and Tries.forCharacterTrie() to go back and forth
between the
two.
Original comment by kevin...@gmail.com
on 23 Oct 2007 at 4:29
From my experience, #1 is exactly right. A Trie easily doubles as a
super-efficient
SortedMap/NavigableMap.
You hit the nail on #2 with the CharSequence version. Trie's I've encountered
function with a similar interface, but tailored towards the combined sequence
instead of the pieces that build up the sequence. That is, the methods would
look
like:
public interface Trie<K, V> {
boolean hasPrefix(K prefix);
SortedMap<K, V> getPrefixedBy(K prefix); // returns a view over all entries
prefixed by K
void put(K key, V value);
}
The difference is the Trie is directly acting off a K value, which would be the
CharSequence to a Character, or a NumberSequence to a Number. This has the
advantage of being able to reuse the interface for arbitrary-length keys or
fixed-
length keys whose contents don't divide easily into objects. (For example, IP
addresses being composed of bits, CharSequences, phone numbers, etc...)
The Trie interface in the submission includes a few additional convenience
methods
and directly subclasses SortedMap (if it were targetted for 1.6, it would
extend
NavigableMap too). The additions are for better use with fixed-size keys and
being
able to visit the entries in the map while traversing (with a 'Cursor').
Original comment by sberlin
on 23 Oct 2007 at 5:49
(Really, that getPrefixedBy could just return a List<V> - having it return a vw
of
the map itself enables some really cool things.)
Original comment by sberlin
on 23 Oct 2007 at 5:50
Original comment by kevin...@gmail.com
on 3 Nov 2007 at 5:32
Original comment by kevin...@gmail.com
on 2 Jun 2008 at 5:48
Hello,
currently I'm wirting something for my studies and did a bit of research about
Tries.
I fall upon a Hash Table performance like Trie called HAT-Trie.
It's a proposal for a fast memory aware Trie that can deal with non equal
memory costs.
http://crpit.com/confpapers/CRPITV62Askitis.pdf
In my opinion it maybe would be a great advantage to use this Trie.
Hope I could help,
Tim
Original comment by tim.frey...@googlemail.com
on 22 Mar 2009 at 10:31
To completely generalize it, a Trie is kind of a particular way of representing
a
Set< List< E > >
or, easily enough, a
Map< List< E >, V >
where E is typically a character (List<E> is a String). The representation
allows
certain specific operations to be very efficient, and is efficient for storage
when
there are many common prefixes among the List<E> (a spelling dictionary, for
example).
There may be some benefit to completely abstracting it in this way, although
translating the common use case of a String key would be an excessive amount of
overhead.
Original comment by ray.a.co...@gmail.com
on 3 Aug 2009 at 5:53
As a use case, I've used it for finding phrases within a large text document. I
have
some set of phrases that I'm looking for (not Strings, but sequences of words,
hence
- List<String>) stored as a Trie. To search, I walk the normalized text (split
on
whitespace, punctuation removed, handle case insensitivity, etc.) which is also
a
List<String>, keeping track of possible matches as I go.
Generalizing, this is finding all possible subsequences of List<E> (the text)
that
are members of a given Set<List<E>> (the search phrases).
The big benefit is that the text, which can be very large, only needs to be
traversed
once to find any one (or all) of thousands of possible phrases. It scales very
well.
Original comment by ray.a.co...@gmail.com
on 3 Aug 2009 at 6:05
I have the same use case as Ray.a.conner, and I'd also use it for scalable
tab|auto-
completion in interactive consoles|text fields.
Original comment by cresw...@gmail.com
on 3 Aug 2009 at 6:31
Are you folks using the attached PatriciaTrie implementation, or a separate one?
Original comment by sberlin
on 3 Aug 2009 at 6:34
The Google code base includes multiple Java trie implementations. If we chose
to add
a trie to the library, we'd probably start with one of those.
Original comment by jared.l....@gmail.com
on 3 Aug 2009 at 6:44
I rolled my own implementation, some years ago. I wasn't storing simple
strings, and
pretty much everything out there was character-centric. Once I made the
abstraction
to sequences of strings, sequences of objects was a no-brainer. Although I never
needed to implement many of the Trie operations that others find useful; I only
needed sub-sequence matching.
Original comment by ray.a.co...@gmail.com
on 3 Aug 2009 at 8:13
Original comment by kevin...@gmail.com
on 17 Sep 2009 at 6:02
Original comment by master.j...@gmail.com
on 24 Nov 2009 at 8:06
Attachments:
Moved to http://code.google.com/p/guava-libraries/issues/detail?id=10
Original comment by kevinb@google.com
on 5 Jan 2010 at 11:10
i want linked list implementation of trie datastructure can you suggest
Original comment by chaith...@gmail.com
on 19 Mar 2010 at 1:48
Hi, tim.frey.online. Do you have an implementation of HAT-trie?
Original comment by tiberiut...@gmail.com
on 20 Jun 2010 at 7:49
Original issue reported on code.google.com by
sberlin
on 24 Sep 2007 at 6:12Attachments: