Full-text search: IllegalArgumentException: Field text longer than maximum length %d [1048576]

Dan: It should be easy for us to detect when the text we're about to feed into 
the index exceeds the size limit.  What should we do in such cases?  Truncate 
the text?  Try to remove duplicate words?  Split it across multiple fields 
(assuming the limit is per-field, not per-document)?  The current behavior is 
to keep retrying until the quota is used up, which is not great.  (At least 
that is what happened when the re-indexing MapReduce ran into such a wave on 
my test instance.)

Disallowing waves with that much text is also an option but has the 
disadvantage that we won't be able to import waves that exceed our limit.
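
A minimal sketch of the detect, truncate and split options discussed above, in 
plain Java with no App Engine dependencies.  The class and method names are 
hypothetical and not taken from the walkaround code base.  It assumes the limit 
of 1048576 from the exception message is counted in Java chars (UTF-16 code 
units) and applies per field; both assumptions would need to be checked against 
the search API before relying on this.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; names are illustrative only. */
final class FieldTextLimiter {
  // Taken from the exception message. ASSUMPTION: counted in Java chars.
  static final int MAX_FIELD_LENGTH = 1048576;

  private FieldTextLimiter() {}

  /** Returns true if the text would be rejected under the assumed limit. */
  static boolean exceedsLimit(String text) {
    return text.length() > MAX_FIELD_LENGTH;
  }

  /** Cuts the text to at most maxLength chars without splitting a surrogate pair. */
  static String truncate(String text, int maxLength) {
    checkMaxLength(maxLength);
    if (text.length() <= maxLength) {
      return text;
    }
    int end = maxLength;
    if (Character.isHighSurrogate(text.charAt(end - 1))) {
      end--; // do not leave a dangling high surrogate at the end
    }
    return text.substring(0, end);
  }

  /** Splits the text into consecutive chunks of at most maxLength chars each. */
  static List<String> split(String text, int maxLength) {
    checkMaxLength(maxLength);
    List<String> chunks = new ArrayList<>();
    int start = 0;
    while (start < text.length()) {
      int end = Math.min(start + maxLength, text.length());
      if (end < text.length() && Character.isHighSurrogate(text.charAt(end - 1))) {
        end--; // keep the surrogate pair together in the next chunk
      }
      chunks.add(text.substring(start, end));
      start = end;
    }
    return chunks;
  }

  private static void checkMaxLength(int maxLength) {
    // At least 2 so that a surrogate pair always fits and split() makes progress.
    if (maxLength < 2) {
      throw new IllegalArgumentException("maxLength must be at least 2");
    }
  }
}
```

Each chunk returned by split() could then go into its own field (for example 
text0, text1, ...; these field names are made up), provided the limit really is 
per-field.  Note that this sketch may cut in the middle of a word, so a term 
straddling a chunk boundary might not be found by search; splitting at 
whitespace would avoid that.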

What steps will reproduce the problem?
1. Create or import a wave with lots and lots of text

What is the expected output?  What do you see instead?
Expected: the wave is indexed without errors.  Instead, indexing fails with: 
IllegalArgumentException: Field text longer than maximum length %d [1048576]

What browser and browser version are you using?  On what operating system?
Chrome on Mac

What URL does your browser show when the problem occurs?  Did you compile
walkaround on your machine, or are you using a public instance?
full-text-search branch

Original issue reported on code.google.com by oh...@google.com on 25 Jan 2012 at 2:06
