PILLUTLAAVINASH / google-enterprise-connector-manager

Automatically exported from code.google.com/p/google-enterprise-connector-manager

Large documents lead to an infinite loop in the traversal. #32

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Add some large documents to a repository (10-30 MB).
2. Traverse the repository using the GSA.

What is the expected output? What do you see instead?

The documents are not indexed. During the retrieval, Base64-encoding, and
URL-encoding of the content, the worker thread times out and is replaced.
Log entries indicate that WorkQueue.replaceHangingThread is called. The
traversal then resumes from the previous checkpoint, so the same documents
are retried and time out again, leading to an infinite loop.
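
The loop described above can be illustrated with a small simulation. This is only a sketch, not the connector manager's code: the class and method names are hypothetical, the per-document times are made up, and it assumes that the timeout applies to the whole batch and that the checkpoint advances only when a batch completes (the report itself is unsure whether the document or the batch times out).

```java
public class CheckpointLoopSketch {

    /**
     * Simulates repeated traversal attempts under a per-batch timeout.
     * Returns the number of attempts needed to finish the batch, or -1 if
     * the batch still has not finished after maxAttempts (i.e. it is stuck).
     * Hypothetical model: all names and timings are assumptions.
     */
    static int attemptsToFinish(long[] docMillis, long timeoutMillis, int maxAttempts) {
        final int checkpoint = 0; // first document not yet committed; never advances on timeout
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long elapsed = 0;
            int next = checkpoint;
            // Process documents until the batch is done or the timeout would be exceeded.
            while (next < docMillis.length && elapsed + docMillis[next] <= timeoutMillis) {
                elapsed += docMillis[next];
                next++;
            }
            if (next == docMillis.length) {
                return attempt; // batch finished in time, so the checkpoint could advance
            }
            // Timed out: the worker is replaced and the next attempt restarts
            // from the same checkpoint, repeating exactly the same work.
        }
        return -1;
    }

    public static void main(String[] args) {
        // Small documents finish on the first attempt.
        System.out.println(attemptsToFinish(new long[] {1_000, 2_000}, 60_000, 5)); // prints 1
        // One document slower than the timeout never lets the batch finish.
        System.out.println(attemptsToFinish(new long[] {1_000, 70_000}, 60_000, 5)); // prints -1
    }
}
```

Under these assumptions, a single document whose processing time exceeds the timeout blocks every later attempt, because nothing changes between retries.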

Please use labels and text to provide additional information.

I think that the timeout is 60 seconds. On my machine, a 30 MB document
takes about 10 seconds to download using the otex.FileContentHandler. Using
otex.HttpURLContentHandler takes only 4 seconds, but we then run into issue
4, where Base64FilterInputStream assumes a fully-available underlying
stream. Most of the time is spent encoding the content (issue 27), and I
don't know whether it's the individual document or the batch that is timing
out.
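
For reference, here is a minimal sketch of Base64-encoding a stream without assuming that the underlying stream is fully available. It is not the connector manager's Base64FilterInputStream; it uses the JDK's java.util.Base64 (Java 8 and later), and the class name and the trickle helper are hypothetical. The point is the read loop: InputStream.read may return fewer bytes than requested, so the code writes only the bytes actually read and stops only at end of stream (-1).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class StreamingBase64Sketch {

    /** Base64-encodes everything readable from in, tolerating short reads. */
    static String encode(InputStream in) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the wrapping stream flushes any final padding characters.
        try (OutputStream b64 = Base64.getEncoder().wrap(sink)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                b64.write(buf, 0, n); // write only the n bytes actually read
            }
        }
        return sink.toString("US-ASCII");
    }

    /** Hypothetical test stream that hands out at most one byte per read call. */
    static InputStream trickle(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 1));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encode(trickle(data))); // prints aGVsbG8=
    }
}
```

Because the encoder is fed whatever each read returns, the result is the same whether the source delivers the content in one chunk or one byte at a time.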

Original issue reported on code.google.com by jl1615@gmail.com on 15 Feb 2007 at 1:14

GoogleCodeExporter commented 8 years ago
Google Bug #244812

Original comment by vjo...@gmail.com on 20 Feb 2007 at 5:27

GoogleCodeExporter commented 8 years ago
Fixed in r491.

Original comment by mgron...@gmail.com on 3 Oct 2007 at 10:24

GoogleCodeExporter commented 8 years ago

Original comment by mgron...@gmail.com on 3 Oct 2007 at 11:02