AnantLabs / google-enterprise-connector-sharepoint

Automatically exported from code.google.com/p/google-enterprise-connector-sharepoint

Upgrade HttpClient library and Axis to newer versions #139

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This is in relation to Connector Manager Issue 112.

The SharePoint Connector uses HttpClient library v3. The AutoCloseInputStream implementation in that library results in an IOException in certain cases when pulling the content of a document.

Recommendation:
 Move to HttpClient library v4.x

Email snippet:
====================
I just heard back from the Apache Commons maintainer on my bug:

Oleg Kalnichevski resolved HTTPCLIENT-910.
------------------------------------------

   Resolution: Won't Fix

Brett,

What you are saying makes good sense. However, AutoCloseInputStream 
is no longer used in the HttpClient 4.x code line and I see no point
in fixing HttpClient 3.x, which is pretty much end of life. 

Please consider upgrading.

Oleg

So it looks like the problem goes away if the Sharepoint Connector
moves to httpclient 4.x.  I don't know if it is a drop-in replacement,
but I like that better than changing either the Connector or the
Connector Manager source code to fix the problem.

--
Brett M. Johnson

Original issue reported on code.google.com by rakeshs101981@gmail.com on 29 Jan 2010 at 4:19

GoogleCodeExporter commented 9 years ago
Also consider upgrading to Axis 2.

Currently the HttpClient library v3 has been modified for the NTLM v2 and Kerberos authentication schemes.

Original comment by rakeshs101981@gmail.com on 29 Jan 2010 at 4:22

GoogleCodeExporter commented 9 years ago

Original comment by rakeshs101981@gmail.com on 29 Jan 2010 at 4:22

GoogleCodeExporter commented 9 years ago
Correction: the related issue is Connector Manager Issue 212, not Issue 112.

Original comment by rakeshs101981@gmail.com on 29 Jan 2010 at 4:24

GoogleCodeExporter commented 9 years ago
As described here: http://hc.apache.org/httpclient-3.x/performance.html

"HttpClient is capable of efficient request/response body streaming. Large entities may be submitted or received without being buffered in memory. This is especially critical if multiple HTTP methods may be executed concurrently. While there are convenience methods to deal with entities such as strings or byte arrays, their use is discouraged. Unless used carefully they can easily lead to out of memory conditions, since they imply buffering of the complete entity in memory."

Hence the issue seems to be related more to the environment, where the connection is closed prematurely for reasons known to the HttpClient library. The problem does not occur for content crawled from all SharePoint servers, but only for one SharePoint server, which seems to be serving connections slowly.
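
For reference, the streaming behaviour described in the quoted passage can be sketched with plain java.io as below. This is only an illustration, not the connector's code: the class name StreamCopySketch and the buffer-size parameter are made up for the example, and it uses no HttpClient API. The point is that only one fixed-size buffer is held in memory regardless of document size, but the source stream has to stay open for the whole copy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of the connector or of HttpClient.
final class StreamCopySketch {

    private StreamCopySketch() {}

    /**
     * Copies everything from in to out through one fixed-size buffer,
     * so memory use does not grow with the size of the entity.
     * Returns the number of bytes copied.
     */
    static long copy(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 at end of stream; a stream closed mid-copy
        // would typically surface here as an IOException instead.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

A caller would pass the document's content stream and the feed's output stream, and close both when the copy finishes or fails.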

Original comment by rakeshs101981@gmail.com on 11 Feb 2010 at 8:36

GoogleCodeExporter commented 9 years ago
Works fine with another SharePoint installation with more than 160k docs.

One observation is that this exception occurs whenever the "java.net.SocketException: No buffer space available (maximum connections reached?): JVM_Bind" occurs, as reported in Issue 59.

So the exception might be triggered by the following sequence:

1. The SharePoint server is slow in serving content.
2. The connector soon exhausts the maximum number of sockets available for establishing connections, resulting in the JVM_Bind exception.
3. This might trigger the HttpClient library to close existing connections.
4. A read on any such closed connection will throw an IOException.
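
Step 4 of this hypothesis can be illustrated with a small stand-in stream. This is a sketch only: ClosableContentStream is a hypothetical class written for this comment, not the library's AutoCloseInputStream, and the exception message is made up. It only shows the general pattern of a wrapper that refuses reads once something else has closed it.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a content stream whose underlying
// connection can be closed by someone other than the reader.
final class ClosableContentStream extends FilterInputStream {

    private boolean closed;

    ClosableContentStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        ensureOpen();
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        ensureOpen();
        return super.read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        // In the scenario above this would be called by the connection
        // management code, not by the consumer of the content.
        closed = true;
        super.close();
    }

    private void ensureOpen() throws IOException {
        if (closed) {
            throw new IOException("read attempted on a closed stream");
        }
    }
}
```

If the consumer is still reading when close() is called elsewhere, its next read fails with an IOException, which would match the symptom if the hypothesis holds.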

Original comment by rakeshs101981@gmail.com on 11 Feb 2010 at 8:58

GoogleCodeExporter commented 9 years ago
I am trying to understand how the Connector is running out of sockets.

Are you maintaining open InputStreams for all items in the returned DocumentList (2000 open InputStreams)?

Are you opening a new InputStream for a Document's content upon a call to nextDocument (or when fetching the content Property for that Document)?

Does it seem that the Document's content InputStream.close() method doesn't ever get called (except by AutoClose)?

Are many Traversal batches timing out, getting cancelled, then leaking an open connection to the SharePoint server?

Are you running a large number (> 10) of concurrent Connector instances in the same Connector Manager?

Is it possible that it is the SharePoint server that has run out of sockets (rather than the Connector client)?

Original comment by Brett.Mi...@gmail.com on 11 Feb 2010 at 7:09

GoogleCodeExporter commented 9 years ago
The connector just hands over an InputStream to the CM. It is opened only when the CM calls findProperty("google:content") or findProperty("google:mimetype").
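
A rough sketch of that open-on-demand behaviour, assuming nothing about the real connector classes: LazyContentDocument, findContent and openCount are hypothetical names used only for illustration, and the Callable stands in for whatever actually issues the HTTP request.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;

// Hypothetical document wrapper: holds a recipe for opening the
// content stream, but no open stream or socket of its own.
final class LazyContentDocument {

    private final Callable<InputStream> opener;
    private int openCount;

    LazyContentDocument(Callable<InputStream> opener) {
        this.opener = opener;
    }

    /** Opens a fresh content stream; nothing is opened before this call. */
    InputStream findContent() throws IOException {
        try {
            InputStream in = opener.call();
            openCount++;
            return in;
        } catch (IOException e) {
            throw e;
        } catch (Exception e) {
            // Callable may throw any checked exception; wrap it.
            throw new IOException(e);
        }
    }

    /** Number of streams opened so far through this document. */
    int openCount() {
        return openCount;
    }
}
```

With this shape, a batch of 2000 documents holds no connections until content is requested one document at a time, and the caller remains responsible for closing each returned stream.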

The exception occurs because AutoCloseInputStream.close() was called. This can happen when the HTTPConnection itself is closed.

No batches are timing out or getting cancelled.

The one consistent pattern that I have been able to ascertain from the logs of every run is that JVM_Bind is usually present. The error is reproduced with only one connector instance. The same connector works fine if configured for some other SharePoint server.

The problem does not occur for all documents, only for a few, and after some time documents are fed successfully.

The above hypothesis is based on all these observations.

Original comment by rakeshs101981@gmail.com on 12 Feb 2010 at 1:43

GoogleCodeExporter commented 9 years ago

Original comment by rakeshs101981@gmail.com on 20 May 2010 at 6:20

GoogleCodeExporter commented 9 years ago

Original comment by rakeshs101981@gmail.com on 6 Oct 2010 at 7:50

GoogleCodeExporter commented 9 years ago
This issue is filed as Google issue #6513766

Original comment by tdnguyen@google.com on 18 May 2012 at 12:12