AnantLabs / google-enterprise-connector-sharepoint

Automatically exported from code.google.com/p/google-enterprise-connector-sharepoint

Connector fails to detect changes if there is a huge gap in the activity on the SharePoint Server, goes into an infinite loop. #113

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
* The connector uses the GetListItemChangesSinceToken web-service to get
the list of changes. 
* It saves the change token returned to the state file. 
* The token from the state-file is used in the subsequent web-service call
to get changes since that point. 
* The change token itself has a format described at
http://msdn.microsoft.com/en-us/library/bb417456.aspx
* SharePoint maintains a history of all changes in EventCache table; see:
http://msdn.microsoft.com/en-us/library/dd585124(office.11).aspx
* Entries in EventCache table have a limited lifetime.
* If there are no changes for a long time, all entries in EventCache table
get purged.
* As a result the change tokens stored in state-file have no matching
entries in the EventCache table.
* So the web-service call succeeds but returns the error "InvalidToken".
* The connector does not interpret this error; instead it logs a message
indicating that a "null" change token was received, and proceeds to the next
entry in the state file.
* The above cycle of checking for changes with an invalid token for every
entry in the state file goes into an infinite loop, and no changes are
detected.
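
The token layout makes it possible to spot a token that has probably expired before the web-service is even called. The following is a minimal sketch in Python (the connector itself is Java, so this is illustrative only). It assumes the five semicolon-separated fields are version, scope, scope GUID, change time in .NET ticks, and change number, and that the caller knows the change-log retention period. `parse_change_token` and `is_probably_stale` are hypothetical names, not connector APIs.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class ChangeToken(NamedTuple):
    version: int
    scope: int
    scope_id: str
    changed_at: datetime  # naive timestamp converted from .NET ticks
    change_number: int


def parse_change_token(raw: str) -> ChangeToken:
    """Split a serialized token such as '1;3;<guid>;<ticks>;<number>'."""
    parts = raw.strip().split(";")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {raw!r}")
    version, scope, scope_id, ticks, number = parts
    # Assumption: ticks are 100 ns intervals counted from 0001-01-01T00:00:00.
    changed_at = datetime(1, 1, 1) + timedelta(microseconds=int(ticks) // 10)
    return ChangeToken(int(version), int(scope), scope_id, changed_at, int(number))


def is_probably_stale(token: ChangeToken, now: datetime, retention_days: int) -> bool:
    """True when the token is older than the assumed retention window."""
    return now - token.changed_at > timedelta(days=retention_days)


# Token taken from the sample SOAP request in this issue.
token = parse_change_token(
    "1;3;f64ecbeb-354c-4349-ae65-542352177d5b;633900348287300000;37204"
)
print(token.changed_at.date(), token.change_number)
```

A check like this is only a heuristic: the server remains the authority on whether a token is valid, so the "InvalidToken" response still has to be handled.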

Expected Behavior:
* The connector should interpret the "InvalidToken" error, take
corrective steps, and advise the user to increase the EventCache timeout.
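
One possible shape for the corrective step is sketched below in Python (illustrative only, the connector is Java). It assumes the state maps each list to its saved token and that some `fetch` callable wraps the web-service call and reports the change type. `poll_lists`, `fake_fetch`, and the state layout are hypothetical and not taken from the connector code.

```python
import logging
from typing import Callable, Dict, List, Optional, Tuple

log = logging.getLogger("connector.sketch")

# (change_type, new_token); change_type is "InvalidToken" when the token is rejected.
FetchResult = Tuple[Optional[str], Optional[str]]


def poll_lists(
    state: Dict[str, Optional[str]],
    fetch: Callable[[str, Optional[str]], FetchResult],
) -> List[str]:
    """Poll every list once and return the lists that need a full re-crawl."""
    needs_full_crawl = []
    for list_id, token in state.items():
        change_type, new_token = fetch(list_id, token)
        if change_type == "InvalidToken":
            # Drop the dead token instead of retrying it forever.
            log.warning(
                "Change token for %s is no longer valid; scheduling a full "
                "re-crawl. Consider increasing the change-log retention.",
                list_id,
            )
            state[list_id] = None
            needs_full_crawl.append(list_id)
            continue
        if new_token is not None:
            state[list_id] = new_token
    return needs_full_crawl


def fake_fetch(list_id: str, token: Optional[str]) -> FetchResult:
    """Stand-in for the web-service call, used only to exercise the sketch."""
    if token == "stale-token":
        return ("InvalidToken", None)
    return (None, (token or "initial") + "+1")


state = {"list-a": "stale-token", "list-b": "fresh-token"}
to_recrawl = poll_lists(state, fake_fetch)
print(to_recrawl, state)
```

The key design point is that the rejected token is cleared from the state, so the next pass cannot repeat the same failing request.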

Original issue reported on code.google.com by j.dars...@gmail.com on 31 Oct 2009 at 12:53

GoogleCodeExporter commented 9 years ago
Sample SOAP Request and Response which shows the problem:

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
xmlns:soap1="http://schemas.microsoft.com/sharepoint/soap/"> 
   <soap:Header/> 
   <soap:Body> 
      <soap1:GetListItemChangesSinceToken> 
         <soap1:listName>d</soap1:listName> 

         <soap1:changeToken>1;3;f64ecbeb-354c-4349-ae65-542352177d5b;633900348287300000;37204</soap1:changeToken>

      </soap1:GetListItemChangesSinceToken> 
   </soap:Body> 
</soap:Envelope> 

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 
   <soap:Body> 
      <GetListItemChangesSinceTokenResponse
xmlns="http://schemas.microsoft.com/sharepoint/soap/"> 
         <GetListItemChangesSinceTokenResult> 
            <listitems xmlns:rs="urn:schemas-microsoft-com:rowset"> 
               <Changes> 
                  <Id ChangeType="InvalidToken"/> 
               </Changes> 
               <rs:data ItemCount="0"></rs:data> 
            </listitems> 
         </GetListItemChangesSinceTokenResult> 
      </GetListItemChangesSinceTokenResponse> 
   </soap:Body> 
</soap:Envelope> 
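
A minimal sketch of how a client could recognize this response, written in Python with the standard-library XML parser (illustrative only, the connector is Java). The function name `has_invalid_token` is hypothetical, and the element is matched by local name on the assumption that namespace prefixes may differ between servers.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response shown above.
SAMPLE_RESPONSE = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
   <soap:Body>
      <GetListItemChangesSinceTokenResponse
            xmlns="http://schemas.microsoft.com/sharepoint/soap/">
         <GetListItemChangesSinceTokenResult>
            <listitems xmlns:rs="urn:schemas-microsoft-com:rowset">
               <Changes>
                  <Id ChangeType="InvalidToken"/>
               </Changes>
               <rs:data ItemCount="0"></rs:data>
            </listitems>
         </GetListItemChangesSinceTokenResult>
      </GetListItemChangesSinceTokenResponse>
   </soap:Body>
</soap:Envelope>"""


def has_invalid_token(response_xml: str) -> bool:
    """Return True if any <Id> element carries ChangeType="InvalidToken"."""
    root = ET.fromstring(response_xml)
    for elem in root.iter():
        # ElementTree tags look like '{namespace}Id'; compare the local name only.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "Id" and elem.get("ChangeType") == "InvalidToken":
            return True
    return False


print(has_invalid_token(SAMPLE_RESPONSE))
```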

Original comment by j.dars...@gmail.com on 31 Oct 2009 at 12:54

GoogleCodeExporter commented 9 years ago

Original comment by j.dars...@gmail.com on 31 Oct 2009 at 12:57

GoogleCodeExporter commented 9 years ago

Original comment by mwarti...@gmail.com on 6 Nov 2009 at 12:20

GoogleCodeExporter commented 9 years ago
After a site collection has reached a state in which its SharePoint change log 
is empty, the Connector will never pick up any changes made to the site 
collection except during a full traversal.

This needs to be fixed; at a minimum, the Connector should automatically start 
a full traversal for that Site Collection.

Another way to handle this would be to perform a time-based query using:
"getChangesSince" <datetime> 
where:
datetime => (CurrentTimestamp - ChangeLogRetentionPeriod)
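
The cutoff arithmetic can be sketched as follows, in Python for illustration only. The retention period of 15 days is an assumed example value, `fallback_since` is a hypothetical helper, and how the resulting timestamp would be passed to the proposed "getChangesSince" query is not covered here.

```python
from datetime import datetime, timedelta, timezone


def fallback_since(now: datetime, retention_days: int) -> datetime:
    """Oldest point in time the change log can still cover (assumed)."""
    return now - timedelta(days=retention_days)


# Example: current time 29 Sep 2010 10:59 UTC, retention of 15 days (assumed).
since = fallback_since(datetime(2010, 9, 29, 10, 59, tzinfo=timezone.utc), 15)
print(since.isoformat())  # 2010-09-14T10:59:00+00:00
```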

Original comment by darsh...@google.com on 29 Sep 2010 at 10:59

GoogleCodeExporter commented 9 years ago
This problem has a significant impact on our ability to use the Google 
SharePoint Connector and GSA effectively.  We have a lot of site collections 
that we're trying to index and many site collections have no activity for 
extended periods of time.  When new documents are eventually uploaded to those 
site collections, if the SharePoint change logs have become empty, the new 
documents are never found by the Connector and the documents are never indexed 
by the GSA. The failure to index new documents causes significant problems for 
our users.

To work around the problem, we've increased the retention period for the 
SharePoint change log, but we cannot make that value too big or our content 
databases get too large.  We delete the Sharepoint_state.xml file periodically 
to do full traversals (to pick up documents that have been missed by the 
Connector), but our full traversals run for several days.  While a full 
traversal is in progress, newly-uploaded documents might not be indexed for 
several days, which again causes significant problems for our users.

Original comment by ron.hitc...@gmail.com on 5 Oct 2010 at 9:20

GoogleCodeExporter commented 9 years ago
This issue is filed as Google issue #6514001

Original comment by tdnguyen@google.com on 17 May 2012 at 11:28