PILLUTLAAVINASH / google-enterprise-connector-manager

Automatically exported from code.google.com/p/google-enterprise-connector-manager

null checkpoints are mishandled #94

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Write a connector that returns null from DocumentList.checkpoint (SharePoint is an example).
2. Create a connector instance.
3. Look at the calls to startTraversal and resumeTraversal.

What is the expected output? What do you see instead?

The expected behavior is for the Connector Manager to call startTraversal once, store the checkpoint, and then call resumeTraversal for subsequent batches, passing in the checkpoint value. But a null checkpoint is treated differently: it is not stored, so all subsequent batches also call startTraversal.
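
The behavior described above can be reproduced with a small model of the batch loop. This is only a sketch: SketchDocumentList, SketchTraversalManager, and NullCheckpointRepro are hypothetical stand-ins written for this illustration, not the real SPI interfaces or the Connector Manager's code, and the local variable `stored` merely models the value that ConnectorStateStore.getConnectorState would return.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the SPI types named in the report (assumption:
// the real interfaces have more methods; only checkpoint handling is modeled).
interface SketchDocumentList {
    String checkpoint();
}

interface SketchTraversalManager {
    SketchDocumentList startTraversal();
    SketchDocumentList resumeTraversal(String checkpoint);
}

class NullCheckpointRepro {
    /** Runs the given number of batches and records which traversal method each one called. */
    static List<String> run(SketchTraversalManager tm, int batches) {
        List<String> calls = new ArrayList<>();
        String stored = null; // models the stored connector state
        for (int i = 0; i < batches; i++) {
            SketchDocumentList batch;
            if (stored == null) {
                calls.add("startTraversal");
                batch = tm.startTraversal();
            } else {
                calls.add("resumeTraversal");
                batch = tm.resumeTraversal(stored);
            }
            String cp = batch.checkpoint();
            if (cp != null) {
                stored = cp; // a null checkpoint is never stored
            }
        }
        return calls;
    }

    /** Hypothetical connector whose every batch returns the given checkpoint. */
    static SketchTraversalManager connectorReturning(String checkpoint) {
        return new SketchTraversalManager() {
            public SketchDocumentList startTraversal() { return () -> checkpoint; }
            public SketchDocumentList resumeTraversal(String cp) { return () -> checkpoint; }
        };
    }

    public static void main(String[] args) {
        System.out.println(run(connectorReturning(null), 3));
        System.out.println(run(connectorReturning("doc-42"), 3));
    }
}
```

In this model, a connector that always returns null records startTraversal for every batch, while one that returns a non-null string records startTraversal once and resumeTraversal afterwards.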

What version of the product are you using? On what operating system?

Connector Manager 1.0.3.

Please provide any additional information below.

There's a mismatch between null values returned by DocumentList.checkpoint and null values returned by ConnectorStateStore.getConnectorState. The former is undocumented, as far as I can tell, but it used to be documented (see revision 577 of TraversalManager.java). It essentially means that the connector is responsible for maintaining the state.

A null value from getConnectorState means that startTraversal will be called. With this mismatch, the CM erroneously always calls startTraversal when a connector's checkpoint method returns null. This leads to a bug in the SharePoint connector. Since startTraversal is always called, and resumeTraversal is never called, the connector always uses its internal state, and the CM has no way to force a restart of the traversal.

I think that we should either disallow null values from checkpoint, or distinguish between a null value from checkpoint and no checkpoint in the ConnectorStateStore. We need to be careful with the latter, because of the contract that the value returned by checkpoint will be passed to resumeTraversal. If we allow null checkpoints, we need to be sure that we treat null, "", and "null" differently in the store, and preserve each for the call to resumeTraversal.
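
One way to keep null, "", and "null" distinct in a store that can only hold strings is to tag each value on the way in. The following is a minimal sketch of that idea, not the ConnectorStateStore implementation: the class TaggedCheckpointStore, its method names, and the "N"/"S" tags are all hypothetical, chosen only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class TaggedCheckpointStore {
    // The map models a persistent store that can only hold strings.
    private final Map<String, String> backing = new HashMap<>();

    /** Stores a checkpoint; null is recorded explicitly instead of being dropped. */
    void store(String connector, String checkpoint) {
        // "N" marks a null checkpoint; "S" prefixes any real string, including "".
        backing.put(connector, checkpoint == null ? "N" : "S" + checkpoint);
    }

    /** True if any checkpoint, including a null one, has been stored. */
    boolean hasCheckpoint(String connector) {
        return backing.containsKey(connector);
    }

    /** Returns the checkpoint exactly as it was passed to store(). */
    String load(String connector) {
        String raw = backing.get(connector);
        if (raw == null) {
            throw new IllegalStateException("no checkpoint stored for " + connector);
        }
        return raw.equals("N") ? null : raw.substring(1);
    }

    /** Removes the entry, so that the next batch would begin a fresh traversal. */
    void clear(String connector) {
        backing.remove(connector);
    }
}
```

With this encoding, a caller would choose startTraversal only when hasCheckpoint is false, and would otherwise pass load's result, null included, to resumeTraversal. Clearing the entry gives the manager a way to force a restart.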

Original issue reported on code.google.com by jl1615@gmail.com on 28 May 2008 at 11:24

GoogleCodeExporter commented 8 years ago
SPI change for the next release.

Original comment by jl1615@gmail.com on 9 Dec 2008 at 1:29

GoogleCodeExporter commented 8 years ago
In the current implementation, a null value from checkpoint means the same as throwing an exception: the connector was unable to produce a checkpoint, and the batch will be repeated. If the previous state was null, then the call to startTraversal will be repeated. Since null is not a documented return value, we're just going to document the existing behavior in the DocumentList.checkpoint doc comment.
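
The rule described in this comment can be written as a single state-update function. This is a sketch under stated assumptions, not the Connector Manager source: CheckpointUpdateSketch, its nested Batch interface, and nextState are hypothetical names used only to illustrate that a null return and an exception leave the stored state unchanged.

```java
class CheckpointUpdateSketch {
    /** Hypothetical stand-in for a batch whose checkpoint call may fail. */
    interface Batch {
        String checkpoint() throws Exception;
    }

    /**
     * Returns the state to store after a batch. A null checkpoint and a thrown
     * exception are handled the same way: the previous state is kept, so the
     * same batch is requested again.
     */
    static String nextState(String previous, Batch batch) {
        String cp;
        try {
            cp = batch.checkpoint();
        } catch (Exception e) {
            cp = null; // treated exactly like a null return value
        }
        return cp != null ? cp : previous;
    }
}
```

If the previous state is itself null, nextState returns null again, which corresponds to the repeated call to startTraversal mentioned above.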

Original comment by jl1615@gmail.com on 7 Mar 2009 at 3:45

GoogleCodeExporter commented 8 years ago

Original comment by jl1615@gmail.com on 10 Apr 2009 at 8:20

GoogleCodeExporter commented 8 years ago

Original comment by mgron...@gmail.com on 6 May 2009 at 9:49

GoogleCodeExporter commented 8 years ago
Fixed in r1934. Documented the existing behavior: returning null is the same as throwing an exception. The checkpoint is not updated, and the batch will be repeated.

Original comment by jl1615@gmail.com on 16 May 2009 at 1:00