AnantLabs / google-enterprise-connector-dctm

Automatically exported from code.google.com/p/google-enterprise-connector-dctm
Apache License 2.0

DctmDocumentList.checkpoint may throw a DfRuntimeException #40

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This can happen if the call to DmSession.getObject throws a DfRuntimeException. A DfException will get wrapped in a RepositoryDocumentException, but a DfRuntimeException will not.
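
A minimal sketch of that difference. The classes below are local stand-ins with assumed shapes, not the real DFC or connector types, and `getObject` here is a hypothetical method that simply fails on demand:

```java
// Local stand-ins for the DFC and connector exception types
// (assumed shapes, not the real classes).
class StubDfException extends Exception {
    StubDfException(String message) { super(message); }
}

class StubDfRuntimeException extends RuntimeException {
    StubDfRuntimeException(String message) { super(message); }
}

class StubRepositoryDocumentException extends Exception {
    StubRepositoryDocumentException(Throwable cause) { super(cause); }
}

class WrappingDemo {
    /** Hypothetical fetch that always fails, with a checked or an unchecked exception. */
    static String getObject(boolean failUnchecked) throws StubDfException {
        if (failUnchecked) {
            throw new StubDfRuntimeException("unchecked failure");
        }
        throw new StubDfException("checked failure");
    }

    /** Wraps only the checked exception, as described in this report. */
    static String fetch(boolean failUnchecked) throws StubRepositoryDocumentException {
        try {
            return getObject(failUnchecked);
        } catch (StubDfException e) {
            throw new StubRepositoryDocumentException(e);
        }
    }

    /** Returns the simple class name of whatever fetch() throws. */
    static String thrownBy(boolean failUnchecked) {
        try {
            fetch(failUnchecked);
            return "nothing";
        } catch (Exception e) {
            return e.getClass().getSimpleName();
        }
    }
}
```

In this sketch the checked failure reaches the caller as the wrapper type, while the unchecked failure passes through the `catch` clause unchanged.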

The connector manager will still skip the document, but if the document happens to be the last document in the batch, then it will be the current document when checkpoint is called. The current checkpoint code eventually calls DmSession.getObject again, which may fail again (and has). With no checkpoint the batch will be repeated, possibly infinitely.

I think we should be fetching the checkpoint data directly from the DmCollection, rather than going back to the server. We may even want to cache it early, to avoid any chance of problems fetching data from the collection after errors have occurred.
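
A rough sketch of the caching idea, not actual connector code. `ResultRow` and `CachingDocumentList` are hypothetical names, the row fields are assumed, and the `id@date` checkpoint string is only a placeholder format:

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical result row; stands in for the values a collection would expose. */
class ResultRow {
    final String objectId;
    final String modifyDate;

    ResultRow(String objectId, String modifyDate) {
        this.objectId = objectId;
        this.modifyDate = modifyDate;
    }
}

/** Hypothetical document list that caches checkpoint data as it advances. */
class CachingDocumentList {
    private final Iterator<ResultRow> rows;
    private String lastObjectId;
    private String lastModifyDate;

    CachingDocumentList(List<ResultRow> rows) {
        this.rows = rows.iterator();
    }

    /** Advances to the next row and caches its checkpoint fields before any other work. */
    ResultRow nextRow() {
        if (!rows.hasNext()) {
            return null;
        }
        ResultRow row = rows.next();
        lastObjectId = row.objectId;
        lastModifyDate = row.modifyDate;
        return row;
    }

    /** Builds the checkpoint from cached values only, with no further fetch. */
    String checkpoint() {
        if (lastObjectId == null) {
            return null;
        }
        return lastObjectId + "@" + lastModifyDate;
    }
}
```

Because the values are copied when the row is first read, a later failure while fetching the document itself would not prevent `checkpoint()` from returning a value.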

It's possible that we should also wrap DfRuntimeException as a RepositoryDocumentException. We need to discriminate between document failures and server failures in some way, possibly through a ping mechanism. That same logic may also apply to the existing wrapping of DfException.

Original issue reported on code.google.com by jl1615@gmail.com on 12 Feb 2009 at 9:15

GoogleCodeExporter commented 9 years ago

Original comment by jl1615@gmail.com on 23 May 2009 at 12:00

GoogleCodeExporter commented 9 years ago
Fixed 29 May 2009 in Documentum Connector revision r560

I replaced the quite convoluted checkpoint management code with
a new Checkpoint class (derived from the Livelink connector).
The Checkpoint class can parse and format JSON syntax checkpoint
strings.  It supports direct access to the checkpoint data
(objectIds and timeStamps), and supports roll-back, allowing
a checkpoint to be reverted to the previous state.
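
For illustration only, a much simplified sketch of a checkpoint with
roll-back.  This is not the r560 Checkpoint class: the class name, the
JSON field names "objectId" and "timeStamp", and the single-level
roll-back are all assumptions, and values are assumed to contain no
quote characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified sketch; field names and behavior are assumptions. */
class SketchCheckpoint {
    // Matches only the exact string produced by asString().
    private static final Pattern FORMAT = Pattern.compile(
        "\\{\"objectId\":\"([^\"]*)\",\"timeStamp\":\"([^\"]*)\"\\}");

    private String objectId;
    private String timeStamp;
    private String oldObjectId;
    private String oldTimeStamp;

    /** Creates an empty checkpoint, as for a first traversal. */
    SketchCheckpoint() {
    }

    /** Parses a checkpoint string previously produced by asString(). */
    SketchCheckpoint(String json) {
        Matcher m = FORMAT.matcher(json);
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid checkpoint: " + json);
        }
        objectId = m.group(1);
        timeStamp = m.group(2);
        oldObjectId = objectId;
        oldTimeStamp = timeStamp;
    }

    /** Records a new position, remembering the previous one for roll-back. */
    void set(String newObjectId, String newTimeStamp) {
        oldObjectId = objectId;
        oldTimeStamp = timeStamp;
        objectId = newObjectId;
        timeStamp = newTimeStamp;
    }

    /** Reverts to the state before the most recent set() call. */
    void rollback() {
        objectId = oldObjectId;
        timeStamp = oldTimeStamp;
    }

    /** Formats the checkpoint as a JSON string, or returns null if empty. */
    String asString() {
        if (objectId == null) {
            return null;
        }
        return "{\"objectId\":\"" + objectId
            + "\",\"timeStamp\":\"" + timeStamp + "\"}";
    }
}
```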

I then modified DctmTraversalManager and DctmDocumentList to use
the new Checkpoint object rather than the extensive JSON mashing
they were doing in the past.  The simplified checkpoint management
dropped nearly 200 lines of code from each of these two classes.

Finally I added a "transient error" check when catching thrown
RepositoryExceptions in a few key locations.  By "pinging" the
server, via a call to Session.isConnected(), we make a rough guess
as to whether an exception is the result of lost connectivity
with the server.  If this is the case, we roll back the current
Checkpoint, forcing the document that was being processed at
the time of the failure to be retried at a later time.
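
A minimal sketch of that check, under stated assumptions.  The two
interfaces are hypothetical stand-ins for the session and checkpoint,
and treating a failed ping as lost connectivity is a choice made for
this example, not something taken from the connector source.

```java
/** Hypothetical minimal view of a session that can be pinged. */
interface PingableSession {
    boolean isConnected();
}

/** Hypothetical minimal view of a checkpoint that can be reverted. */
interface RollbackCheckpoint {
    void rollback();
}

class TransientErrorCheck {
    /**
     * Intended to be called after catching an exception for a document.
     * Returns true if the failure looks transient, in which case the
     * checkpoint has been rolled back so the document is retried later.
     */
    static boolean handleFailure(PingableSession session,
            RollbackCheckpoint checkpoint) {
        boolean connected;
        try {
            connected = session.isConnected();
        } catch (RuntimeException e) {
            // Assumption: a ping that itself fails counts as lost connectivity.
            connected = false;
        }
        if (!connected) {
            checkpoint.rollback();
            return true;
        }
        // The server still answers, so blame the document and move on.
        return false;
    }
}
```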

Original comment by Brett.Mi...@gmail.com on 31 May 2009 at 2:55

GoogleCodeExporter commented 9 years ago

Original comment by jl1615@gmail.com on 15 Jun 2009 at 8:19