PILLUTLAAVINASH / google-enterprise-connector-manager

Automatically exported from code.google.com/p/google-enterprise-connector-manager
0 stars 0 forks source link

determineBatchHint messages spamming logs once per second. #187

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
While examining the logs after a test, I noticed an annoying
behaviour:

... feed 100 documents ...
May 28, 2009 12:02:33 PM [Traverse dmobius] 
com.google.enterprise.connector.pusher.DocPusher take
FINER: Document 0901081d80004088 from connector dmobius sent.
May 28, 2009 12:02:33 PM [Traverse dmobius] 
com.google.enterprise.connector.traversal.QueryTraverser checkpointAndSave
FINEST: Checkpointing for connector dmobius ...
May 28, 2009 12:02:33 PM [Traverse dmobius] 
com.google.enterprise.connector.dctm.Checkpoint asString
FINE: Created Checkpoint: 
{"uuid":"0901081d80004088","lastRemoveDate":"2009-05-21 
16:57:35","lastModified":"2009-05-23 21:10:28"}
May 28, 2009 12:02:33 PM [Traverse dmobius] 
com.google.enterprise.connector.traversal.QueryTraverser checkpointAndSave
FINEST: ...checkpoint {"uuid":"0901081d80004088","lastRemoveDate":"2009-05-21 
16:57:35","lastModified":"2009-05-23 21:10:28"} created.
May 28, 2009 12:02:33 PM [Traverse dmobius] 
com.google.enterprise.connector.scheduler.CancelableBatch run
FINE: Traverser dmobius batchDone with result = BatchResult: delayPolicy = 
IMMEDIATE 
countProcessed = 100
May 28, 2009 12:02:33 PM [Traverse dmobius] 
com.google.enterprise.connector.scheduler.HostLoadManager determineBatchHint
FINEST: connectorName = dmobius  maxDocsPerPeriod = 100  docsTraversed = 100  
remainingDocsToTraverse = 0
May 28, 2009 12:02:35 PM [Traverse dmobius] 
com.google.enterprise.connector.scheduler.HostLoadManager determineBatchHint
FINEST: connectorName = dmobius  maxDocsPerPeriod = 100  docsTraversed = 100  
remainingDocsToTraverse = 0
May 28, 2009 12:02:36 PM [Traverse dmobius] 
com.google.enterprise.connector.scheduler.HostLoadManager determineBatchHint
FINEST: connectorName = dmobius  maxDocsPerPeriod = 100  docsTraversed = 100  
remainingDocsToTraverse = 0
May 28, 2009 12:02:37 PM [Traverse dmobius] 
com.google.enterprise.connector.scheduler.HostLoadManager determineBatchHint
FINEST: connectorName = dmobius  maxDocsPerPeriod = 100  docsTraversed = 100  
remainingDocsToTraverse = 0
May 28, 2009 12:02:38 PM [Traverse dmobius] 
com.google.enterprise.connector.scheduler.HostLoadManager determineBatchHint
FINEST: connectorName = dmobius  maxDocsPerPeriod = 100  docsTraversed = 100  
remainingDocsToTraverse = 0
May 28, 2009 12:02:39 PM [Traverse dmobius] 
com.google.enterprise.connector.scheduler.HostLoadManager determineBatchHint
FINEST: connectorName = dmobius  maxDocsPerPeriod = 100  docsTraversed = 100  
remainingDocsToTraverse = 0
May 28, 2009 12:02:40 PM [Traverse dmobius] 
com.google.enterprise.connector.scheduler.HostLoadManager determineBatchHint
FINEST: connectorName = dmobius  maxDocsPerPeriod = 100  docsTraversed = 100  
remainingDocsToTraverse = 0
May 28, 2009 12:02:41 PM [Traverse dmobius] 
com.google.enterprise.connector.scheduler.HostLoadManager determineBatchHint
FINEST: connectorName = dmobius  maxDocsPerPeriod = 100  docsTraversed = 100  
remainingDocsToTraverse = 0
May 28, 2009 12:02:42 PM [Traverse dmobius] 
com.google.enterprise.connector.scheduler.HostLoadManager determineBatchHint
FINEST: connectorName = dmobius  maxDocsPerPeriod = 100  docsTraversed = 100  
remainingDocsToTraverse = 0
May 28, 2009 12:02:43 PM [Traverse dmobius] 
com.google.enterprise.connector.scheduler.HostLoadManager determineBatchHint
FINEST: connectorName = dmobius  maxDocsPerPeriod = 100  docsTraversed = 100  
remainingDocsToTraverse = 0
May 28, 2009 12:02:44 PM [Traverse dmobius] 
com.google.enterprise.connector.scheduler.HostLoadManager determineBatchHint
FINEST: connectorName = dmobius  maxDocsPerPeriod = 100  docsTraversed = 100  
remainingDocsToTraverse = 0
... spam the logs once per second saying we have nothing to do ...

Basically, the connector has fed 100 documents, hitting the
default maximum docs-per-minute hostload.  It fed its 100
documents in 15 seconds, so it waits 45 seconds before it
can resume feeding.  In CM revision r1562, I incorporated
some of the logic from HostLoadManager.determinBatchHint()
into HostLoadManager.shouldDelay(), and moved the call to
determineBatchHint() until after the call to shouldDelay().

This avoids determineBatchHint() filling up the logs with
hints of 0 once per second, and shouldDelay() does its job
better.

Some analysis reveals the problem is related to asynchronous
worker threads.  The check for "is this connector already 
processing a batch?" was moved down into 
ConnectorCoordinatorImpl.startBatch(), after the call to
determineBatchHint().  So for every second that the 
connector is processing a batch, a determineBatchHint()
message gets added to the log.

Original issue reported on code.google.com by Brett.Mi...@gmail.com on 6 Oct 2009 at 9:37

GoogleCodeExporter commented 8 years ago

Original comment by jl1615@gmail.com on 21 Oct 2009 at 7:15

GoogleCodeExporter commented 8 years ago
The quick and easy (read ugly) way to fix this was to create
a subclass of BatchSize that lazily evaluates the BatchSize,
waiting until one of the getters are called.  This maintains
the calling convention of TraversalScheduler passing the
BatchSize down to the ConnectorCoordinator, but delays calling
HostLoadManager.determineBatchSize() until after the Coordinator
as determined that it is OK to start a new batch.

The real way to fix this is to move host load management for
a connector instance into the ConnectorCoordinator.  Once that
is done, the BatchSize does not need to get passed down from
the Scheduler to the Coordinator.  See Issue 189.

Original comment by Brett.Mi...@gmail.com on 21 Oct 2009 at 10:19