Closed himmelmrm closed 6 years ago
Have you narrowed this down to a certain set of suspect nodes? Typically this happens for syncing batches containing nodes of considerable size (e.g., DAM nodes). One way to manage this is to use the batchSize parameter to lower the batch size for paths containing large nodes that take time to stream.
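For reference, a minimal sketch of what that might look like in a Grabbit job configuration — the host, path, and batchSize values here are illustrative, not taken from your setup:

```yaml
# Illustrative Grabbit job config: lower batchSize for a path with large DAM assets
serverUsername : admin
serverPassword : admin
serverHost : some.other.server   # hypothetical source host
serverPort : 4503
pathConfigurations :
  - path : /content/dam/some-large-assets   # hypothetical path
    batchSize : 50                           # smaller batches; default is much larger
```

A smaller batch means the writer receives data more often, which keeps the connection from sitting idle while a big batch is assembled.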
so ... the 60 second timeout applies to the batch or to each individual node being transferred?
It all happens in a pipeline; things get processed batch by batch. One step reads data from the server until a certain number of nodes reaches the batch size; then it hands the batch off to the writer step (which is where you are experiencing issues).
If the reader spends too much time gathering enough nodes to fill a batch while the writer finishes up with the previous one, the writer sits idle waiting for the next batch from the reader; if it twiddles its thumbs for 60 seconds, you will see this timeout.
The IO timeout can't be configured currently, but it may be possible to achieve the same effect by lowering the number of nodes that make up a batch, optimizing for latency rather than throughput.
That said, I'm sure the timeout configuration would be an appreciated improvement, PRs are welcome!
Does deleteBeforeWrite factor in? With deleteBeforeWrite=true, could a large tree cause a timeout?
Actually, @himmelmrm -- I think you are running into Jetty's configured timeout ... I remember seeing this before when deleteBeforeWrite=true, as you alluded to.
It is configurable under org.apache.felix.http. It can time out if you are deleting a large path using the deleteBeforeWrite feature.
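For anyone looking for where that lives: on AEM/Sling the Jetty idle timeout is exposed through the Apache Felix HTTP service configuration (PID `org.apache.felix.http`). A sketch of a Sling OSGi config file raising it from the 60 s default — the 300000 ms value is just an example, tune it to your largest delete:

```
# org.apache.felix.http.config (illustrative)
# Connection/idle timeout in milliseconds; default is 60000
org.apache.felix.http.timeout=I"300000"
```

The same property can be set through the Felix Web Console under "Apache Felix Jetty Based Http Service".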
@sagarsane really? Because deleteBeforeWrite happens before a connection is made? I guess either way, if you can configure the timeout, that should help.
<batch:step id="deleteBeforeWrite" next="startHttpConnection">
<batch:tasklet ref="deleteBeforeWriteTasklet" transaction-manager="clientTransactionManager"/>
</batch:step>
thanks @jbornemann and @sagarsane --- I've been able to mostly eliminate this issue by increasing the Jetty Connection Timeout value on the server side.
Ok great. Thanks @himmelmrm !
I frequently experience job failures due to TimeoutException.
Here's the server error:
31.10.2017 15:25:37.957 *ERROR* [192.168.5.1 [1509477849521] GET /grabbit/content HTTP/1.1] com.twcable.grabbit.server.batch.steps.jcrnodes.JcrNodesWriter Exception occurred while writing the current chunk org.apache.sling.engine.impl.helper.ClientAbortException: java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 60000/60000 ms
Sometimes it takes 3 or 4 attempts to get a complete transfer. This is happening between two systems on the same network.
Thanks!