Alfresco / SearchServices

Alfresco Search Services
GNU Lesser General Public License v3.0

soft timeout for long running text extractions #397

Open hi-ko opened 2 years ago

hi-ko commented 2 years ago

As discussed on Discord, the new transformer framework degrades scalability and stability because it leads to more long-running threads.

The only workaround today is to increase the timeouts of the HTTP client, but that lets the number of threads pile up, which is not a good idea. For example:

solr.http.socket.timeout=30000
solr.http.connection.timeout=10000

To fix this, the tracker or the repo web script should support a soft timeout that frees the waiting threads. When the soft timeout is hit, it should trigger a mechanism (as discussed in #396) that marks the node so that the content tracker skips it, and that automatically makes the node visible to the tracker again once the content has been transformed by a T-Engine.
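
To illustrate the idea, here is a minimal sketch of such a soft timeout in plain Java. It is not based on the actual tracker or web script code. The class `SoftTimeoutSketch`, the method names, the in-memory `deferredNodes` set and the local thread pool are all hypothetical stand-ins, and the "mark node" step from #396 is reduced to adding the node ID to a set. In a real setup the transformation would run in a T-Engine, not in a local pool.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical sketch: none of these names exist in Alfresco. */
public class SoftTimeoutSketch {

    // Assumption: stands in for the "marked node" state discussed in #396.
    private final Set<Long> deferredNodes = ConcurrentHashMap.newKeySet();

    // Assumption: stands in for the T-Engine doing the slow work elsewhere.
    private final ExecutorService transformPool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "transform-sketch");
        t.setDaemon(true); // do not keep the JVM alive for pending transforms
        return t;
    });

    /**
     * Waits at most softTimeoutMillis for the text. On timeout the caller
     * thread is released, the node is marked as deferred, and the mark is
     * removed again as soon as the transformation finishes (or fails).
     */
    public Optional<String> extractText(long nodeId, Supplier<String> transform,
                                        long softTimeoutMillis)
            throws InterruptedException, ExecutionException {
        CompletableFuture<String> pending =
                CompletableFuture.supplyAsync(transform, transformPool);
        try {
            return Optional.ofNullable(
                    pending.get(softTimeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Soft timeout: hide the node from the tracker, keep the work running.
            deferredNodes.add(nodeId);
            // Restore visibility once the transformation has completed.
            pending.whenComplete((text, error) -> deferredNodes.remove(nodeId));
            return Optional.empty();
        }
    }

    /** True while the (hypothetical) content tracker should skip this node. */
    public boolean isDeferred(long nodeId) {
        return deferredNodes.contains(nodeId);
    }
}
```

The point of the sketch is that the caller never blocks longer than the soft timeout, so raising `solr.http.socket.timeout` would no longer be needed just to survive slow extractions. How the mark is persisted and how the tracker picks the node up again are left open here and belong to the discussion in #396.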