Top-Q / difido-reports

This project aims to provide a generic implementation for HTML test reports.
http://top-q.github.io/difido-reports
Apache License 2.0

Difido TaskRejectedException #247

Open KobyTetro opened 3 years ago

KobyTetro commented 3 years ago

When Difido is handling too many tasks, it fails with this error:

```
ERROR 1250 --- [http-nio-0.0.0.0-9000-exec-32] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [org.springframework.core.task.TaskRejectedException: Executor [java.util.concurrent.ThreadPoolExecutor@6fd2caff[Running, pool size = 1, active threads = 1, queued tasks = 100000, completed tasks = 755820 0]] did not accept task: org.springframework.aop.interceptor.AsyncExecutionInterceptor$1@c1b0910] with root cause
```

The source code warns against raising the number of threads.
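The rejection happens because the async executor has a single worker thread and a bounded task queue; once the queue is full, `ThreadPoolExecutor` throws `RejectedExecutionException`, which Spring wraps as `TaskRejectedException`. A minimal standalone reproduction of that mechanism (using a tiny queue instead of 100000, purely for illustration):

```java
import java.util.concurrent.*;

public class RejectionDemo {
    public static void main(String[] args) throws Exception {
        // One worker thread and a small bounded queue; the real server
        // uses a queue capacity of 100000.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(2)); // tiny queue to force rejection

        // Occupy the single worker thread so subsequent tasks pile up in the queue.
        CountDownLatch block = new CountDownLatch(1);
        executor.execute(() -> {
            try { block.await(); } catch (InterruptedException ignored) {}
        });

        executor.execute(() -> {}); // queued (1/2)
        executor.execute(() -> {}); // queued (2/2)
        try {
            executor.execute(() -> {}); // queue full -> rejected
        } catch (RejectedExecutionException e) {
            // Spring wraps this in TaskRejectedException before it reaches the servlet log
            System.out.println("rejected"); // prints "rejected"
        }
        block.countDown();
        executor.shutdown();
    }
}
```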

Version:

```
Manifest-Version: 1.0
Implementation-Title: difido-server
Implementation-Version: 2.2.04
Archiver-Version: Plexus Archiver
Built-By: agmon
Start-Class: il.co.topq.report.Application
Implementation-Vendor-Id: il.co.topq.report
Spring-Boot-Version: 1.3.1.RELEASE
Created-By: Apache Maven 3.5.3
Build-Jdk: 1.8.0_171
```
itaiag commented 3 years ago

The queue size is set to 100000. You may raise this number to 500000 or more, but if the server has more than 100000 pending operations, my guess is that you have a problem with your IO. Even when I stress test the server, I never get near this number, and I'm running scenarios that are not realistic in any way. Maybe you are using some kind of NAS? If so, I suggest changing the location of the docRoot folder to a local folder and seeing whether that solves the problem.
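For reference, raising the queue capacity while keeping the single worker thread would look roughly like this in a Spring-style executor configuration. This is a hypothetical sketch only; the bean name and wiring are assumptions, not the actual difido-server code:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class AsyncConfig {

    // Hypothetical bean; the real difido-server configuration may differ.
    @Bean
    public ThreadPoolTaskExecutor reportExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(1);       // keep the single worker, as advised above
        executor.setMaxPoolSize(1);
        executor.setQueueCapacity(500000); // raised from the default 100000
        executor.initialize();
        return executor;
    }
}
```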

Another option is to reduce the number of calls made to the server. In most binders this can easily be achieved by editing the difido.properties file and increasing the value of the min.time.between.writes property. This sets the minimum time in milliseconds between updates to the server. You may experience some delay in the refresh intervals of the reports, but it can reduce the IO and network usage significantly.
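For illustration, raising the throttle in difido.properties might look like this (the value 1000 is just an example, not a recommended setting):

```
# difido.properties (client-side binder configuration)
# Minimum time in milliseconds between updates sent to the server.
# Raising it reduces IO and network usage at the cost of slower report refresh.
min.time.between.writes=1000
```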

hengoldburd commented 3 years ago

There is no NAS there. Does this worker handle all persistence jobs, including deleting old reports and all the live API calls (reports, new executions, etc.)?

By the way, we did not implement min.time.between.writes in the Python binder... maybe this is the time to do so.

itaiag commented 3 years ago

Hi Benadam!

The worker is responsible for all operations done on the file system. The majority of its work is updating the JSON files of the executions and tests. You can see it as a kind of shock absorber that smooths the operation.

Not having min.time.between.writes can add more stress on the IO, so it is a good idea to implement it in the binder. It does not even have to be configurable. In the official binders it is set to 100 ms by default, which is good enough for most cases. But still, 100,000 operations waiting in the queue is a lot. It is like you are writing to a floppy drive :)
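A client-side throttle of this kind fits in a few lines. The sketch below is illustrative only, with made-up class and method names; it is not the official binder implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Minimal sketch of a min.time.between.writes throttle for a binder.
 * The caller checks tryWrite() before each server update and skips
 * the update when it returns false.
 */
public class WriteThrottle {
    private final long minMillisBetweenWrites;
    private final AtomicLong lastWrite = new AtomicLong(0);

    public WriteThrottle(long minMillisBetweenWrites) {
        this.minMillisBetweenWrites = minMillisBetweenWrites;
    }

    /** Returns true if enough time has passed since the last accepted write. */
    public boolean tryWrite() {
        long now = System.currentTimeMillis();
        long last = lastWrite.get();
        if (now - last >= minMillisBetweenWrites && lastWrite.compareAndSet(last, now)) {
            return true;  // caller performs the server update
        }
        return false;     // caller skips; state is flushed on a later call
    }
}
```

Skipped updates are not lost: the next accepted write sends the latest state, so the server simply sees fewer, coarser updates.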

By the way, I'm really interested in your Python binder! And also, you have my number ;)