davidediruscio opened this issue 4 years ago
in addition @ambpro : the task is showing "completed" with 100% which is not true. It should be caught at some point.
I think it is not related to the heart of this issue, but I just realized that the metric provider array above doesn't include the required dependencies...
The default timeout period for OkHttp is 10 seconds. In commit d7444f9df3c48c480412d30d9235beb959d4d631 I increased the period of time before a timeout is triggered from 10 seconds to:
- 30 seconds for Connection
- 1 minute for Read
- 1 minute for Write
Let me know if this change helps with this issue.
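For reference, a minimal sketch of how such timeouts are typically configured on an OkHttpClient builder (standard OkHttp API; the actual reader code changed in that commit may be wired differently):

```java
import java.util.concurrent.TimeUnit;
import okhttp3.OkHttpClient;

// Builds a client with the timeouts mentioned above
// (OkHttp's default is 10 seconds for each of them).
OkHttpClient client = new OkHttpClient.Builder()
        .connectTimeout(30, TimeUnit.SECONDS)
        .readTimeout(1, TimeUnit.MINUTES)
        .writeTimeout(1, TimeUnit.MINUTES)
        .build();
```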
Thanks @Danny2097. The timeout delay is one thing. Now, as mentioned in my initial comment, what is done after it has been triggered? Is there a retry?
The metric providers list above is still incomplete. Within the metric-platform, each metric may or may not have a set of dependencies: take a look at http://scava-dev.ow2.org:8182/analysis/metricproviders.
Right now, the validation of the dependencies is done in the admin-ui front end and not in the backend endpoint used to create a new task, /analysis/task/create.
Looking at the list you provided, I figured out that it contains only the metricIDs without the metricDependenciesIDs.
The metric providers list above is still incomplete.
Yes, I am aware, and I noticed what you mentioned. I did write a system to resolve the dependencies in my provisioning script. That being said, I'm not sure it's related to the current issue. I'll do another test.
The easy way to reproduce is to issue the following from the box hosting the docker stack while the task runs:
$ sudo iptables -I FORWARD 1 -d gitlab.ow2.org -j DROP
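To restore connectivity afterwards, the inserted rule can be removed with the matching delete:
$ sudo iptables -D FORWARD -d gitlab.ow2.org -j DROP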
And my other comment above is still valid:
in addition @ambpro : the task is showing "completed" with 100% which is not true. It should be caught at some point.
If needed I can open a new issue for the latter.
Thanks @Danny2097. The timeout delay is one thing. Now, as mentioned in my initial comment, what is done after it has been triggered? Is there a retry?
Hi @mhow2, sorry for the delay, I have just returned from annual leave. The answer to your question quoted above is no: none of our readers have retry mechanisms built into them, as this feature was not requested during the discussions in L'Aquila.
Not requested, OK, but does it sound logical to you to have a retry? What doesn't sound good to me is that when it fails, it is perfectly silent. The end user should know, at least from the UI (@ambpro), that a task got interrupted and ended prematurely, along with the reason for it.
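For illustration, a minimal sketch of what such a retry could look like around a reader's network call, with hypothetical names (this is not the existing Scava reader code, and it assumes at least one attempt is requested):

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class RetryHelper {
    // Hypothetical helper, not part of Scava: retries a flaky network call a few
    // times with a simple linear back-off instead of failing silently.
    static <T> T withRetry(Callable<T> call, int maxAttempts) throws Exception {
        Exception lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {          // e.g. an OkHttp timeout
                lastError = e;
                Thread.sleep(1000L * attempt); // back off 1 s, 2 s, 3 s, ...
            }
        }
        throw lastError;                       // surface the failure to the caller
    }
}
```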
@ambpro, @aabherve, @Danny2097: I suggest we implement a warning somewhere in the UI when a task gets interrupted unexpectedly, saying why and when, so we can distinguish tasks that finished successfully from tasks that aborted because of a system error.
@mhow2,
in addition @ambpro : the task is showing "completed" with 100% which is not true. It should be caught at some point.
I pushed a commit (3a10b83) containing a patch which should automatically move a failed task to the ERROR status; it also cleans up the worker and logs the error in the stack traces view.
Consequently, in addition to:
- No Data
- In Progress
- Up To Date

the global status of a project can now be Error.
Does it make sense now?
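As a rough illustration of the behaviour described above, with placeholder names (Worker, StackTraceLog, TaskStatus are not the actual classes touched by 3a10b83):

```java
// Placeholders for the pieces involved; not the actual Scava types.
interface Worker { void runTask() throws Exception; void release(); }
interface StackTraceLog { void record(Throwable error); }
enum TaskStatus { IN_PROGRESS, COMPLETED, ERROR }

final class TaskRunner {
    // A failing task now ends up in ERROR instead of being reported as completed;
    // the worker is cleaned up and the error is kept for the stack traces view.
    static TaskStatus run(Worker worker, StackTraceLog log) {
        try {
            worker.runTask();
            return TaskStatus.COMPLETED;
        } catch (Exception e) {
            log.record(e);
            return TaskStatus.ERROR;
        } finally {
            worker.release();
        }
    }
}
```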
@mhow2,
I think it is not related to the heart of this issue, but I just realized that the metric provider array above doesn't include the required dependencies...
ICYMI, I recently implemented, within the services /analysis/task/create and /analysis/task/update (see https://github.com/crossminer/scava/issues/345#issuecomment-528851252), a way to validate dependencies on the server side before creating or updating a task.
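For illustration, a minimal sketch of the kind of check this implies, assuming a hypothetical map from each metric provider ID to the IDs it depends on (not the actual endpoint code):

```java
import java.util.*;

final class DependencyCheck {
    // Hypothetical helper: returns the dependency IDs that the request forgot to
    // include; an empty result means the task can be created or updated.
    static List<String> findMissingDependencies(Collection<String> requestedIds,
                                                Map<String, List<String>> dependenciesById) {
        Set<String> requested = new HashSet<>(requestedIds);
        List<String> missing = new ArrayList<>();
        for (String id : requestedIds) {
            for (String dep : dependenciesById.getOrDefault(id, Collections.emptyList())) {
                if (!requested.contains(dep) && !missing.contains(dep)) {
                    missing.add(dep);
                }
            }
        }
        return missing;
    }
}
```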
Project: https://gitlab.ow2.org/sat4j/sat4j
Time Span: 2018
Metrics: narrowed to the list below (e.g. our Project Scenario).
As a user, I would expect the system not to stop the task, but at least to retry the last operation X times, or something like that, before throwing in the towel.
Exception: