Open ap0n opened 7 years ago
I generated the TimeTravelPip pipeline by selecting only the CorrelationSW algorithm as a member of the fCorrelationFinancial family. When I try to run it using cli.sh, I get the following error from the infrastructure.
19:36:22.324 [pool-2-thread-4] INFO eu.qualimaster.events.EventManager - dispatching CoordinationCommandExecutionEvent command: PipelineCommand status: START options: PipelineOptions: {free.eventBus.host=clu01.softnet.tuc.gr, free.monitoring.volume.enabled=true, qm.adaptation=null, free.eventBus.disableLogging=eu.qualimaster.monitoring.events.PipelineElementMultiObservationMonitoringEvent,eu.qualimaster.monitoring.events.PipelineElementObservationMonitoringEvent,eu.qualimaster.monitoring.events.PipelineObservationMonitoringEvent,eu.qualimaster.monitoring.events.PlatformMultiObservationHostMonitoringEvent, free.eventBus.port=14000, free.confModel.initMode=ADAPTIVE, free.pipelines.ports=22000-22500} pipeline: TimeTravelPip senderId: 2d86352310db2bd3:-f17b08a:159038e6684:-8000-32065805755823057 messageId: 14b837be-cf6e-4690-b052-f3d57f4e3e5d cause: PipelineCommand status: START options: PipelineOptions: {free.eventBus.host=clu01.softnet.tuc.gr, free.monitoring.volume.enabled=true, qm.adaptation=null, free.eventBus.disableLogging=eu.qualimaster.monitoring.events.PipelineElementMultiObservationMonitoringEvent,eu.qualimaster.monitoring.events.PipelineElementObservationMonitoringEvent,eu.qualimaster.monitoring.events.PipelineObservationMonitoringEvent,eu.qualimaster.monitoring.events.PlatformMultiObservationHostMonitoringEvent, free.eventBus.port=14000, free.confModel.initMode=ADAPTIVE, free.pipelines.ports=22000-22500} pipeline: TimeTravelPip senderId: 2d86352310db2bd3:-f17b08a:159038e6684:-8000-32065805755823057 messageId: 14b837be-cf6e-4690-b052-f3d57f4e3e5d messageId: 14b837be-cf6e-4690-b052-f3d57f4e3e5d receiverId: 2d86352310db2bd3:-f17b08a:159038e6684:-8000-32065805755823057 code: 3 message: while starting pipeline 'TimeTravelPip': Topology class name is empty in mapping TimeTravelPip null {} {} [null] [TimeTravelPip] algs: {} {} {} comp: {} {} params: {} {} subPipelines []. Cannot start pipeline TimeTravelPip. 
If you try to start it manually, please ensure that the pipeline name in the configuration is also the name of the Jar an in the package name of the topology.
However, I cannot see what is wrong with the mapping.xml... I'm attaching the generated code here. @eichelbe @cuiqin, can you have a look? (I built the code on the cluster using the mvn clean install command.)
TimeTravelPip.zip
Moreover, when I generated the same pipeline by selecting the TopoSoftwareCorrelationFinancial algorithm, I was able to start the pipeline normally. However, its visualization looked like this [screenshot] and no tuples were passing from PipelineVar_10_FamilyElement2 to PipelineVar_10_FamilyElement3!
Any ideas?
From the log, I see the pipeline starts in ADAPTIVE mode. The generation from the tool has not yet been updated to consider this mode, as it is still in the stabilizing phase. Could you please try to run the TimeTravelPip with CorrelationSW in STATIC mode by setting confModel.initMode = STATIC, to see whether you can start the pipeline? We will let you know when we update the instantiation for the tool to use the ADAPTIVE startup.
Regarding the issue that no tuples pass from PipelineVar_10_FamilyElement2 to PipelineVar_10_FamilyElement3: what are the underlying algorithms? Are they simple Java algorithms or distributed algorithms? Do you receive any tuples in PipelineVar_10_FamilyElement2? Maybe you can add some logging to its underlying algorithm to see whether it gets input data. Knowing only the family element, it is hard to say.
Where would I set the confModel.initMode = STATIC?
I will debug the pipeline and let you know. I just wondered if you had any idea why the pipeline appears to consist of two completely separate branches... Anyway, I'll let you know when I add the logs and see what is really going on.
The STATIC setting goes in the infrastructure configuration file, qm.infrastructure.cfg.
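A minimal sketch of checking that setting before restarting the infrastructure. It assumes qm.infrastructure.cfg uses Java-properties syntax (`key = value`), which the thread does not confirm; the class and method names are made up for illustration, and the configuration text is a one-line hypothetical excerpt rather than the real file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class InitModeCheck {

    /** Returns the value of confModel.initMode, or "" if the key is absent. */
    static String readInitMode(String cfgText) {
        Properties props = new Properties();
        try {
            // Properties.load tolerates whitespace around '=' in "key = value" lines.
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("confModel.initMode", "").trim();
    }

    public static void main(String[] args) {
        // Hypothetical excerpt; the real qm.infrastructure.cfg contains more keys.
        String cfg = "confModel.initMode = STATIC\n";
        System.out.println("confModel.initMode = " + readInitMode(cfg));
    }
}
```

If the value read back is still ADAPTIVE, the edited file is probably not the one the infrastructure loads.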
Another thing that comes to mind: with local modifications to the configuration via QM-IConf, the model you used to generate the pipeline is no longer consistent with the most recent one downloaded by the infrastructure. This can prevent the infrastructure from collecting the right lifecycles while starting the pipeline. To fix this, we would also need to apply your configuration changes to QM2.devel on Jenkins to refresh the model artifact that the infrastructure downloads into the cluster. Let me know if this is the case. Then I can adjust the respective configuration based on your modifications.
Two separated branches? Does the topology declaration in the Topology match the pipeline configuration? Let me know if you find something.
It seems that there is a problem in the generated topology code. PipelineVar_10_FamilyElement3 (DynamicGraphCompilation) is not connected to PipelineVar_10_FamilyElement2 (TimeGraphMapper) in Topology.java. (Shouldn't that produce an InvalidTopologyException...? :confused:)
I figured out that the tupleType configuration of the flow (f5) between PipelineVar_10_FamilyElement2 and PipelineVar_10_FamilyElement3 is missing. That's why the link between these two nodes is broken in the generation. I also saw that the flow (f11) has the same problem. @ap0n could you please check which type should be selected for this flow? We missed the constraint on the tupleType. I have added it. Now it should complain if no tupleType is configured.
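The kind of check described above can be sketched as follows. This is only an illustration: `Flow` is a hypothetical stand-in with just a name and a tuple type, not the QM-IConf model class, and the real constraint lives in the configuration model rather than in Java code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FlowCheck {

    /** Hypothetical stand-in for a configured flow; not the QM-IConf model type. */
    static final class Flow {
        final String name;
        final String tupleType; // null or blank means "not configured"

        Flow(String name, String tupleType) {
            this.name = name;
            this.tupleType = tupleType;
        }
    }

    /** Returns the names of all flows whose tupleType is null or blank. */
    static List<String> flowsMissingTupleType(List<Flow> flows) {
        List<String> missing = new ArrayList<>();
        for (Flow flow : flows) {
            if (flow.tupleType == null || flow.tupleType.trim().isEmpty()) {
                missing.add(flow.name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // f5 has no tupleType here; f11 uses the type chosen later in the thread.
        List<Flow> flows = Arrays.asList(new Flow("f5", null), new Flow("f11", "pathStream"));
        System.out.println("Flows without tupleType: " + flowsMissingTupleType(flows));
    }
}
```

Failing the generation when this list is non-empty would surface the problem before a disconnected topology is produced.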
Based on the input configuration of the TimeTravelSink, I selected the "pathStream" for the f11 ;)
That's the correct choice :smile:
I tried to test the pipeline again (re-generated it) and now I get the mapping.xml error even if I use the TopoSoftwareCorrelationFinancial algorithm. The confModel.initMode option is already set to STATIC...
Do you get the same error as you reported before? Could you please attach the mapping file here? I fear that you used the most recent infrastructure to start the pipeline, but the generation from the tool is inconsistent with that version of the infrastructure. It would be good to know whether the TimeTravelPip from the repository can be started.
Yes, same error. Here is the mapping.xml: mapping.xml.txt
I can't currently reach the uni-hildesheim.de domain, so I will try the pipeline from the repository as soon as the repository is back online.
However, I noticed that at the generated pom.xml there was a duplicate dependency. If I remember correctly it was the following.
<dependency>
    <groupId>eu.qualimaster</groupId>
    <artifactId>time-graph-external</artifactId>
    <version>0.1-SNAPSHOT</version>
</dependency>
After removing the duplicate and restarting the infrastructure, I did manage to run the pipeline -- not without problems, but still...
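A rough way to spot such duplicates in a generated pom.xml, using only the JDK's XML parser. This is a sketch under simplifying assumptions: it compares groupId and artifactId only (Maven also distinguishes type and classifier), it scans every `<dependency>` element in the file regardless of section, and the class name is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DuplicateDependencyCheck {

    /** Returns each "groupId:artifactId" that is declared more than once. */
    static List<String> findDuplicates(String pomXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
            NodeList deps = doc.getElementsByTagName("dependency");
            Set<String> seen = new HashSet<>();
            List<String> duplicates = new ArrayList<>();
            for (int i = 0; i < deps.getLength(); i++) {
                Element dep = (Element) deps.item(i);
                String key = firstText(dep, "groupId") + ":" + firstText(dep, "artifactId");
                // Set.add returns false when the key was already present.
                if (!seen.add(key) && !duplicates.contains(key)) {
                    duplicates.add(key);
                }
            }
            return duplicates;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse POM", e);
        }
    }

    /** Text of the first descendant element with the given tag, or "" if none. */
    private static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String dep = "<dependency><groupId>eu.qualimaster</groupId>"
                + "<artifactId>time-graph-external</artifactId>"
                + "<version>0.1-SNAPSHOT</version></dependency>";
        String pom = "<project><dependencies>" + dep + dep + "</dependencies></project>";
        System.out.println("Duplicates: " + findDuplicates(pom));
    }
}
```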
I tested the repository version and it started normally. Do you know which version of the infrastructure is compatible with the tool generation (or when the tool will be compatible with the latest version of the infrastructure)? Debugging using the repository version of the pipeline is not convenient at all (if not impossible)...
Hi, I've seen your messages but I was still involved in completing a document you shall receive by mail now ;) Ok, I understand. Unfortunately Cui is not available for syncing the models. There are several options:
1) You try the nightly version compiled by Jenkins.
2) You install the full thingy including EASy-Producer into an empty Eclipse.
3) I run the Eclipse package over the most recent version and you try that one against QM2.
4) We try it with a release version based on the actual code and I anyway sync the models.
Currently I prefer option 3, or any other that helps you out of this situation....
BTW, the repeated dependency is not in the Jenkins version?
Ok, after some installation stuff, my Eclipse exported the most recent version for windows/linux_gtk_x86_64. Windows seems to be ok. Both try to access QM2.devel instead of the released conf model. Uploading. Will take some time but I'll write you a mail...
The duplicate dependency is in the Jenkins version as well (see here).
I just tested the latest version of the tool you sent me and the generated pipeline started just fine (infrastructure-wise). I can resume the debugging now...
Fine. It makes sense to leave the resolution of the dependency problem to Cui unless a fix is urgently needed...
Sure.
There is another problem with the generation. At this file, line 99, taskIdTimeGraphIndexer is never initialized, causing a null pointer exception. There should be a line like
taskIdTimeGraphIndexer = topologyContext.getComponentTasks("PipelineVar_10_FamilyElement4");
just before it...
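The failure mode can be reproduced without Storm. In the sketch below, a `Function<String, List<Integer>>` stands in for `topologyContext::getComponentTasks`, and `pickTask` is a hypothetical consumer of the task list, since the thread does not show what line 99 of the generated file actually does. It is not the generated code, only an illustration of initializing the field before first use and failing with a clear message otherwise.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class TaskIdInit {

    // Stays null until prepare() runs, which is the state that caused the NPE.
    private List<Integer> taskIdTimeGraphIndexer;

    /** The lookup parameter stands in for topologyContext::getComponentTasks. */
    void prepare(Function<String, List<Integer>> lookup) {
        List<Integer> ids = lookup.apply("PipelineVar_10_FamilyElement4");
        taskIdTimeGraphIndexer = (ids == null) ? Collections.<Integer>emptyList() : ids;
    }

    /** Hypothetical consumer: maps a hash onto one of the known task ids. */
    int pickTask(int hash) {
        if (taskIdTimeGraphIndexer == null) {
            throw new IllegalStateException("prepare() must run before tasks are selected");
        }
        if (taskIdTimeGraphIndexer.isEmpty()) {
            throw new IllegalStateException("no tasks known for PipelineVar_10_FamilyElement4");
        }
        // floorMod keeps the index non-negative even for negative hashes.
        return taskIdTimeGraphIndexer.get(Math.floorMod(hash, taskIdTimeGraphIndexer.size()));
    }

    public static void main(String[] args) {
        TaskIdInit bolt = new TaskIdInit();
        bolt.prepare(component -> Arrays.asList(7, 8, 9)); // made-up task ids
        System.out.println("Selected task: " + bolt.pickTask(4));
    }
}
```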
The generation knows about the assignment, but it seems that it is only executed in the case of a sub-pipeline. I tried to tie together the related data to obtain the Storm component name on the right side, and I am running a generation on my side. If it produces the required code, I will let you know. It seems that this should not affect other code, but it is worth checking that...
Could you please check if these lines would be ok although not in your code right now:
I think they are fine.
Ok, committed. Jenkins is building. Local update with your recent QM-IConf shall have the changes.
First of all, happy new year 2017!
Yesterday I tried to run the pipeline once more. The pipeline started but after a while errors began to show up both in the infrastructure and in the workers. I'm attaching the logs here. Infrastructure log (main.log) reads
12:57:23.293 [Timer-0] ERROR e.q.monitoring.ReasoningTask - During value binding: Index: 0, Size: 0 calling storeValueBinding(Configuration,mapOf) with [net.ssehub.easy.instantiation.core.model.vilTypes.configuration.Configuration@14071384, {Algorithm:TimeTravelPip:Preprocessor:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeTravelSink:IS_VALID=1.0, Machine:clu19.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialCorrelation:CAPACITY=4.9E-324, PipelineElement:TimeTravelPip:TimeGraphIndexer:ITEMS=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:ITEMS=0.0, Pipeline:TimeTravelPip:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:CAPACITY=0.0, Algorithm:TimeTravelPip:TimeGraphMapper:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:IS_ENACTING=0.0, Machine:clu16.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:queries:IS_ENACTING=0.0, Algorithm:TimeTravelPip:Preprocessor:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:HOSTS=5.0, Machine:clu09.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialDataSource:CAPACITY=0.0, Machine:clu02.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:Preprocessor:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:EXECUTORS=5.0, Pipeline:TimeTravelPip:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:IS_VALID=1.0, Machine:clu17.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:queries:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:HOSTS=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:LATENCY=0.0, Algorithm:TimeTravelPip:SpringClient:THROUGHPUT_ITEMS=0.0, Pipeline:TimeTravelPip:LATENCY=4.9E-324, PipelineElement:TimeTravelPip:FinancialDataSource:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:CAPACITY=0.0, 
PipelineElement:TimeTravelPip:TimeGraphIndexer:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:FinancialDataSource:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:IS_VALID=1.0, PipelineElement:TimeTravelPip:__acker:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:HOSTS=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:HOSTS=1.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialDataSource:ITEMS=0.0, PipelineElement:TimeTravelPip:FinancialDataSource:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeGraphIndexer:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeTravelSink:ITEMS=0.0, Pipeline:TimeTravelPip:IS_VALID=1.0, Infrastructure::AVAILABLE_MACHINES=16.0, PipelineElement:TimeTravelPip:Preprocessor:EXECUTORS=1.0, Algorithm:TimeTravelPip:SpringClient:IS_VALID=1.0, Machine:clu03.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeTravelSink:TASKS=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:LATENCY=0.0, PipelineElement:TimeTravelPip:queries:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:THROUGHPUT_VOLUME=0.0, Pipeline:TimeTravelPip:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:IS_ENACTING=0.0, Pipeline:TimeTravelPip:TASKS=43.0, PipelineElement:TimeTravelPip:Preprocessor:TASKS=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:ITEMS=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:Preprocessor:ITEMS=0.0, PipelineElement:TimeTravelPip:queries:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_ITEMS=0.0, 
Pipeline:TimeTravelPip:CAPACITY=4.9E-324, PipelineElement:TimeTravelPip:__acker:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:LATENCY=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialCorrelation:IS_VALID=1.0, Algorithm:TimeTravelPip:Preprocessor:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:Preprocessor:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:LATENCY=4.9E-324, Algorithm:TimeTravelPip:DynamicGraphCompilation:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:LATENCY=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:AVAILABLE=1.0, Machine:clu07.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:TASKS=1.0, PipelineElement:TimeTravelPip:queries:LATENCY=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:ITEMS=0.0, Machine:clu06.softnet.tuc.gr:AVAILABLE=1.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:THROUGHPUT_VOLUME=0.0, Pipeline:TimeTravelPip:ITEMS=0.0, PipelineElement:TimeTravelPip:Preprocessor:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeTravelSink:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:CAPACITY=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:Preprocessor:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:EXECUTORS=1.0, 
Algorithm:TimeTravelPip:TimeGraphIndexer:AVAILABLE=1.0, Machine:clu24.softnet.tuc.gr:AVAILABLE=1.0, Algorithm:TimeTravelPip:SpringClient:LATENCY=0.0, PipelineElement:TimeTravelPip:Preprocessor:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:HOSTS=1.0, PipelineElement:TimeTravelPip:FinancialCorrelation:HOSTS=12.0, Algorithm:TimeTravelPip:TimeTravelSink:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:ITEMS=0.0, Machine:clu14.softnet.tuc.gr:AVAILABLE=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TimeTravelSink:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:LATENCY=0.0, Algorithm:TimeTravelPip:TimeTravelSink:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:__acker:THROUGHPUT_ITEMS=0.0, Infrastructure::USED_MACHINES=15.0, PipelineElement:TimeTravelPip:Preprocessor:HOSTS=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:TASKS=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:IS_VALID=1.0, Machine:clu08.softnet.tuc.gr:AVAILABLE=1.0, Pipeline:TimeTravelPip:HOSTS=15.0, PipelineElement:TimeTravelPip:FinancialCorrelation:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:LATENCY=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:TimeTravelSink:IS_ENACTING=0.0, Machine:clu18.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialDataSource:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:SpringClient:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:LATENCY=4.9E-324, PipelineElement:TimeTravelPip:TimeTravelSink:EXECUTORS=1.0, PipelineElement:TimeTravelPip:__acker:LATENCY=0.0, 
PipelineElement:TimeTravelPip:TimeGraphMapper:EXECUTORS=1.0, PipelineElement:TimeTravelPip:Preprocessor:LATENCY=0.0, Machine:clu25.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:__acker:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphMapper:IS_VALID=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:EXECUTORS=15.0, Algorithm:TimeTravelPip:SpringClient:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:Preprocessor:IS_ENACTING=0.0, Machine:clu05.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:TASKS=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:LATENCY=0.0, PipelineElement:TimeTravelPip:queries:ITEMS=0.0, Algorithm:TimeTravelPip:SpringClient:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:Preprocessor:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:TASKS=15.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:__acker:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:TASKS=5.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:CAPACITY=0.0, PipelineElement:TimeTravelPip:FinancialDataSource:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:IS_VALID=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:LATENCY=0.0, Pipeline:TimeTravelPip:EXECUTORS=43.0, Machine:clu26.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:__acker:ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:EXECUTORS=1.0, Algorithm:TimeTravelPip:TimeTravelSink:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:Preprocessor:THROUGHPUT_VOLUME=0.0, Machine:clu04.softnet.tuc.gr:AVAILABLE=1.0, 
PipelineElement:TimeTravelPip:queries:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_VOLUME=0.0}]
net.ssehub.easy.instantiation.core.model.common.VilException: Index: 0, Size: 0 calling storeValueBinding(Configuration,mapOf) with [net.ssehub.easy.instantiation.core.model.vilTypes.configuration.Configuration@14071384, {Algorithm:TimeTravelPip:Preprocessor:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeTravelSink:IS_VALID=1.0, Machine:clu19.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialCorrelation:CAPACITY=4.9E-324, PipelineElement:TimeTravelPip:TimeGraphIndexer:ITEMS=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:ITEMS=0.0, Pipeline:TimeTravelPip:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:CAPACITY=0.0, Algorithm:TimeTravelPip:TimeGraphMapper:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:IS_ENACTING=0.0, Machine:clu16.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:queries:IS_ENACTING=0.0, Algorithm:TimeTravelPip:Preprocessor:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:HOSTS=5.0, Machine:clu09.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialDataSource:CAPACITY=0.0, Machine:clu02.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:Preprocessor:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:EXECUTORS=5.0, Pipeline:TimeTravelPip:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:IS_VALID=1.0, Machine:clu17.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:queries:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:HOSTS=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:LATENCY=0.0, Algorithm:TimeTravelPip:SpringClient:THROUGHPUT_ITEMS=0.0, Pipeline:TimeTravelPip:LATENCY=4.9E-324, PipelineElement:TimeTravelPip:FinancialDataSource:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:CAPACITY=0.0, 
PipelineElement:TimeTravelPip:TimeGraphIndexer:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:FinancialDataSource:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:IS_VALID=1.0, PipelineElement:TimeTravelPip:__acker:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:HOSTS=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:HOSTS=1.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialDataSource:ITEMS=0.0, PipelineElement:TimeTravelPip:FinancialDataSource:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeGraphIndexer:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeTravelSink:ITEMS=0.0, Pipeline:TimeTravelPip:IS_VALID=1.0, Infrastructure::AVAILABLE_MACHINES=16.0, PipelineElement:TimeTravelPip:Preprocessor:EXECUTORS=1.0, Algorithm:TimeTravelPip:SpringClient:IS_VALID=1.0, Machine:clu03.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeTravelSink:TASKS=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:LATENCY=0.0, PipelineElement:TimeTravelPip:queries:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:THROUGHPUT_VOLUME=0.0, Pipeline:TimeTravelPip:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:IS_ENACTING=0.0, Pipeline:TimeTravelPip:TASKS=43.0, PipelineElement:TimeTravelPip:Preprocessor:TASKS=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:ITEMS=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:Preprocessor:ITEMS=0.0, PipelineElement:TimeTravelPip:queries:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_ITEMS=0.0, 
Pipeline:TimeTravelPip:CAPACITY=4.9E-324, PipelineElement:TimeTravelPip:__acker:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:LATENCY=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialCorrelation:IS_VALID=1.0, Algorithm:TimeTravelPip:Preprocessor:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:Preprocessor:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:LATENCY=4.9E-324, Algorithm:TimeTravelPip:DynamicGraphCompilation:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:LATENCY=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:AVAILABLE=1.0, Machine:clu07.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:TASKS=1.0, PipelineElement:TimeTravelPip:queries:LATENCY=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:ITEMS=0.0, Machine:clu06.softnet.tuc.gr:AVAILABLE=1.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:THROUGHPUT_VOLUME=0.0, Pipeline:TimeTravelPip:ITEMS=0.0, PipelineElement:TimeTravelPip:Preprocessor:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeTravelSink:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:CAPACITY=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:Preprocessor:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:EXECUTORS=1.0, 
Algorithm:TimeTravelPip:TimeGraphIndexer:AVAILABLE=1.0, Machine:clu24.softnet.tuc.gr:AVAILABLE=1.0, Algorithm:TimeTravelPip:SpringClient:LATENCY=0.0, PipelineElement:TimeTravelPip:Preprocessor:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:HOSTS=1.0, PipelineElement:TimeTravelPip:FinancialCorrelation:HOSTS=12.0, Algorithm:TimeTravelPip:TimeTravelSink:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:ITEMS=0.0, Machine:clu14.softnet.tuc.gr:AVAILABLE=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TimeTravelSink:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:LATENCY=0.0, Algorithm:TimeTravelPip:TimeTravelSink:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:__acker:THROUGHPUT_ITEMS=0.0, Infrastructure::USED_MACHINES=15.0, PipelineElement:TimeTravelPip:Preprocessor:HOSTS=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:TASKS=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:IS_VALID=1.0, Machine:clu08.softnet.tuc.gr:AVAILABLE=1.0, Pipeline:TimeTravelPip:HOSTS=15.0, PipelineElement:TimeTravelPip:FinancialCorrelation:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:LATENCY=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:TimeTravelSink:IS_ENACTING=0.0, Machine:clu18.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialDataSource:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:SpringClient:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:LATENCY=4.9E-324, PipelineElement:TimeTravelPip:TimeTravelSink:EXECUTORS=1.0, PipelineElement:TimeTravelPip:__acker:LATENCY=0.0, 
PipelineElement:TimeTravelPip:TimeGraphMapper:EXECUTORS=1.0, PipelineElement:TimeTravelPip:Preprocessor:LATENCY=0.0, Machine:clu25.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:__acker:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphMapper:IS_VALID=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:EXECUTORS=15.0, Algorithm:TimeTravelPip:SpringClient:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:Preprocessor:IS_ENACTING=0.0, Machine:clu05.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:TASKS=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:LATENCY=0.0, PipelineElement:TimeTravelPip:queries:ITEMS=0.0, Algorithm:TimeTravelPip:SpringClient:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:Preprocessor:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:TASKS=15.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:__acker:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:TASKS=5.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:CAPACITY=0.0, PipelineElement:TimeTravelPip:FinancialDataSource:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:IS_VALID=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:LATENCY=0.0, Pipeline:TimeTravelPip:EXECUTORS=43.0, Machine:clu26.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:__acker:ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:EXECUTORS=1.0, Algorithm:TimeTravelPip:TimeTravelSink:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:Preprocessor:THROUGHPUT_VOLUME=0.0, Machine:clu04.softnet.tuc.gr:AVAILABLE=1.0, 
PipelineElement:TimeTravelPip:queries:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_VOLUME=0.0}]
at net.ssehub.easy.instantiation.core.model.vilTypes.ReflectionOperationDescriptor.invoke(ReflectionOperationDescriptor.java:231) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.expressions.EvaluationVisitor.visitCall(EvaluationVisitor.java:112) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.expressions.EvaluationVisitor.visitCallExpression(EvaluationVisitor.java:58) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.visitStrategyCallExpressionImpl(BuildlangExecution.java:751) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.visitStrategyCallExpression(BuildlangExecution.java:691) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.buildlangModel.StrategyCallExpression.accept(StrategyCallExpression.java:184) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.common.ExecutionVisitor.visitExpressionStatement(ExecutionVisitor.java:150) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.common.ExpressionStatement.accept(ExpressionStatement.java:43) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.buildlangModel.ExpressionStatement.accept(ExpressionStatement.java:25) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.executeRuleBody(BuildlangExecution.java:885) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.applyRuleBody(BuildlangExecution.java:857) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.visitRule(BuildlangExecution.java:1004) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.buildlangModel.Rule.accept(Rule.java:336) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.executeModelCall(BuildlangExecution.java:1119) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.executeModelCall(BuildlangExecution.java:103) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.common.ExecutionVisitor.proceedModelCall(ExecutionVisitor.java:416) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.common.ExecutionVisitor.visitModelCallExpression(ExecutionVisitor.java:364) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.visitRuleCallExpression(BuildlangExecution.java:1110) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.buildlangModel.RuleCallExpression.accept(RuleCallExpression.java:51) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.rt.core.model.rtVil.RtVilExecution.dynamicCall(RtVilExecution.java:1315) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.rt.core.model.rtVil.RtVilExecution.dynamicCall(RtVilExecution.java:1330) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.rt.core.model.rtVil.RtVilExecution.dynamicCall(RtVilExecution.java:1287) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.rt.core.model.rtVil.RtVilExecution.callBindValues(RtVilExecution.java:1203) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.rt.core.model.rtVil.RtVilExecution.processProperties(RtVilExecution.java:776) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.executeScript(BuildlangExecution.java:382) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.visitScript(BuildlangExecution.java:333) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.rt.core.model.rtVil.RtVilExecution.visitScript(RtVilExecution.java:765) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.rt.core.model.rtVil.Script.accept(Script.java:148) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.execution.Executor.execute(Executor.java:474) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.execution.Executor.execute(Executor.java:422) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at net.ssehub.easy.instantiation.core.model.execution.Executor.execute(Executor.java:409) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
at eu.qualimaster.monitoring.ReasoningTask.reason(ReasoningTask.java:535) [MonitoringLayer-0.5.0-SNAPSHOT.jar:na]
at eu.qualimaster.monitoring.ReasoningTask.run(ReasoningTask.java:499) [MonitoringLayer-0.5.0-SNAPSHOT.jar:na]
at java.util.TimerThread.mainLoop(Timer.java:555) [na:1.7.0_67]
at java.util.TimerThread.run(Timer.java:505) [na:1.7.0_67]
Worker logs (merged.log) read:
clu18,supervisor,2016-12-21T19:15:03.983+0200 o.a.s.c.ConnectionState [ERROR] Connection timed out for connection string (clu01.softnet.tuc.gr:2181,clu02.softnet.tuc.gr:2181,clu03.softnet.tuc.gr:2181/storm) and timeout (15000) / elapsed (18246)
org.apache.storm.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.storm.curator.ConnectionState.checkTimeouts(ConnectionState.java:198) [storm-core-0.9.5.jar:0.9.5]
at org.apache.storm.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [storm-core-0.9.5.jar:0.9.5]
at org.apache.storm.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [storm-core-0.9.5.jar:0.9.5]
at org.apache.storm.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:488) [storm-core-0.9.5.jar:0.9.5]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) [storm-core-0.9.5.jar:0.9.5]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) [storm-core-0.9.5.jar:0.9.5]
at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) [storm-core-0.9.5.jar:0.9.5]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157) [storm-core-0.9.5.jar:0.9.5]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) [storm-core-0.9.5.jar:0.9.5]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) [storm-core-0.9.5.jar:0.9.5]
at backtype.storm.zookeeper$exists_node_QMARK_$fn__1826.invoke(zookeeper.clj:101) [storm-core-0.9.5.jar:0.9.5]
at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:98) [storm-core-0.9.5.jar:0.9.5]
at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:114) [storm-core-0.9.5.jar:0.9.5]
at backtype.storm.cluster$mk_distributed_cluster_state$reify__2073.set_ephemeral_node(cluster.clj:74) [storm-core-0.9.5.jar:0.9.5]
at backtype.storm.cluster$mk_storm_cluster_state$reify__2530.supervisor_heartbeat_BANG_(cluster.clj:358) [storm-core-0.9.5.jar:0.9.5]
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_80]
at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_80]
at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) [clojure-1.5.1.jar:na]
at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor$fn__7444$exec_fn__1103__auto____7445$heartbeat_fn__7447.invoke(supervisor.clj:423) [storm-core-0.9.5.jar:0.9.5]
at backtype.storm.timer$schedule_recurring$this__1807.invoke(timer.clj:99) [storm-core-0.9.5.jar:0.9.5]
at backtype.storm.timer$mk_timer$fn__1790$fn__1791.invoke(timer.clj:50) [storm-core-0.9.5.jar:0.9.5]
at backtype.storm.timer$mk_timer$fn__1790.invoke(timer.clj:42) [storm-core-0.9.5.jar:0.9.5]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
which is about Curator. Could this one be related to #56?
Regarding the duplicated dependency: the pom generation already takes care of duplicated nodes in a pipeline. However, it assumed that each algorithm has its own artifact specification. As the algorithms TimeGraphIndexer and TimeGraphQueryExecutor are configured with the same artifact, the duplicated dependency appeared. I have now added checks on the configured artifacts in a pipeline as well. The changes are committed, so the duplicate should be gone.
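The added check can be pictured as deduplicating on the configured artifact rather than on the algorithm. This is only a minimal sketch: the `Algorithm` class, the `uniqueArtifacts` helper and the artifact coordinates are hypothetical, and the real pom generation is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ArtifactDedup {

    /** Hypothetical stand-in for a configured algorithm: a name plus its artifact. */
    static final class Algorithm {
        final String name;
        final String artifact; // "groupId:artifactId:version", illustrative values only

        Algorithm(String name, String artifact) {
            this.name = name;
            this.artifact = artifact;
        }
    }

    /** Returns each configured artifact once, in first-seen order. */
    static List<String> uniqueArtifacts(List<Algorithm> algorithms) {
        Set<String> seen = new LinkedHashSet<>();
        for (Algorithm a : algorithms) {
            seen.add(a.artifact); // a second algorithm with the same artifact adds nothing
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Algorithm> algs = Arrays.asList(
            new Algorithm("TimeGraphIndexer", "example.group:time-graph:0.0.1"),
            new Algorithm("TimeGraphQueryExecutor", "example.group:time-graph:0.0.1"),
            new Algorithm("CorrelationSW", "example.group:correlation-sw:0.0.1"));
        // Two algorithms share one artifact, so only two dependencies are listed.
        for (String artifact : uniqueArtifacts(algs)) {
            System.out.println(artifact);
        }
    }
}
```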
The Curator issue is related to the ZooKeeper connection. I had a look at the worker logs and it seems the supervisors are not started properly. You may try to manually run the command that launches the "worker" to see what is really happening.
The index problem is caused by the fast value mapping. We identified an issue (to be committed) and are trying to pass internal exception traces (to be committed) along with the VILExceptions so that identifying a problem is easier.
The Curator problem only appears on the clu18 node, and checking its supervisor logs revealed a reported JRE fatal error and hard disk problems. I contacted our admin in order to resolve them (if possible) and I'm going to disable this node (Storm supervisor) and try again.
Build for first commit is done...
Second commit is building...
Now I'm getting errors related to direct grouping
java.lang.RuntimeException: java.lang.IllegalArgumentException: Cannot do regular emit to direct stream
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.daemon.executor$fn__6647$fn__6659$fn__6706.invoke(executor.clj:748) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.util$async_loop$fn__459.invoke(util.clj:463) ~[storm-core-0.9.5.jar:0.9.5]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
Caused by: java.lang.IllegalArgumentException: Cannot do regular emit to direct stream
at backtype.storm.daemon.task$mk_tasks_fn$fn__6307.invoke(task.clj:155) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.daemon.executor$fn__6647$fn__6659$bolt_emit__6686.invoke(executor.clj:663) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.daemon.executor$fn__6647$fn$reify__6692.emit(executor.clj:698) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.task.OutputCollector.emit(OutputCollector.java:203) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.task.OutputCollector.emit(OutputCollector.java:63) ~[storm-core-0.9.5.jar:0.9.5]
at eu.qualimaster.TimeTravelPip.topology.PipelineVar_10_FamilyElement3FamilyElement.forwardTuple(PipelineVar_10_FamilyElement3FamilyElement.java:163) ~[stormjar.jar:na]
at eu.qualimaster.TimeTravelPip.topology.PipelineVar_10_FamilyElement3FamilyElement.execute(PipelineVar_10_FamilyElement3FamilyElement.java:172) ~[stormjar.jar:na]
at backtype.storm.daemon.executor$fn__6647$tuple_action_fn__6649.invoke(executor.clj:633) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.daemon.executor$mk_task_receiver$fn__6570.invoke(executor.clj:401) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.disruptor$clojure_handler$reify__1605.onEvent(disruptor.clj:58) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:120) ~[storm-core-0.9.5.jar:0.9.5]
... 6 common frames omitted
Direct streams are declared in the generated code (topology.java) but emitDirect is never used... (e.g. PipelineVar_10_FamilyElement3FamilyElement:163)
You are right. The condition to generate the emitDirect code was missed in the family nodes of the main pipeline. I made a change and Jenkins is building. Please cross-check the generated code afterwards, thanks:)
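To make the failure mode concrete: once a stream is declared as direct, a regular emit on it is rejected and the sender has to name the receiving task via emitDirect. The sketch below is a plain-Java toy model of that rule, not Storm's `OutputCollector` and not the generated pipeline code; `ToyCollector`, the stream name and the task id are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DirectStreamModel {

    /** Toy collector that only mimics the direct-stream check (NOT Storm's API). */
    static final class ToyCollector {
        private final Map<String, Boolean> directByStream = new HashMap<>();
        final List<String> sent = new ArrayList<>();

        void declareStream(String streamId, boolean direct) {
            directByStream.put(streamId, direct);
        }

        /** Regular emit: the grouping picks the target, so direct streams are rejected. */
        void emit(String streamId, Object value) {
            if (Boolean.TRUE.equals(directByStream.get(streamId))) {
                throw new IllegalArgumentException("Cannot do regular emit to direct stream");
            }
            sent.add(streamId + " -> (grouping decides): " + value);
        }

        /** Direct emit: the sender names the receiving task explicitly. */
        void emitDirect(int taskId, String streamId, Object value) {
            if (!Boolean.TRUE.equals(directByStream.get(streamId))) {
                // This check belongs to the toy model only.
                throw new IllegalArgumentException("Stream is not declared as direct: " + streamId);
            }
            sent.add(streamId + " -> task " + taskId + ": " + value);
        }
    }

    public static void main(String[] args) {
        ToyCollector collector = new ToyCollector();
        collector.declareStream("replies", true); // hypothetical direct stream
        try {
            collector.emit("replies", "result"); // what the generated code did
        } catch (IllegalArgumentException e) {
            System.out.println("regular emit failed: " + e.getMessage());
        }
        collector.emitDirect(7, "replies", "result"); // what a direct stream needs
        System.out.println(collector.sent);
    }
}
```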
I tried the newly generated code and it seems fine. But I now see two new problems (!) at the infrastructure.
The first one is a StackOverflowError (line ~250 of the attached log) and the second an IndexOutOfBoundsException (line ~2450).
I suppose the StackOverflowError is probably related to the cycles of the pipeline (?)
Probably right for the StackOverflowError: cycle detection was omitted there :( Added a simple one and committed. The IndexOutOfBoundsException is close to the expected location, but the code indicated by the exception was already changed, so something does not seem to be up to date...
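For illustration, a simple cycle guard for a recursive pipeline traversal can be a visited set that stops the recursion when a node is reached a second time. This is a sketch under assumptions, not the committed change: the adjacency-map representation and the node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PipelineTraversal {

    /** Visits every node reachable from start exactly once, even if the graph has cycles. */
    static List<String> reachable(Map<String, List<String>> successors, String start) {
        List<String> order = new ArrayList<>();
        visit(successors, start, new HashSet<String>(), order);
        return order;
    }

    private static void visit(Map<String, List<String>> successors, String node,
                              Set<String> visited, List<String> order) {
        if (!visited.add(node)) {
            return; // already seen: without this guard a cycle recurses until the stack overflows
        }
        order.add(node);
        for (String next : successors.getOrDefault(node, Collections.<String>emptyList())) {
            visit(successors, next, visited, order);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> pipeline = new HashMap<>();
        pipeline.put("source", Arrays.asList("indexer"));
        pipeline.put("indexer", Arrays.asList("queryExecutor"));
        pipeline.put("queryExecutor", Arrays.asList("indexer", "sink")); // back edge forms a cycle
        System.out.println(reachable(pipeline, "source"));
        // prints [source, indexer, queryExecutor, sink]
    }
}
```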
I updated everything and the pipeline started. However, I have this problem I don't know how to resolve. Any ideas would be much appreciated...
In the TimeTravelPip I have a request-response type of communication between the tasks; each request includes the "client's" task id so that the "server" knows where to reply (using direct grouping). This works fine if there are no failures.
When I try to run the QM pipeline though, the workers restart frequently, which breaks the aforementioned communication because a respawned task has a different task id than the original.
Of course we must solve the problem of restarting (still reviewing the logs) but if there is any idea on how to fix this problem it would help a lot.
Indeed, terminating workers is a well-known but not really well documented problem. Could you please detail the steps that you have taken so far and the alternatives you have already excluded? Just as a guess, switching the Infrastructure to events for signals rather than Curator/Zookeeper could be worth a trial, even if it is just for excluding problems.
For the time being I'm reviewing logs and trying to understand what the problem is; all I see however is netty problems (indicating dying workers I guess). I can't think of any alternatives since I don't know what the actual problem is... I also tried to run the pipeline with a small dataset in order to rule out load-related problems, but the behavior is the same. I guess switching would be just for debugging, right?
What troubles me the most is that the time travel pipeline won't work with restarting workers. Its functionality heavily relies on the state that the workers keep.
I see, also the Monitoring Layer does not really like restarting workers... ;)
As far as I know the Storm code, a worker dies either due to an exception or due to a timeout. However, Storm does not tell us which of its timers is failing (adaptive storm does in the meantime, but the information is limited).
One way I could imagine is to run your pipeline on the LUH cluster. Our plan is to install adaptive storm there. As the LUH cluster does not have NFS, making these changes is a bit tricky, as is using simulated data. Cui now has a data source based on yours that can help in this case. But we still need to learn from Miroslav where and how to copy without interfering with other tasks there. Then one trial could be to run it there and to see if/which timer is failing.
Right, switching to the events would be only for debugging and it would only help if the Zookeepers are the problem.
And if it is a netty connection problem, at least on our side it was due to another worker failing which shall host the netty server...
Our simulator reads the simulated data from the grnet HDFS, so NFS should not be a problem... So, I'll move on to the next task until the LUH cluster is ready (do we have any estimate on this?)...
Next days... some modifications in adaptive storm deferred that. Hopefully with the changed source even a missing NFS should not be a problem but needs manual copying...
Just to let you know that the LUH cluster is now up and running. LUH will next do a test on the "other" pipelines...
Thanks for the update! I'll check it out later today... (I'm also watching issue #62)
Does the financial data source (spring-simulator) require HDFS (I think I have this in mind)? If yes, can we extend it to alternatively read from the file system [Cui did first steps there but I think it makes sense to have it in the same code to avoid confusion while testing/experiments/demo]?
Yes it does. But I think Nick has also supported local file system through option flags. I don't know if the local fs option is tested but it certainly is hardcoded to HDFS right now...
If the Okeanos HDFS is not an option, I can test the local fs code.
Great, this would help a lot!
Local FS could be just a fallback, the location could be taken from the DMS configuration class (dfs.path) or given via a VM switch through the worker configuration. However it fits your work without destroying too much :)
It seems that there is HDFS accessible. Thanks for helping us with local fs. Then we can try it as it is. Christoph will start testing.
In any case, the spring client simulator will try to connect to the Okeanos HDFS (see the code here -- line 198) and try to get data from the /user/storm/ path... These things are hardcoded. If necessary we can export them as parameters...
There are some basic settings in DML configuration that could help to parameterize that class, e.g., the base HDFS URL (for the pipeline settings). We/you can also add some further ones ;)
I think we need two more settings:
simulation.useHdfs (boolean): whether to use HDFS for data simulation or not.
simulation.localPath (string): the path to get the data from when using the local FS. (dfs.path now points to /var/nfs/profiling. I don't know if it's a good idea to also add data in that directory...)
For the HDFS path I think it's safe to use the hdfs.path setting. What do you think?
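As a rough sketch of how the data source could consume these settings: the keys `simulation.useHdfs`, `simulation.localPath` and `hdfs.path` are the ones named above, while the `resolveDataLocation` helper, the use of `java.util.Properties`, the HDFS default and the example paths are assumptions for illustration only.

```java
import java.util.Properties;

final class SimulationSettings {

    /**
     * Picks the location to read simulation data from.
     * Defaults to HDFS (an assumption, mirroring the currently hardcoded behaviour).
     */
    static String resolveDataLocation(Properties cfg) {
        boolean useHdfs = Boolean.parseBoolean(cfg.getProperty("simulation.useHdfs", "true"));
        String key = useHdfs ? "hdfs.path" : "simulation.localPath";
        String location = cfg.getProperty(key);
        if (location == null || location.isEmpty()) {
            throw new IllegalStateException("Missing setting: " + key);
        }
        return location;
    }

    public static void main(String[] args) {
        Properties cfg = new Properties();
        cfg.setProperty("hdfs.path", "/user/storm/");                  // illustrative value
        cfg.setProperty("simulation.localPath", "/tmp/simulation");     // hypothetical path
        System.out.println(resolveDataLocation(cfg)); // HDFS is the default

        cfg.setProperty("simulation.useHdfs", "false");
        System.out.println(resolveDataLocation(cfg)); // local FS fallback
    }
}
```

Failing fast on a missing path keeps a misconfigured worker from silently reading from the wrong place.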
Ok for me if it fits your needs. I've added the two settings to the DML cfg and they shall be transferred from the infrastructure to the workers/algorithms.
@L3SQualimaster this may also help avoiding code replacements for the Twitter source