QualiMaster / qm-issues

Time-travel pipeline #59

Open ap0n opened 7 years ago

ap0n commented 7 years ago

I generated the TimeTravelPip pipeline by selecting only the CorrelationSW algorithm as a member of the fCorrelationFinancial family. When I try to run it using cli.sh, I get the following error from the infrastructure.

19:36:22.324 [pool-2-thread-4] INFO  eu.qualimaster.events.EventManager - dispatching CoordinationCommandExecutionEvent command: PipelineCommand status: START options: PipelineOptions: {free.eventBus.host=clu01.softnet.tuc.gr, free.monitoring.volume.enabled=true, qm.adaptation=null, free.eventBus.disableLogging=eu.qualimaster.monitoring.events.PipelineElementMultiObservationMonitoringEvent,eu.qualimaster.monitoring.events.PipelineElementObservationMonitoringEvent,eu.qualimaster.monitoring.events.PipelineObservationMonitoringEvent,eu.qualimaster.monitoring.events.PlatformMultiObservationHostMonitoringEvent, free.eventBus.port=14000, free.confModel.initMode=ADAPTIVE, free.pipelines.ports=22000-22500} pipeline: TimeTravelPip senderId: 2d86352310db2bd3:-f17b08a:159038e6684:-8000-32065805755823057 messageId: 14b837be-cf6e-4690-b052-f3d57f4e3e5d  cause: PipelineCommand status: START options: PipelineOptions: {free.eventBus.host=clu01.softnet.tuc.gr, free.monitoring.volume.enabled=true, qm.adaptation=null, free.eventBus.disableLogging=eu.qualimaster.monitoring.events.PipelineElementMultiObservationMonitoringEvent,eu.qualimaster.monitoring.events.PipelineElementObservationMonitoringEvent,eu.qualimaster.monitoring.events.PipelineObservationMonitoringEvent,eu.qualimaster.monitoring.events.PlatformMultiObservationHostMonitoringEvent, free.eventBus.port=14000, free.confModel.initMode=ADAPTIVE, free.pipelines.ports=22000-22500} pipeline: TimeTravelPip senderId: 2d86352310db2bd3:-f17b08a:159038e6684:-8000-32065805755823057 messageId: 14b837be-cf6e-4690-b052-f3d57f4e3e5d  messageId: 14b837be-cf6e-4690-b052-f3d57f4e3e5d receiverId: 2d86352310db2bd3:-f17b08a:159038e6684:-8000-32065805755823057 code: 3 message: while starting pipeline 'TimeTravelPip': Topology class name is empty in mapping TimeTravelPip null {} {} [null] [TimeTravelPip] algs: {} {} {} comp: {} {} params: {} {} subPipelines []. Cannot start pipeline TimeTravelPip. 
If you try to start it manually, please ensure that the pipeline name in the configuration is also the name of the Jar an in the package name of the topology.

However, I cannot see what is wrong with the mapping.xml. I'm attaching the generated code here (TimeTravelPip.zip). @eichelbe @cuiqin, can you have a look? I built the code on the cluster using the mvn clean install command.

Moreover, when I generated the same pipeline by selecting the TopoSoftwareCorrelationFinancial algorithm, I was able to start the pipeline normally. However, its visualization looked like this (image: timetravelpip), and no tuples were passing from PipelineVar_10_FamilyElement2 to PipelineVar_10_FamilyElement3!

Any ideas?

cuiqin commented 7 years ago

From the log, I see that the pipeline starts in ADAPTIVE mode. The generation from the tool has not yet been updated to consider this mode, as it is still in the stabilizing phase. Could you please try to run the TimeTravelPip with CorrelationSW in STATIC mode by setting confModel.initMode = STATIC, to see whether you can start the pipeline? We will let you know when we update the instantiation for the tool to use the ADAPTIVE startup.

Regarding the issue that no tuples pass from PipelineVar_10_FamilyElement2 to PipelineVar_10_FamilyElement3: what are the underlying algorithms? Are they simple Java algorithms or distributed algorithms? Do you receive any tuples in PipelineVar_10_FamilyElement2? Maybe you can add some logging to its underlying algorithm to see whether it gets input data. Knowing only the family element, it is hard to say.

ap0n commented 7 years ago

Where would I set the confModel.initMode = STATIC?

I will debug the pipeline and let you know. I just wondered if you had any idea why the pipeline appears to consist of two completely separate branches... Anyway, I will let you know when I have added the logs and seen what is really going on.

cuiqin commented 7 years ago

The STATIC setting goes into the infrastructure configuration file, qm.infrastructure.cfg.

Another thing that comes to mind: with your local modifications to the configuration via QM-IConf, the model you used to generate the pipeline is no longer consistent with the most recent one downloaded by the infrastructure. This can prevent the infrastructure from collecting the right lifecycles while starting the pipeline. To fix this, we would also need to apply your configuration changes to QM2.devel on Jenkins to refresh the model artifact, which the infrastructure downloads into the cluster. Let me know if this is the case. Then I can adjust the respective configuration based on your modifications.

Two separate branches? Does the topology declaration in the Topology class match the pipeline configuration? Let me know if you find something.

ap0n commented 7 years ago

It seems that there is a problem in the generated topology code. PipelineVar_10_FamilyElement3 (DynamicGraphCompilation) is not connected to PipelineVar_10_FamilyElement2 (TimeGraphMapper) in Topology.java. (Shouldn't that produce an InvalidTopologyException...? :confused:)
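Editorial sketch: a missing link like this could be caught by a reachability check over the declared wiring, run in the generator or in a test. The example below is a minimal illustration in plain Java, not part of Storm or QualiMaster; the class name `TopologyReachability`, the `unreachable` method, and the "Source" component are hypothetical, and the wiring in `main` is illustrative only.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TopologyReachability {

    /**
     * Returns the components that no source can reach, in declaration order.
     * "edges" maps a component to the components consuming its output.
     */
    static Set<String> unreachable(List<String> components, List<String> sources,
                                   Map<String, List<String>> edges) {
        Set<String> visited = new HashSet<>(sources);
        Deque<String> queue = new ArrayDeque<>(sources);
        // Breadth-first traversal starting from all sources at once.
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String next : edges.getOrDefault(current, Collections.<String>emptyList())) {
                if (visited.add(next)) {
                    queue.add(next);
                }
            }
        }
        Set<String> result = new LinkedHashSet<>(components);
        result.removeAll(visited);
        return result;
    }

    public static void main(String[] args) {
        // Illustrative wiring only: the link from element 2 to element 3 is missing.
        List<String> components = Arrays.asList(
                "Source", "PipelineVar_10_FamilyElement2", "PipelineVar_10_FamilyElement3");
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("Source", Arrays.asList("PipelineVar_10_FamilyElement2"));
        System.out.println(unreachable(components, Arrays.asList("Source"), edges));
    }
}
```

With the wiring in `main`, the check reports PipelineVar_10_FamilyElement3 as unreachable; adding the missing edge makes the result empty.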

cuiqin commented 7 years ago

I figured out that the tupleType configuration of the flow (f5) between PipelineVar_10_FamilyElement2 and PipelineVar_10_FamilyElement3 is missing. That's why the link between these two nodes is broken in the generation. I also saw that the flow (f11) has the same problem. @ap0n could you please check which type should be selected for this flow? We had missed the constraint on the tupleType, so I added it. Now it should complain if no tupleType is configured.
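Editorial sketch: the actual constraint lives in the configuration model, and its definition is not shown in this thread. Purely to illustrate the idea, the rule "every flow must name a tupleType" can be expressed as a small validation pass. The `Flow` class, `FlowTupleTypeCheck`, and the flow "fX" with endpoints "A" and "B" are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FlowTupleTypeCheck {

    /** Hypothetical view of one configured flow; tupleType is null when unset. */
    static final class Flow {
        final String name;
        final String source;
        final String target;
        final String tupleType;

        Flow(String name, String source, String target, String tupleType) {
            this.name = name;
            this.source = source;
            this.target = target;
            this.tupleType = tupleType;
        }
    }

    /** Returns the names of all flows whose tupleType is missing or blank. */
    static List<String> missingTupleTypes(List<Flow> flows) {
        List<String> missing = new ArrayList<>();
        for (Flow flow : flows) {
            if (flow.tupleType == null || flow.tupleType.trim().isEmpty()) {
                missing.add(flow.name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Flow> flows = Arrays.asList(
                // f5 as described in the thread: no tupleType configured.
                new Flow("f5", "PipelineVar_10_FamilyElement2",
                        "PipelineVar_10_FamilyElement3", null),
                // Placeholder flow with a configured type.
                new Flow("fX", "A", "B", "pathStream"));
        for (String name : missingTupleTypes(flows)) {
            System.out.println("Flow " + name + " has no tupleType configured");
        }
    }
}
```

Failing early on such a list, before any code is generated, avoids producing a topology whose link is silently dropped.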

cuiqin commented 7 years ago

Based on the input configuration of the TimeTravelSink, I selected the "pathStream" for the f11 ;)

ap0n commented 7 years ago

That's the correct choice :smile:

I re-generated the pipeline and tested it again, and now I get the mapping.xml error even if I use the TopoSoftwareCorrelationFinancial algorithm. The confModel.initMode option is already set to STATIC...

cuiqin commented 7 years ago

Do you get the same error as you reported before? Could you please attach the mapping file here? I fear that you used the most recent infrastructure to start the pipeline, but the generation from the tool is inconsistent with that version of the infrastructure. It would be good to know whether the TimeTravelPip from the repository can be started.

ap0n commented 7 years ago

Yes, same error. Here is the mapping.xml (attached as mapping.xml.txt). I can't currently reach the uni-hildesheim.de domain, so I will try the pipeline from the repository as soon as the repository is back online.

However, I noticed that the generated pom.xml contained a duplicate dependency. If I remember correctly, it was the following.

<dependency>
    <groupId>eu.qualimaster</groupId>
    <artifactId>time-graph-external</artifactId>
    <version>0.1-SNAPSHOT</version>
</dependency>

After removing the duplicate and restarting the infrastructure, I did manage to run the pipeline -- not without problems, but still...
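Editorial sketch: a duplicate like this is easy to miss in a generated pom.xml. The example below uses only the JDK's DOM parser to list every groupId:artifactId:version coordinate that appears more than once among `<dependency>` elements. The names `PomDuplicateCheck` and `findDuplicates` are hypothetical and not part of Maven or QualiMaster. As a simplification, it scans all `<dependency>` elements in the file, including any under `<dependencyManagement>`, and takes the first matching descendant of each.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class PomDuplicateCheck {

    /** Returns each coordinate that occurs more than once, in order of first repeat. */
    static List<String> findDuplicates(String pomXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse pom.xml content", e);
        }
        NodeList deps = doc.getElementsByTagName("dependency");
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (int i = 0; i < deps.getLength(); i++) {
            Element dep = (Element) deps.item(i);
            String key = text(dep, "groupId") + ":" + text(dep, "artifactId")
                    + ":" + text(dep, "version");
            // Set.add returns false when the coordinate was already seen.
            if (!seen.add(key) && !duplicates.contains(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    /** Text of the first descendant with the given tag, or "" if absent. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String dep = "<dependency><groupId>eu.qualimaster</groupId>"
                + "<artifactId>time-graph-external</artifactId>"
                + "<version>0.1-SNAPSHOT</version></dependency>";
        String pom = "<project><dependencies>" + dep + dep + "</dependencies></project>";
        System.out.println(findDuplicates(pom));
    }
}
```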

ap0n commented 7 years ago

I tested the repository version and it started normally. Do you know which version of the infrastructure is compatible with the tool generation (or when the tool will be compatible with the latest version of the infrastructure)? Debugging using the repository version of the pipeline is not convenient at all (if not impossible)...

eichelbe commented 7 years ago

Hi, I've seen your messages, but I was still busy completing a document that you should receive by mail now ;) Ok, I understand. Unfortunately, Cui is not available for syncing the models. There are several options:

1) You try the nightly version compiled by Jenkins.
2) You install the full thingy including EASy-Producer into an empty Eclipse.
3) I run the Eclipse package over the most recent version and you try that one against QM2.
4) We try it with a release version based on the actual code and I sync the models anyway.

Currently I prefer option 3, or any other that helps you out of this situation....

eichelbe commented 7 years ago

BTW, the repeated dependency is not in the Jenkins version?

eichelbe commented 7 years ago

Ok, after some installation stuff, my Eclipse exported the most recent version for windows/linux_gtk_x86_64. Windows seems to be ok. Both try to access QM2.devel instead of the released conf model. Uploading. Will take some time but I'll write you a mail...

ap0n commented 7 years ago

The duplicate dependency is in the Jenkins version as well (see here).

I just tested the latest version of the tool you sent me and the generated pipeline started just fine (infrastructure-wise). I can resume the debugging now...

eichelbe commented 7 years ago

Fine. It makes sense to leave the resolution of the dependency problem to Cui, unless a fix is urgently needed...

ap0n commented 7 years ago

Sure.

ap0n commented 7 years ago

There is another problem with the generation. In this file, at line 99, taskIdTimeGraphIndexer is never initialized, causing a null pointer exception. There should be a line like

taskIdTimeGraphIndexer = topologyContext.getComponentTasks("PipelineVar_10_FamilyElement4");

just before it...
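Editorial sketch of the failure and the proposed fix. So that it runs without Storm on the classpath, it uses a hypothetical `TaskLookup` interface in place of Storm's `TopologyContext`, mirroring only the `getComponentTasks` call quoted above. The class `IndexerRouting` and the modulo-based task selection are assumptions for illustration; how the generated code actually uses the task ids is not shown in this thread.

```java
import java.util.Arrays;
import java.util.List;

final class IndexerRouting {

    /** Hypothetical stand-in for the single TopologyContext method used here. */
    interface TaskLookup {
        List<Integer> getComponentTasks(String componentId);
    }

    // Stays null until prepare() runs, which is the situation reported above.
    private List<Integer> taskIdTimeGraphIndexer;

    /** Mirrors the proposed fix: resolve the task ids before their first use. */
    void prepare(TaskLookup context) {
        taskIdTimeGraphIndexer = context.getComponentTasks("PipelineVar_10_FamilyElement4");
    }

    /** Picks a target task for a key (assumed scheme); fails clearly if unprepared. */
    int targetTaskFor(int key) {
        if (taskIdTimeGraphIndexer == null || taskIdTimeGraphIndexer.isEmpty()) {
            throw new IllegalStateException("taskIdTimeGraphIndexer is not initialized");
        }
        int index = Math.floorMod(key, taskIdTimeGraphIndexer.size());
        return taskIdTimeGraphIndexer.get(index);
    }

    public static void main(String[] args) {
        IndexerRouting routing = new IndexerRouting();
        // Fake lookup returning three task ids for any component name.
        routing.prepare(componentId -> Arrays.asList(7, 8, 9));
        System.out.println(routing.targetTaskFor(4));
    }
}
```

The explicit check turns the bare null pointer exception into a message that names the uninitialized field, which makes a skipped initialization easier to spot in worker logs.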

eichelbe commented 7 years ago

The generation knows about the assignment, but it seems that it is only executed in the case of a sub-pipeline. I tried to tie together the related data to obtain the Storm component name on the right-hand side, and I am running a generation on my side. If it produces the required code, I will let you know. It seems that this should not affect other code, but it will be worth checking...

eichelbe commented 7 years ago

Could you please check if these lines would be ok although not in your code right now:

ap0n commented 7 years ago

I think they are fine.

eichelbe commented 7 years ago

Ok, committed. Jenkins is building. A local update with your recent QM-IConf should then contain the changes.

ap0n commented 7 years ago

First of all, happy new year 2017!

Yesterday I tried to run the pipeline once more. The pipeline started but after a while errors began to show up both in the infrastructure and in the workers. I'm attaching the logs here. Infrastructure log (main.log) reads

12:57:23.293 [Timer-0] ERROR e.q.monitoring.ReasoningTask - During value binding: Index: 0, Size: 0 calling storeValueBinding(Configuration,mapOf) with [net.ssehub.easy.instantiation.core.model.vilTypes.configuration.Configuration@14071384, {Algorithm:TimeTravelPip:Preprocessor:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeTravelSink:IS_VALID=1.0, Machine:clu19.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialCorrelation:CAPACITY=4.9E-324, PipelineElement:TimeTravelPip:TimeGraphIndexer:ITEMS=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:ITEMS=0.0, Pipeline:TimeTravelPip:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:CAPACITY=0.0, Algorithm:TimeTravelPip:TimeGraphMapper:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:IS_ENACTING=0.0, Machine:clu16.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:queries:IS_ENACTING=0.0, Algorithm:TimeTravelPip:Preprocessor:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:HOSTS=5.0, Machine:clu09.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialDataSource:CAPACITY=0.0, Machine:clu02.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:Preprocessor:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:EXECUTORS=5.0, Pipeline:TimeTravelPip:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:IS_VALID=1.0, Machine:clu17.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:queries:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:HOSTS=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:LATENCY=0.0, Algorithm:TimeTravelPip:SpringClient:THROUGHPUT_ITEMS=0.0, Pipeline:TimeTravelPip:LATENCY=4.9E-324, PipelineElement:TimeTravelPip:FinancialDataSource:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:CAPACITY=0.0, 
PipelineElement:TimeTravelPip:TimeGraphIndexer:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:FinancialDataSource:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:IS_VALID=1.0, PipelineElement:TimeTravelPip:__acker:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:HOSTS=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:HOSTS=1.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialDataSource:ITEMS=0.0, PipelineElement:TimeTravelPip:FinancialDataSource:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeGraphIndexer:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeTravelSink:ITEMS=0.0, Pipeline:TimeTravelPip:IS_VALID=1.0, Infrastructure::AVAILABLE_MACHINES=16.0, PipelineElement:TimeTravelPip:Preprocessor:EXECUTORS=1.0, Algorithm:TimeTravelPip:SpringClient:IS_VALID=1.0, Machine:clu03.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeTravelSink:TASKS=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:LATENCY=0.0, PipelineElement:TimeTravelPip:queries:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:THROUGHPUT_VOLUME=0.0, Pipeline:TimeTravelPip:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:IS_ENACTING=0.0, Pipeline:TimeTravelPip:TASKS=43.0, PipelineElement:TimeTravelPip:Preprocessor:TASKS=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:ITEMS=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:Preprocessor:ITEMS=0.0, PipelineElement:TimeTravelPip:queries:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_ITEMS=0.0, 
Pipeline:TimeTravelPip:CAPACITY=4.9E-324, PipelineElement:TimeTravelPip:__acker:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:LATENCY=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialCorrelation:IS_VALID=1.0, Algorithm:TimeTravelPip:Preprocessor:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:Preprocessor:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:LATENCY=4.9E-324, Algorithm:TimeTravelPip:DynamicGraphCompilation:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:LATENCY=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:AVAILABLE=1.0, Machine:clu07.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:TASKS=1.0, PipelineElement:TimeTravelPip:queries:LATENCY=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:ITEMS=0.0, Machine:clu06.softnet.tuc.gr:AVAILABLE=1.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:THROUGHPUT_VOLUME=0.0, Pipeline:TimeTravelPip:ITEMS=0.0, PipelineElement:TimeTravelPip:Preprocessor:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeTravelSink:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:CAPACITY=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:Preprocessor:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:EXECUTORS=1.0, 
Algorithm:TimeTravelPip:TimeGraphIndexer:AVAILABLE=1.0, Machine:clu24.softnet.tuc.gr:AVAILABLE=1.0, Algorithm:TimeTravelPip:SpringClient:LATENCY=0.0, PipelineElement:TimeTravelPip:Preprocessor:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:HOSTS=1.0, PipelineElement:TimeTravelPip:FinancialCorrelation:HOSTS=12.0, Algorithm:TimeTravelPip:TimeTravelSink:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:ITEMS=0.0, Machine:clu14.softnet.tuc.gr:AVAILABLE=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TimeTravelSink:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:LATENCY=0.0, Algorithm:TimeTravelPip:TimeTravelSink:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:__acker:THROUGHPUT_ITEMS=0.0, Infrastructure::USED_MACHINES=15.0, PipelineElement:TimeTravelPip:Preprocessor:HOSTS=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:TASKS=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:IS_VALID=1.0, Machine:clu08.softnet.tuc.gr:AVAILABLE=1.0, Pipeline:TimeTravelPip:HOSTS=15.0, PipelineElement:TimeTravelPip:FinancialCorrelation:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:LATENCY=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:TimeTravelSink:IS_ENACTING=0.0, Machine:clu18.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialDataSource:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:SpringClient:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:LATENCY=4.9E-324, PipelineElement:TimeTravelPip:TimeTravelSink:EXECUTORS=1.0, PipelineElement:TimeTravelPip:__acker:LATENCY=0.0, 
PipelineElement:TimeTravelPip:TimeGraphMapper:EXECUTORS=1.0, PipelineElement:TimeTravelPip:Preprocessor:LATENCY=0.0, Machine:clu25.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:__acker:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphMapper:IS_VALID=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:EXECUTORS=15.0, Algorithm:TimeTravelPip:SpringClient:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:Preprocessor:IS_ENACTING=0.0, Machine:clu05.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:TASKS=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:LATENCY=0.0, PipelineElement:TimeTravelPip:queries:ITEMS=0.0, Algorithm:TimeTravelPip:SpringClient:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:Preprocessor:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:TASKS=15.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:__acker:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:TASKS=5.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:CAPACITY=0.0, PipelineElement:TimeTravelPip:FinancialDataSource:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:IS_VALID=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:LATENCY=0.0, Pipeline:TimeTravelPip:EXECUTORS=43.0, Machine:clu26.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:__acker:ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:EXECUTORS=1.0, Algorithm:TimeTravelPip:TimeTravelSink:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:Preprocessor:THROUGHPUT_VOLUME=0.0, Machine:clu04.softnet.tuc.gr:AVAILABLE=1.0, 
PipelineElement:TimeTravelPip:queries:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_VOLUME=0.0}]
net.ssehub.easy.instantiation.core.model.common.VilException: Index: 0, Size: 0 calling storeValueBinding(Configuration,mapOf) with [net.ssehub.easy.instantiation.core.model.vilTypes.configuration.Configuration@14071384, {Algorithm:TimeTravelPip:Preprocessor:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeTravelSink:IS_VALID=1.0, Machine:clu19.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialCorrelation:CAPACITY=4.9E-324, PipelineElement:TimeTravelPip:TimeGraphIndexer:ITEMS=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:ITEMS=0.0, Pipeline:TimeTravelPip:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:CAPACITY=0.0, Algorithm:TimeTravelPip:TimeGraphMapper:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:IS_ENACTING=0.0, Machine:clu16.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:queries:IS_ENACTING=0.0, Algorithm:TimeTravelPip:Preprocessor:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:HOSTS=5.0, Machine:clu09.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialDataSource:CAPACITY=0.0, Machine:clu02.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:Preprocessor:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:EXECUTORS=5.0, Pipeline:TimeTravelPip:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:IS_VALID=1.0, Machine:clu17.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:queries:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:HOSTS=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:LATENCY=0.0, Algorithm:TimeTravelPip:SpringClient:THROUGHPUT_ITEMS=0.0, Pipeline:TimeTravelPip:LATENCY=4.9E-324, PipelineElement:TimeTravelPip:FinancialDataSource:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:CAPACITY=0.0, 
PipelineElement:TimeTravelPip:TimeGraphIndexer:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:FinancialDataSource:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:IS_VALID=1.0, PipelineElement:TimeTravelPip:__acker:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:HOSTS=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:HOSTS=1.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialDataSource:ITEMS=0.0, PipelineElement:TimeTravelPip:FinancialDataSource:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeGraphIndexer:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeTravelSink:ITEMS=0.0, Pipeline:TimeTravelPip:IS_VALID=1.0, Infrastructure::AVAILABLE_MACHINES=16.0, PipelineElement:TimeTravelPip:Preprocessor:EXECUTORS=1.0, Algorithm:TimeTravelPip:SpringClient:IS_VALID=1.0, Machine:clu03.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeTravelSink:TASKS=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:LATENCY=0.0, PipelineElement:TimeTravelPip:queries:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:THROUGHPUT_VOLUME=0.0, Pipeline:TimeTravelPip:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:IS_ENACTING=0.0, Pipeline:TimeTravelPip:TASKS=43.0, PipelineElement:TimeTravelPip:Preprocessor:TASKS=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:ITEMS=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:Preprocessor:ITEMS=0.0, PipelineElement:TimeTravelPip:queries:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_ITEMS=0.0, 
Pipeline:TimeTravelPip:CAPACITY=4.9E-324, PipelineElement:TimeTravelPip:__acker:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:LATENCY=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialCorrelation:IS_VALID=1.0, Algorithm:TimeTravelPip:Preprocessor:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:Preprocessor:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:LATENCY=4.9E-324, Algorithm:TimeTravelPip:DynamicGraphCompilation:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:LATENCY=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:AVAILABLE=1.0, Machine:clu07.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:TASKS=1.0, PipelineElement:TimeTravelPip:queries:LATENCY=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:ITEMS=0.0, Machine:clu06.softnet.tuc.gr:AVAILABLE=1.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:THROUGHPUT_VOLUME=0.0, Pipeline:TimeTravelPip:ITEMS=0.0, PipelineElement:TimeTravelPip:Preprocessor:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeTravelSink:IS_VALID=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphMapper:CAPACITY=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:Preprocessor:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:EXECUTORS=1.0, 
Algorithm:TimeTravelPip:TimeGraphIndexer:AVAILABLE=1.0, Machine:clu24.softnet.tuc.gr:AVAILABLE=1.0, Algorithm:TimeTravelPip:SpringClient:LATENCY=0.0, PipelineElement:TimeTravelPip:Preprocessor:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:HOSTS=1.0, PipelineElement:TimeTravelPip:FinancialCorrelation:HOSTS=12.0, Algorithm:TimeTravelPip:TimeTravelSink:LATENCY=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:ITEMS=0.0, Machine:clu14.softnet.tuc.gr:AVAILABLE=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TimeTravelSink:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphIndexer:LATENCY=0.0, Algorithm:TimeTravelPip:TimeTravelSink:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:__acker:THROUGHPUT_ITEMS=0.0, Infrastructure::USED_MACHINES=15.0, PipelineElement:TimeTravelPip:Preprocessor:HOSTS=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:TASKS=1.0, PipelineElement:TimeTravelPip:TimeGraphMapper:IS_VALID=1.0, Machine:clu08.softnet.tuc.gr:AVAILABLE=1.0, Pipeline:TimeTravelPip:HOSTS=15.0, PipelineElement:TimeTravelPip:FinancialCorrelation:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:TimeTravelSink:LATENCY=0.0, Algorithm:TimeTravelPip:TimeGraphQueryExecutor:IS_VALID=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:TimeTravelSink:IS_ENACTING=0.0, Machine:clu18.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:FinancialDataSource:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:SpringClient:IS_ENACTING=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:LATENCY=4.9E-324, PipelineElement:TimeTravelPip:TimeTravelSink:EXECUTORS=1.0, PipelineElement:TimeTravelPip:__acker:LATENCY=0.0, 
PipelineElement:TimeTravelPip:TimeGraphMapper:EXECUTORS=1.0, PipelineElement:TimeTravelPip:Preprocessor:LATENCY=0.0, Machine:clu25.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:__acker:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TimeGraphMapper:IS_VALID=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:EXECUTORS=15.0, Algorithm:TimeTravelPip:SpringClient:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:THROUGHPUT_VOLUME=0.0, PipelineElement:TimeTravelPip:Preprocessor:IS_ENACTING=0.0, Machine:clu05.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:TASKS=1.0, Algorithm:TimeTravelPip:TimeGraphMapper:AVAILABLE=1.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:LATENCY=0.0, PipelineElement:TimeTravelPip:queries:ITEMS=0.0, Algorithm:TimeTravelPip:SpringClient:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:Preprocessor:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:FinancialCorrelation:TASKS=15.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:__acker:CAPACITY=0.0, PipelineElement:TimeTravelPip:TimeGraphIndexer:TASKS=5.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:CAPACITY=0.0, PipelineElement:TimeTravelPip:FinancialDataSource:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:DynamicGraphCompilation:IS_VALID=1.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:LATENCY=0.0, Pipeline:TimeTravelPip:EXECUTORS=43.0, Machine:clu26.softnet.tuc.gr:AVAILABLE=1.0, PipelineElement:TimeTravelPip:__acker:ITEMS=0.0, PipelineElement:TimeTravelPip:DynamicGraphCompilation:EXECUTORS=1.0, Algorithm:TimeTravelPip:TimeTravelSink:THROUGHPUT_ITEMS=0.0, Algorithm:TimeTravelPip:Preprocessor:THROUGHPUT_VOLUME=0.0, Machine:clu04.softnet.tuc.gr:AVAILABLE=1.0, 
PipelineElement:TimeTravelPip:queries:THROUGHPUT_VOLUME=0.0, Algorithm:TimeTravelPip:TopoSoftwareCorrelationFinancial:IS_ENACTING=0.0, PipelineElement:TimeTravelPip:TimeGraphQueryExecutor:THROUGHPUT_VOLUME=0.0}]
    at net.ssehub.easy.instantiation.core.model.vilTypes.ReflectionOperationDescriptor.invoke(ReflectionOperationDescriptor.java:231) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.expressions.EvaluationVisitor.visitCall(EvaluationVisitor.java:112) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.expressions.EvaluationVisitor.visitCallExpression(EvaluationVisitor.java:58) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.visitStrategyCallExpressionImpl(BuildlangExecution.java:751) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.visitStrategyCallExpression(BuildlangExecution.java:691) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.buildlangModel.StrategyCallExpression.accept(StrategyCallExpression.java:184) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.common.ExecutionVisitor.visitExpressionStatement(ExecutionVisitor.java:150) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.common.ExpressionStatement.accept(ExpressionStatement.java:43) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.buildlangModel.ExpressionStatement.accept(ExpressionStatement.java:25) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.executeRuleBody(BuildlangExecution.java:885) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.applyRuleBody(BuildlangExecution.java:857) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.visitRule(BuildlangExecution.java:1004) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.buildlangModel.Rule.accept(Rule.java:336) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.executeModelCall(BuildlangExecution.java:1119) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.executeModelCall(BuildlangExecution.java:103) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.common.ExecutionVisitor.proceedModelCall(ExecutionVisitor.java:416) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.common.ExecutionVisitor.visitModelCallExpression(ExecutionVisitor.java:364) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.visitRuleCallExpression(BuildlangExecution.java:1110) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.buildlangModel.RuleCallExpression.accept(RuleCallExpression.java:51) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.rt.core.model.rtVil.RtVilExecution.dynamicCall(RtVilExecution.java:1315) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.rt.core.model.rtVil.RtVilExecution.dynamicCall(RtVilExecution.java:1330) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.rt.core.model.rtVil.RtVilExecution.dynamicCall(RtVilExecution.java:1287) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.rt.core.model.rtVil.RtVilExecution.callBindValues(RtVilExecution.java:1203) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.rt.core.model.rtVil.RtVilExecution.processProperties(RtVilExecution.java:776) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.executeScript(BuildlangExecution.java:382) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.buildlangModel.BuildlangExecution.visitScript(BuildlangExecution.java:333) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.rt.core.model.rtVil.RtVilExecution.visitScript(RtVilExecution.java:765) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.rt.core.model.rtVil.Script.accept(Script.java:148) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.execution.Executor.execute(Executor.java:474) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.execution.Executor.execute(Executor.java:422) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at net.ssehub.easy.instantiation.core.model.execution.Executor.execute(Executor.java:409) ~[EASy.QualiMaster-1.2.0-SNAPSHOT.jar:na]
    at eu.qualimaster.monitoring.ReasoningTask.reason(ReasoningTask.java:535) [MonitoringLayer-0.5.0-SNAPSHOT.jar:na]
    at eu.qualimaster.monitoring.ReasoningTask.run(ReasoningTask.java:499) [MonitoringLayer-0.5.0-SNAPSHOT.jar:na]
    at java.util.TimerThread.mainLoop(Timer.java:555) [na:1.7.0_67]
    at java.util.TimerThread.run(Timer.java:505) [na:1.7.0_67]

Worker logs (merged.log) read:

clu18,supervisor,2016-12-21T19:15:03.983+0200 o.a.s.c.ConnectionState [ERROR] Connection timed out for connection string (clu01.softnet.tuc.gr:2181,clu02.softnet.tuc.gr:2181,clu03.softnet.tuc.gr:2181/storm) and timeout (15000) / elapsed (18246)
 org.apache.storm.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.storm.curator.ConnectionState.checkTimeouts(ConnectionState.java:198) [storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:488) [storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) [storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) [storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) [storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157) [storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) [storm-core-0.9.5.jar:0.9.5]
    at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) [storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.zookeeper$exists_node_QMARK_$fn__1826.invoke(zookeeper.clj:101) [storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:98) [storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:114) [storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.cluster$mk_distributed_cluster_state$reify__2073.set_ephemeral_node(cluster.clj:74) [storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.cluster$mk_storm_cluster_state$reify__2530.supervisor_heartbeat_BANG_(cluster.clj:358) [storm-core-0.9.5.jar:0.9.5]
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_80]
    at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_80]
    at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) [clojure-1.5.1.jar:na]
    at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.supervisor$fn__7444$exec_fn__1103__auto____7445$heartbeat_fn__7447.invoke(supervisor.clj:423) [storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.timer$schedule_recurring$this__1807.invoke(timer.clj:99) [storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.timer$mk_timer$fn__1790$fn__1791.invoke(timer.clj:50) [storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.timer$mk_timer$fn__1790.invoke(timer.clj:42) [storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]

which is about Curator. Could this be related to #56?

cuiqin commented 7 years ago

Regarding the duplicated dependency: the pom generation does take care of duplicated nodes in a pipeline. However, it assumed that each algorithm has its own artifact specification. As the algorithms TimeGraphIndexer and TimeGraphQueryExecutor are configured with the same artifact, the duplicated dependency appeared. I have now also added checks on the configured artifacts in a pipeline. The changes are committed, so the duplicate should be gone.
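
The check described above can be sketched roughly as follows. This is only an illustration and not the actual generator code: the class `PomDependencySketch`, the method `uniqueArtifacts`, and the artifact coordinates are hypothetical. The idea is to key dependencies on the configured artifact rather than on the algorithm name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PomDependencySketch {

    /**
     * Returns each configured artifact once, in first-seen order.
     * Keys of the input map are algorithm names, values are artifact
     * coordinates (hypothetical "groupId:artifactId:version" strings).
     */
    public static List<String> uniqueArtifacts(Map<String, String> artifactByAlgorithm) {
        // A LinkedHashSet drops repeated coordinates but keeps insertion order.
        Set<String> seen = new LinkedHashSet<>(artifactByAlgorithm.values());
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        Map<String, String> cfg = new LinkedHashMap<>();
        // Two algorithms sharing one artifact, as in the case reported here.
        cfg.put("TimeGraphIndexer", "eu.example:time-graph:0.1");
        cfg.put("TimeGraphQueryExecutor", "eu.example:time-graph:0.1");
        cfg.put("TimeGraphMapper", "eu.example:time-graph-mapper:0.1");
        System.out.println(uniqueArtifacts(cfg));
    }
}
```

With the sample configuration, the shared artifact is listed only once, so only one dependency entry would be written for it.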

cuiqin commented 7 years ago

The Curator issue is related to the ZooKeeper connection. I had a look at the worker logs, and it seems the supervisors are not started properly. You may try to run the command that attempts to launch the "worker" to see what is really happening.

eichelbe commented 7 years ago

The index problem is caused by the fast value mapping. We identified an issue (fix to be committed) and are trying to pass internal exception traces along with the VILExceptions (also to be committed), so that identifying a problem becomes easier.

ap0n commented 7 years ago

The Curator problem only appears on the clu18 node, and checking its supervisor logs revealed a reported JRE fatal error and hard disk problems. I contacted our admin to resolve them (if possible), and I'm going to disable this node (storm supervisor) and try again.

eichelbe commented 7 years ago

Build for first commit is done...

eichelbe commented 7 years ago

Second commit is building...

ap0n commented 7 years ago

Now I'm getting errors related to direct grouping:

java.lang.RuntimeException: java.lang.IllegalArgumentException: Cannot do regular emit to direct stream
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.daemon.executor$fn__6647$fn__6659$fn__6706.invoke(executor.clj:748) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.util$async_loop$fn__459.invoke(util.clj:463) ~[storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
Caused by: java.lang.IllegalArgumentException: Cannot do regular emit to direct stream
    at backtype.storm.daemon.task$mk_tasks_fn$fn__6307.invoke(task.clj:155) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.daemon.executor$fn__6647$fn__6659$bolt_emit__6686.invoke(executor.clj:663) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.daemon.executor$fn__6647$fn$reify__6692.emit(executor.clj:698) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.task.OutputCollector.emit(OutputCollector.java:203) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.task.OutputCollector.emit(OutputCollector.java:63) ~[storm-core-0.9.5.jar:0.9.5]
    at eu.qualimaster.TimeTravelPip.topology.PipelineVar_10_FamilyElement3FamilyElement.forwardTuple(PipelineVar_10_FamilyElement3FamilyElement.java:163) ~[stormjar.jar:na]
    at eu.qualimaster.TimeTravelPip.topology.PipelineVar_10_FamilyElement3FamilyElement.execute(PipelineVar_10_FamilyElement3FamilyElement.java:172) ~[stormjar.jar:na]
    at backtype.storm.daemon.executor$fn__6647$tuple_action_fn__6649.invoke(executor.clj:633) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.daemon.executor$mk_task_receiver$fn__6570.invoke(executor.clj:401) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.disruptor$clojure_handler$reify__1605.onEvent(disruptor.clj:58) ~[storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:120) ~[storm-core-0.9.5.jar:0.9.5]
    ... 6 common frames omitted

Direct streams are declared in the generated code (topology.java), but emitDirect is never used (e.g. PipelineVar_10_FamilyElement3FamilyElement:163).

cuiqin commented 7 years ago

You are right. The condition for generating the emitDirect code was missing in the family nodes of the main pipeline. I made a change and Jenkins is building. Please cross-check the generated code afterwards, thanks :)
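
To illustrate the missing condition, here is a minimal sketch. It deliberately does not use Storm classes: `StubCollector` is a hypothetical stand-in that only mimics the behaviour seen in the stack trace (a regular emit on a stream declared as direct is rejected), and `forwardTuple`, the stream names, and the task id are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DirectEmitSketch {

    /** Minimal stand-in for a collector; this is NOT Storm's OutputCollector. */
    public static class StubCollector {
        private final Set<String> directStreams;
        public final List<String> emitted = new ArrayList<>();

        public StubCollector(String... directStreams) {
            this.directStreams = new HashSet<>(Arrays.asList(directStreams));
        }

        /** Regular emit: rejected on a stream that was declared direct. */
        public void emit(String streamId, List<Object> tuple) {
            if (directStreams.contains(streamId)) {
                throw new IllegalArgumentException("Cannot do regular emit to direct stream");
            }
            emitted.add(streamId + "->*:" + tuple);
        }

        /** Direct emit: the sender names the receiving task explicitly. */
        public void emitDirect(int taskId, String streamId, List<Object> tuple) {
            emitted.add(streamId + "->" + taskId + ":" + tuple);
        }
    }

    /** The condition the forwarding code needs: pick the emit variant per stream. */
    public static void forwardTuple(StubCollector collector, String streamId,
                                    boolean direct, int targetTask, List<Object> tuple) {
        if (direct) {
            collector.emitDirect(targetTask, streamId, tuple);
        } else {
            collector.emit(streamId, tuple);
        }
    }

    public static void main(String[] args) {
        StubCollector collector = new StubCollector("replies");
        List<Object> tuple = Arrays.<Object>asList("result");
        forwardTuple(collector, "replies", true, 7, tuple);  // direct stream
        forwardTuple(collector, "data", false, -1, tuple);   // regular stream
        System.out.println(collector.emitted);
    }
}
```

In the real generated bolts the equivalent branch would go through Storm's collector; the stub is only there to keep the example self-contained.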

ap0n commented 7 years ago

I tried the newly generated code and it seems fine. But I now see two new problems (!) in the infrastructure. The first one is a StackOverflowError (line ~250 of the attached log) and the second an IndexOutOfBoundsException (line ~2450).

I suppose the StackOverflowError is probably related to the cycles in the pipeline (?)

eichelbe commented 7 years ago

Probably right for the StackOverflowError: cycle detection was omitted there :( I added a simple one and committed it. The IndexOutOfBoundsException is close to the expected location, but the code indicated by the exception was already changed, so something does not seem to be up to date...
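
For readers following along, a simple cycle guard of the kind mentioned above could look like this. It is a sketch under assumptions, not the committed code: the class, the method names, and the three-node graph are hypothetical, and the pipeline is modelled as a plain successor map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CycleSafeTraversalSketch {

    /** Returns all nodes reachable from start, each exactly once, in visit order. */
    public static List<String> reachable(Map<String, List<String>> successors, String start) {
        List<String> order = new ArrayList<>();
        visit(successors, start, new HashSet<String>(), order);
        return order;
    }

    private static void visit(Map<String, List<String>> successors, String node,
                              Set<String> visited, List<String> order) {
        // Without this check, a cyclic pipeline recurses until the stack overflows.
        if (!visited.add(node)) {
            return;
        }
        order.add(node);
        List<String> next = successors.get(node);
        if (next == null) {
            return; // node without outgoing flows
        }
        for (String successor : next) {
            visit(successors, successor, visited, order);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("b"));
        graph.put("b", Arrays.asList("c"));
        graph.put("c", Arrays.asList("a")); // back edge closes the cycle
        System.out.println(reachable(graph, "a"));
    }
}
```

The visited set makes the traversal terminate on cyclic pipelines while still reaching every node once.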

ap0n commented 7 years ago

I updated everything and the pipeline started. However, I have a problem that I don't know how to resolve. Any ideas would be much appreciated...

In the TimeTravelPip I have a request-response type of communication between the tasks; each request includes the "client's" task id so that the "server" knows where to reply (using direct grouping). This works fine if there are no failures.

When I try to run the QM pipeline, though, the workers restart frequently, which breaks this communication because a respawned task has a different task id than the original.

Of course we must solve the problem of restarting (still reviewing the logs) but if there is any idea on how to fix this problem it would help a lot.
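
One possible direction, purely as an untested assumption: let each request carry a stable logical client key in addition to the current task id, and let the "server" always reply to the most recent task id seen for that key. The class and method names below are hypothetical and nothing here is Storm-specific.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ReplyRoutingSketch {

    // Latest known task id per stable client key (hypothetical scheme).
    private final ConcurrentMap<String, Integer> taskByClient = new ConcurrentHashMap<>();

    /** Called for every incoming request: remember where this client currently runs. */
    public void onRequest(String clientKey, int currentTaskId) {
        taskByClient.put(clientKey, currentTaskId);
    }

    /** Returns the task id to reply to, or -1 if the client has not been seen yet. */
    public int replyTarget(String clientKey) {
        Integer taskId = taskByClient.get(clientKey);
        return taskId == null ? -1 : taskId;
    }

    public static void main(String[] args) {
        ReplyRoutingSketch routing = new ReplyRoutingSketch();
        routing.onRequest("client-1", 12);
        routing.onRequest("client-1", 31); // same client after a respawn
        System.out.println(routing.replyTarget("client-1"));
    }
}
```

This would only address where replies are sent. It does not restore any state that is lost when a worker restarts, so the restarts themselves still need to be fixed.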

eichelbe commented 7 years ago

Indeed, terminating workers is a well-known but not really well-documented problem. Could you please detail the steps you have taken so far and the alternatives you have already excluded? Just as a guess, switching the Infrastructure to events for signals rather than Curator/Zookeeper could be worth a try, even if it is just to exclude problems.

ap0n commented 7 years ago

For the time being I'm reviewing logs and trying to understand what the problem is; all I see, however, are netty problems (indicating dying workers, I guess). I can't think of any alternatives since I don't know what the actual problem is... I also tried to run the pipeline with a small dataset in order to rule out load-related problems, but the behavior is the same. I guess switching would be just for debugging, right?

What troubles me the most is that the time travel pipeline won't work with restarting workers. Its functionality heavily relies on the state that the workers keep.

eichelbe commented 7 years ago

I see, also the Monitoring Layer does not really like restarting workers... ;)

As far as I know the Storm code, a worker dies either due to an exception or due to a timeout. However, Storm does not tell us which of its timers is failing (adaptive storm does in the meantime, but the information is limited).

One way I could imagine is to run your pipeline on the LUH cluster. Our plan is to install adaptive storm there. As the LUH cluster does not have NFS, making these changes is a bit tricky, as is using simulated data. Cui now has a data source based on yours that can help in this case. But we still need to learn from Miroslav where and how to copy without interfering with other tasks there. Then one trial could be to run it there and see whether a timer is failing, and which one.

Right, switching to the events would be only for debugging and it would only help if the Zookeepers are the problem.

eichelbe commented 7 years ago

And if it is a netty connection problem: at least on our side, it was due to the failure of another worker that was supposed to host the netty server...

ap0n commented 7 years ago

Our simulator reads the simulated data from the grnet HDFS, so NFS shouldn't be a problem... So I'll move on to the next task until the LUH cluster is ready (do we have an estimate for this?)...

eichelbe commented 7 years ago

In the next days... some modifications in adaptive storm delayed that. Hopefully, with the changed source, even a missing NFS should not be a problem, but it needs manual copying...

eichelbe commented 7 years ago

Just to let you know that the LUH cluster is now up and running. LUH will next do a test on the "other" pipelines...

ap0n commented 7 years ago

Thanks for the update! I'll check it out later today... (I'm also watching issue #62)

eichelbe commented 7 years ago

Does the financial data source (spring-simulator) require HDFS (I think I have this in mind)? If yes, can we extend it to alternatively read from the file system [Cui did first steps there but I think it makes sense to have it in the same code to avoid confusion while testing/experiments/demo]?

ap0n commented 7 years ago

Yes, it does. But I think Nick has also added support for the local file system through option flags. I don't know if the local fs option has been tested, but it is certainly hardcoded to HDFS right now...

If the Okeanos HDFS is not an option, I can test the local fs code.

eichelbe commented 7 years ago

Great, this would help a lot!

Local FS could be just a fallback; the location could be taken from the DMS configuration class (dfs.path) or given via a VM switch through the worker configuration. Whatever fits your work without destroying too much :)

eichelbe commented 7 years ago

It seems that HDFS is accessible. Thanks for helping us with the local fs. Then we can try it as it is. Christoph will start testing.

ap0n commented 7 years ago

In any case, the spring client simulator will try to connect to the Okeanos HDFS (see the code here -- line 198) and try to get data from the /user/storm/ path... These things are hardcoded. If necessary, we can expose them as parameters...

eichelbe commented 7 years ago

There are some basic settings in DML configuration that could help to parameterize that class, e.g., the base HDFS URL (for the pipeline settings). We/you can also add some further ones ;)

ap0n commented 7 years ago

I think we need two more settings.

For the HDFS path, I think it's safe to use the hdfs.path setting. What do you think?
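
Roughly what I have in mind for the lookup, as a sketch only: `hdfs.path` and `dfs.path` are the setting names mentioned in this thread, while `hdfs.url`, the class, and the method names are placeholders I made up. The current hardcoded `/user/storm/` stays as the default, and the local file system is only a fallback.

```java
import java.util.Properties;

public class DataLocationSketch {

    /** The path that is hardcoded in the simulator today. */
    public static final String DEFAULT_HDFS_PATH = "/user/storm/";

    /** Which file system to read from and the path on it. */
    public static class Location {
        public final boolean hdfs;
        public final String path;

        public Location(boolean hdfs, String path) {
            this.hdfs = hdfs;
            this.path = path;
        }
    }

    /** Prefers HDFS when a base URL is configured, else falls back to a local path. */
    public static Location resolve(Properties cfg) {
        String hdfsUrl = cfg.getProperty("hdfs.url"); // placeholder key for the base HDFS URL
        if (hdfsUrl != null && !hdfsUrl.isEmpty()) {
            return new Location(true, cfg.getProperty("hdfs.path", DEFAULT_HDFS_PATH));
        }
        String localPath = cfg.getProperty("dfs.path");
        if (localPath == null || localPath.isEmpty()) {
            throw new IllegalStateException("neither hdfs.url nor dfs.path is configured");
        }
        return new Location(false, localPath);
    }

    public static void main(String[] args) {
        Properties cfg = new Properties();
        cfg.setProperty("dfs.path", "/tmp/simulated-data"); // example local path
        Location location = resolve(cfg);
        System.out.println((location.hdfs ? "hdfs " : "local ") + location.path);
    }
}
```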

eichelbe commented 7 years ago

Ok for me if it fits your needs. I've added the two settings to the DML cfg and they shall be transferred from the infrastructure to the workers/algorithms.

@L3SQualimaster this may also help avoiding code replacements for the Twitter source