PSLCL / testing-framework

Panasonic Distributed Testing Framework
Other
1 stars 1 forks source link

DeployHandler failures when executing tests #166

Open amauchley opened 7 years ago

amauchley commented 7 years ago

This issue may be related the to duplicate dependency issue during test execution setup (#149). Various test executions are failing during initial setup, and all the exceptions in one way or another are affected by the DeployHandler failing. These are the two messages brought up in the log file, these are two different test executions, but appear to be affected by the DeployHandler:

1.

05:14:30.877 INFO [SessionCallBackSchedulerThread-1] c.p.d.runner.process.RunnerMachine - Starting test run 13 05:14:48.963 WARN [RunEntryTask] c.p.dtf.runner.process.RunEntryCore - Failed to execute test run 13 - java.util.concurrent.ExecutionException: java.lang.Exception: Staf requested process failed: java.util.concurrent.ExecutionException: java.lang.Exception: Staf requested process failed: at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at com.pslcl.dtf.runner.template.DeployHandler.waitComplete(DeployHandler.java:211) at com.pslcl.dtf.runner.template.DeployHandler.proceed(DeployHandler.java:178) at com.pslcl.dtf.runner.template.InstancedTemplate.runSteps(InstancedTemplate.java:632) at com.pslcl.dtf.runner.template.InstancedTemplate.(InstancedTemplate.java:104) at com.pslcl.dtf.runner.template.TemplateProvider.getInstancedTemplate(TemplateProvider.java:272) at com.pslcl.dtf.runner.process.RunEntryCore.testRun(RunEntryCore.java:434) at com.pslcl.dtf.runner.process.Action$3.act(Action.java:83) at com.pslcl.dtf.runner.process.RunEntryTask.run(RunEntryTask.java:81) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.Exception: Staf requested process failed: at com.pslcl.dtf.core.runner.resource.staf.futures.DeployFuture.executeStafProcess(DeployFuture.java:136) at com.pslcl.dtf.core.runner.resource.staf.futures.DeployFuture.call(DeployFuture.java:116) at com.pslcl.dtf.core.runner.resource.staf.futures.DeployFuture.call(DeployFuture.java:29) at java.util.concurrent.FutureTask.run(FutureTask.java:266) ... 3 common frames omitted 05:14:48.983 WARN [RunEntryTask] com.pslcl.dtf.runner.process.Action - Action.TESTRUN() sees testRun() exception: java.util.concurrent.ExecutionException: java.lang.Exception: Staf requested process failed:

2.

05:15:09.127 WARN [RunEntryTask] c.p.dtf.runner.process.RunEntryCore - Failed to execute test run 4 - java.lang.Exception: InstancedTemplate.runSteps() deploy handling has incomplete successful deployed list, for setID 1 java.lang.Exception: InstancedTemplate.runSteps() deploy handling has incomplete successful deployed list, for setID 1 at com.pslcl.dtf.runner.template.InstancedTemplate.runSteps(InstancedTemplate.java:640) at com.pslcl.dtf.runner.template.InstancedTemplate.(InstancedTemplate.java:104) at com.pslcl.dtf.runner.template.TemplateProvider.getInstancedTemplate(TemplateProvider.java:272) at com.pslcl.dtf.runner.process.RunEntryCore.testRun(RunEntryCore.java:434) at com.pslcl.dtf.runner.process.Action$3.act(Action.java:83) at com.pslcl.dtf.runner.process.RunEntryTask.run(RunEntryTask.java:81) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 05:15:09.140 WARN [RunEntryTask] com.pslcl.dtf.runner.process.Action - Action.TESTRUN() sees testRun() exception: java.lang.Exception: InstancedTemplate.runSteps() deploy handling has incomplete successful deployed list, for setID 1

jwclark commented 7 years ago

Are there any logs from the test instance itself?

amauchley commented 7 years ago

There are not, nothing comes up in the stderr or stdout logs captured. I believe this error happens before the test(s) are executed on the test instance machine.

amauchley commented 7 years ago

Here is the debug log for a template that failed with the above error:

19:17:16.357 DEBUG [RunEntryTask] c.p.d.runner.template.DeployHandler - DeployHandler waitComplete(), deploy failed future.get() with computed msg: Staf requested process failed: ; original msg: java.lang.Exception: Staf requested process failed: 19:17:16.357 DEBUG [RunEntryTask] c.p.d.r.template.InstancedTemplate - InstancedTemplate runSteps() errors out for templateID �Η�0굦#��x�D#�ZtIَuS+,�k?�^U�[, templateInstanceID 45 19:17:16.357 DEBUG [RunEntryTask] c.p.d.r.template.TemplateProvider - TemplateProvider releaseTemplate() removes template �Η�0굦#��x�D#�ZtIَuS+,�k?�^U�[, from templateReleaseMap, of templateInstanceID 45 19:17:16.357 DEBUG [RunEntryTask] c.p.d.r.template.TemplateProvider - TemplateProvider removeOneEntry_reusableTemplateList() finds 0 stored reusableITs, for templateHash: �Η�0굦#��x�D#�ZtIَuS+,�k?�^U�[ 19:17:16.357 DEBUG [RunEntryTask] c.p.d.r.template.TemplateProvider - TemplateProvider releaseTemplate() destroys template, for templateID �Η�0굦#��x�D#�ZtIَuS+,�k?�^U�[, templateInstanceID 45 19:17:16.364 DEBUG [RunEntryTask] c.p.d.r.template.InstancedTemplate - InstancedTemplate connectHandler() completes 1 connect(s) for setID 1 19:17:16.737 DEBUG [RunEntryTask] ConnectHandler - .disconnect() successfully disconects one machineInstance 19:17:16.737 DEBUG [RunEntryTask] c.p.d.r.template.InstancedTemplate - InstancedTemplate informResourceProviders() about to inform the resource provider system that template �Η�0굦#��x�D#�ZtIَuS+,�k?�^U�[, templateInstanceID 45 no longer needs its reserved or boun$ 19:17:16.737 DEBUG [RunEntryTask] c.p.d.r.template.InstancedTemplate - InstancedTemplate informResourceProviders() informed the resource provider system that template �Η�0굦#��x�D#�ZtIَuS+,�k?�^U�[, templateInstanceID 45 no longer needs its reserved or boun$ 19:17:16.737 DEBUG [RunEntryTask] c.p.dtf.runner.process.RunEntryCore - RunEntryCore testRun errors out, reNum : 2658 19:17:16.738 DEBUG [ReleaseNetworkFuture] c.p.d.r.a.p.n.ReleaseNetworkFuture - Releasing resource start: resourceId: 92 manager: com.pslcl.dtf.resource.aws.AwsResourcesManager provider: com.pslcl.dtf.resource.aws.provider.network.AwsNetworkProvider templateId: 0xefbfbdce97efbfbd30eab5a623efbfbdefbfbd78efbfbd4423efbfbd5a7449d9 templateInstanceId: 45 runId: 2658

19:17:16.738 WARN [RunEntryTask] c.p.dtf.runner.process.RunEntryCore - Failed to execute test run 2658 - java.util.concurrent.ExecutionException: java.lang.Exception: Staf requested process failed: java.util.concurrent.ExecutionException: java.lang.Exception: Staf requested process failed: at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at com.pslcl.dtf.runner.template.DeployHandler.waitComplete(DeployHandler.java:211) at com.pslcl.dtf.runner.template.DeployHandler.proceed(DeployHandler.java:178) at com.pslcl.dtf.runner.template.InstancedTemplate.runSteps(InstancedTemplate.java:632) at com.pslcl.dtf.runner.template.InstancedTemplate.(InstancedTemplate.java:104) at com.pslcl.dtf.runner.template.TemplateProvider.getInstancedTemplate(TemplateProvider.java:272) at com.pslcl.dtf.runner.process.RunEntryCore.testRun(RunEntryCore.java:434) at com.pslcl.dtf.runner.process.Action$3.act(Action.java:83) at com.pslcl.dtf.runner.process.RunEntryTask.run(RunEntryTask.java:81) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.Exception: Staf requested process failed: at com.pslcl.dtf.core.runner.resource.staf.futures.DeployFuture.executeStafProcess(DeployFuture.java:136) at com.pslcl.dtf.core.runner.resource.staf.futures.DeployFuture.call(DeployFuture.java:116) at com.pslcl.dtf.core.runner.resource.staf.futures.DeployFuture.call(DeployFuture.java:29) at java.util.concurrent.FutureTask.run(FutureTask.java:266) ... 3 common frames omitted 19:17:16.738 DEBUG [CancelTask] c.p.dtf.runner.process.CancelTask - CancelTask TERMINATES run-cancel checking for reNum 2658

amauchley commented 7 years ago

Here is the steps for template 45:

0 bind machine platform=x86_64-amzn1-linux-gnu&pslcl.dtf.resource.machine.os=linux 0 bind machine pslcl.dtf.resource.machine.os=linux 0 bind network 1 connect 0 2 1 connect 1 2 1 deploy 0 bin/BroadcastTest BAC7FAD165D63174018E2238A04860330F214642FDCC4747CB322ABA636404DD 1 deploy 0 cst-iot-test-testing-iot-service-1.0/bin/iot-service-1.0.txt 5B058762082906FB688385C39FC28AB8850F668F15BA4000264ABF999E7AF7EF 1 deploy 0 lib/libdof-inet.a ED970F47B618F8612D72FC5BAD491C8BCEFDB456B4909370E0806B90C7B9F28E 1 deploy 0 lib/libdof-inet.so.2 850A212A52FFE10C75B6540F6473612DE4EA78577A70990D9984B043B0BF881F 1 deploy 0 lib/libdof-oal.a 13D836BA63B32FFCF901D15EE9967301F2E8E5BC7491CCD43061494326BB9A5D 1 deploy 0 lib/libdof-oal.so.7 E5EAB69728B2405A6C1FB5BA1D37624CB84E165FFA4FDDEBF6C0D7660776F260 1 deploy 0 lib/libdof-pcr.a F0CF32EBBDADB5D824746A3809FA410E7E2F215B5FDCE52A0F2259D5CB310CB9 1 deploy 0 lib/libdof-pcr.so.7 498A5A23CA864C0D5B2515C3A57724F5903776A5AEE1A0C33C8ACFEA3498A166 1 deploy 0 libexec/BroadcastTest BDEE6663F04471008E07EBE18913672DA6A167AA96B3B055CE92C333B0F6BEDF 1 deploy 1 bin/OperationsTest B629DBD032A06B127A6D20B163499C2E5E986283D5B2130CECCF62FE241BA358 1 deploy 1 cst-iot-test-testing-iot-service-1.0/bin/iot-service-1.0.txt 5B058762082906FB688385C39FC28AB8850F668F15BA4000264ABF999E7AF7EF 1 deploy 1 lib/dof-cipher-sms4-1.0.jar FC9A28E75340EA704F13016E15C9C3735D094190C4F0FDE31773F8825D2FA4C0 1 deploy 1 lib/dof-cipher-twofish-1.0.jar 36E9B0FC3AB333F1A3408491C4B53A87705FE97C5DB3E61229D2E4C3FE77F20A 1 deploy 1 lib/dof-connection-reconnecting-listener-2.0.1-2.0.1.jar 903FB7B998F2C935F1C4305847A88E3DE1BB6417C6B63B4D8EC65C4C7F2DCFD1 1 deploy 1 lib/dof-inet-7.0-7.0.jar 8744698D1C28759407723F15650EB0E036FAB41A00768271DE53334C517358DA 1 deploy 1 lib/dof-oal-7.0-7.0.jar 2EE40E3E85EAB1F28AE2372116BB73DC8E87EE6F2BE138320FFBDEC5761052BF 1 deploy 1 lib/dof-server-restarting-listener-2.0.1-2.0.1.jar FED66E820DC45EDEB0FCF5B5C6B0E868EF998F752FFF8F19A15A27200B1D587C 1 deploy 1 lib/gnu-crypto-2.0.1.jar B041B53FC4DA294947EF081E5FAFF79A4C76EF8556F3A2D3E43104706CAFF3C0 1 deploy 1 lib/testing-dof-oal-7.0.jar BFBBB7131B5E6019830456B0BCF8B0DBCEF236006AD80CEDAB9DCAB14011A058 2 start 1 bin%2FOperationsTest -providerMode -secure -oid %5B3%3A7%40testing.opendof.org%5D -address 34.226.79.40 -authid %5B3%3A7%40testing.opendof.org%5D 3 run 0 bin%2FBroadcastTest -secure -maxOID 0 -oidStartNum 7 -maxiterations 2 -address 34.226.79.40 -authid 7%40testing.opendof.org -lateConnect -testget -udp

amauchley commented 7 years ago

some of these deploy handler errors may be due to the logs not deploying correctly to s3, I'm not sure, it appears in the logs that this error may happen after a test has ran and the logs aren't loaded to s3.

amauchley commented 7 years ago

Here is another log output for a failed deployment of dependencies that would be associated to bug #149 as mentioned above.

19:17:10.911 DEBUG [RunEntryTask] ConnectHandler - .disconnect() successfully disconects one machineInstance 19:17:10.912 DEBUG [RunEntryTask] c.p.d.r.template.InstancedTemplate - InstancedTemplate informResourceProviders() about to inform the resource provider system that template ^��KϾ�l��܋��^N�^Z �J��;���e'^]<�^W, templateInstanceID 43 no longer needs its reserved or$ 19:17:10.912 DEBUG [RunEntryTask] c.p.d.r.template.InstancedTemplate - InstancedTemplate informResourceProviders() informed the resource provider system that template ^��KϾ�l��܋��^N�^Z �J��;���e'^]<�^W, templateInstanceID 43 no longer needs its reserved or$ 19:17:10.912 DEBUG [RunEntryTask] c.p.dtf.runner.process.RunEntryCore - RunEntryCore testRun errors out, reNum : 2650 19:17:10.912 DEBUG [ReleaseNetworkFuture] c.p.d.r.a.p.n.ReleaseNetworkFuture - Releasing resource start: resourceId: 88 manager: com.pslcl.dtf.resource.aws.AwsResourcesManager provider: com.pslcl.dtf.resource.aws.provider.network.AwsNetworkProvider templateId: 0x5eefbfbdefbfbd4bcfbeefbfbd6cefbfbdefbfbddc8befbfbdefbfbd0eefbfbd templateInstanceId: 43 runId: 2650

19:17:10.914 DEBUG [AwsReleaseFuture] c.p.d.r.a.p.m.MachineReservedResource - MachineReservedResource.setBindFutureCanceled = true 19:17:10.914 WARN [RunEntryTask] c.p.dtf.runner.process.RunEntryCore - Failed to execute test run 2650 - java.util.concurrent.ExecutionException: java.lang.Exception: Staf requested process failed: mv: cannot stat '067599829562CC1F0A2D061118CF607DF51E51AEB564F73$

java.util.concurrent.ExecutionException: java.lang.Exception: Staf requested process failed: mv: cannot stat '067599829562CC1F0A2D061118CF607DF51E51AEB564F732F5368316696C7C51': No such file or directory

    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at com.pslcl.dtf.runner.template.DeployHandler.waitComplete(DeployHandler.java:211)
    at com.pslcl.dtf.runner.template.DeployHandler.proceed(DeployHandler.java:178)
    at com.pslcl.dtf.runner.template.InstancedTemplate.runSteps(InstancedTemplate.java:632)
    at com.pslcl.dtf.runner.template.InstancedTemplate.<init>(InstancedTemplate.java:104)
    at com.pslcl.dtf.runner.template.TemplateProvider.getInstancedTemplate(TemplateProvider.java:272)
    at com.pslcl.dtf.runner.process.RunEntryCore.testRun(RunEntryCore.java:434)
    at com.pslcl.dtf.runner.process.Action$3.act(Action.java:83)
    at com.pslcl.dtf.runner.process.RunEntryTask.run(RunEntryTask.java:81)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Caused by: java.lang.Exception: Staf requested process failed: mv: cannot stat '067599829562CC1F0A2D061118CF607DF51E51AEB564F732F5368316696C7C51': No such file or directory

    at com.pslcl.dtf.core.runner.resource.staf.futures.DeployFuture.executeStafProcess(DeployFuture.java:136)
    at com.pslcl.dtf.core.runner.resource.staf.futures.DeployFuture.call(DeployFuture.java:116)
    at com.pslcl.dtf.core.runner.resource.staf.futures.DeployFuture.call(DeployFuture.java:29)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 common frames omitted

19:17:10.914 DEBUG [AwsReleaseFuture] c.p.d.r.a.p.m.AwsMachineProvider - com.pslcl.dtf.resource.aws.provider.machine.AwsMachineProvider.release templateInstanceId: 43 isReusable: true machine instances being released: AwsMachineInstance sanitizing: false destroyed: false taken: true instantiationTime:9/29/17 6:55 PM MachineReservedResource:

resourceId: 87 manager: com.pslcl.dtf.resource.aws.AwsResourcesManager provider: com.pslcl.dtf.resource.aws.provider.machine.AwsMachineProvider templateId: 0x5eefbfbdefbfbd4bcfbeefbfbd6cefbfbdefbfbddc8befbfbdefbfbd0eefbfbd templateInstanceId: 43 runId: 2650

amauchley commented 7 years ago

Here is runId 2650's template steps:

19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.BindHandler - BindHandler Bind future.get()[0] returned at 2017-09-29 19:17:06 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.BindHandler - BindHandler bind [0] completes, with attributes of {pslcl.dtf.aws.ec2instance.vpc.id=vpc-e7802f82, pslcl.dtf.aws.ec2instance.iam-arn=arn:aws:iam::406708278891:instance-profile/dtf-cst-iot-ser$ /usr/local/staf/startSTAFProc.sh} 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.BindHandler - BindHandler Bind future.get()[1] called at 2017-09-29 19:17:06 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.BindHandler - BindHandler Bind future.get()[1] returned at 2017-09-29 19:17:06 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.BindHandler - BindHandler bind [1] completes, with attributes of {pslcl.dtf.aws.ec2instance.vpc.id=vpc-e7802f82, pslcl.dtf.aws.ec2instance.subnet.size=16, pslcl.dtf.aws.ec2instance.subnet.name=dtf-10.0.0.0$ 19:17:06.179 DEBUG [RunEntryTask] c.p.d.r.template.InstancedTemplate - InstancedTemplate bindHandler() completes 2 bind(s) for setID 0 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.StepsParser - StepsParser returns stepReference 2, for step: 1 connect 0 1 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.StepsParser - StepsParser returns stepReference 3, for step: 1 deploy 0 bin/ConnectionTest 5F18E62D492B45675CAA506081B8EF75A1A9D0D2DEB4D8D3FCAC4501C912B66A 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.StepsParser - StepsParser returns stepReference 4, for step: 1 deploy 0 cst-iot-test-testing-iot-service-1.3/bin/iot-service-1.3.txt 5B058762082906FB688385C39FC28AB8850F668F15BA4000264ABF999E7AF7EF 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.StepsParser - StepsParser returns stepReference 5, for step: 1 deploy 0 lib/dof-cipher-sms4-1.0.jar FC9A28E75340EA704F13016E15C9C3735D094190C4F0FDE31773F8825D2FA4C0 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.StepsParser - StepsParser returns stepReference 6, for step: 1 deploy 0 lib/dof-cipher-twofish-1.0.jar 36E9B0FC3AB333F1A3408491C4B53A87705FE97C5DB3E61229D2E4C3FE77F20A 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.StepsParser - StepsParser returns stepReference 7, for step: 1 deploy 0 lib/dof-connection-reconnecting-listener-2.0.1-2.0.1.jar 903FB7B998F2C935F1C4305847A88E3DE1BB6417C6B63B4D8EC65C4C7F2DCFD1 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.StepsParser - StepsParser returns stepReference 8, for step: 1 deploy 0 lib/dof-inet-7.0.2.jar 22F8A9DFB66F7B834EA83C1364EC0F707A6158E75F494B6E2EE751354B2311EF 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.StepsParser - StepsParser returns stepReference 9, for step: 1 deploy 0 lib/dof-oal-7.0.2.jar BBE15485018103C311976F265CC4D29D9D87636CAC3C7ABB87D9403661C12B8D 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.StepsParser - StepsParser returns stepReference 10, for step: 1 deploy 0 lib/dof-server-restarting-listener-2.0.1-2.0.1.jar FED66E820DC45EDEB0FCF5B5C6B0E868EF998F752FFF8F19A15A27200B1D587C 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.StepsParser - StepsParser returns stepReference 11, for step: 1 deploy 0 lib/gnu-crypto-2.0.1.jar B041B53FC4DA294947EF081E5FAFF79A4C76EF8556F3A2D3E43104706CAFF3C0 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.StepsParser - StepsParser returns stepReference 12, for step: 1 deploy 0 lib/testing-dof-oal-7.0.2.jar 716CEF62E190C91194B2E49D21B1683410FF67A936532340CF24448DD9DDCAD1 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.StepsParser - StepsParser returns stepReference 13, for step: 1 deploy 0 lib/testing-dof-oal-7.0.2.jar.dep 067599829562CC1F0A2D061118CF607DF51E51AEB564F732F5368316696C7C51 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.StepsParser - StepsParser returns stepReference 14, for step: 1 deploy 0 lib/testing-dof-oal-7.0.2.jar.dep 067599829562CC1F0A2D061118CF607DF51E51AEB564F732F5368316696C7C51 19:17:06.179 DEBUG [RunEntryTask] c.p.d.r.template.ConnectHandler - ConnectHandler computeConnectRequests() finds connect in stepSet 1: 1 connect 0 1 19:17:06.179 DEBUG [RunEntryTask] c.p.d.runner.template.DeployHandler - DeployHandler computeDeployRequests() finds deploy in stepSet 1: 1 deploy 0 bin/ConnectionTest 5F18E62D492B45675CAA506081B8EF75A1A9D0D2DEB4D8D3FCAC4501C912B66A 19:17:06.179 DEBUG [RunEntryTask] c.p.d.runner.template.DeployHandler - DeployHandler computeDeployRequests() finds deploy in stepSet 1: 1 deploy 0 cst-iot-test-testing-iot-service-1.3/bin/iot-service-1.3.txt 5B058762082906FB688385C39FC28AB8850F668F15BA4000264ABF$ 19:17:06.179 DEBUG [RunEntryTask] c.p.d.runner.template.DeployHandler - DeployHandler computeDeployRequests() finds deploy in stepSet 1: 1 deploy 0 lib/dof-cipher-sms4-1.0.jar FC9A28E75340EA704F13016E15C9C3735D094190C4F0FDE31773F8825D2FA4C0 19:17:06.179 DEBUG [RunEntryTask] c.p.d.runner.template.DeployHandler - DeployHandler computeDeployRequests() finds deploy in stepSet 1: 1 deploy 0 lib/dof-cipher-twofish-1.0.jar 36E9B0FC3AB333F1A3408491C4B53A87705FE97C5DB3E61229D2E4C3FE77F20A 19:17:06.179 DEBUG [RunEntryTask] c.p.d.runner.template.DeployHandler - DeployHandler computeDeployRequests() finds deploy in stepSet 1: 1 deploy 0 lib/dof-connection-reconnecting-listener-2.0.1-2.0.1.jar 903FB7B998F2C935F1C4305847A88E3DE1BB6417C6B63B4D8EC65C4C7F$ 19:17:06.179 DEBUG [RunEntryTask] c.p.d.runner.template.DeployHandler - DeployHandler computeDeployRequests() finds deploy in stepSet 1: 1 deploy 0 lib/dof-inet-7.0.2.jar 22F8A9DFB66F7B834EA83C1364EC0F707A6158E75F494B6E2EE751354B2311EF 19:17:06.179 DEBUG [RunEntryTask] c.p.d.runner.template.DeployHandler - DeployHandler computeDeployRequests() finds deploy in stepSet 1: 1 deploy 0 lib/dof-oal-7.0.2.jar BBE15485018103C311976F265CC4D29D9D87636CAC3C7ABB87D9403661C12B8D 19:17:06.179 DEBUG [RunEntryTask] c.p.d.runner.template.DeployHandler - DeployHandler computeDeployRequests() finds deploy in stepSet 1: 1 deploy 0 lib/dof-server-restarting-listener-2.0.1-2.0.1.jar FED66E820DC45EDEB0FCF5B5C6B0E868EF998F752FFF8F19A15A27200B1D587C 19:17:06.179 DEBUG [RunEntryTask] c.p.d.runner.template.DeployHandler - DeployHandler computeDeployRequests() finds deploy in stepSet 1: 1 deploy 0 lib/gnu-crypto-2.0.1.jar B041B53FC4DA294947EF081E5FAFF79A4C76EF8556F3A2D3E43104706CAFF3C0 19:17:06.179 DEBUG [RunEntryTask] c.p.d.runner.template.DeployHandler - DeployHandler computeDeployRequests() finds deploy in stepSet 1: 1 deploy 0 lib/testing-dof-oal-7.0.2.jar 716CEF62E190C91194B2E49D21B1683410FF67A936532340CF24448DD9DDCAD1 19:17:06.179 DEBUG [RunEntryTask] c.p.d.runner.template.DeployHandler - DeployHandler computeDeployRequests() finds deploy in stepSet 1: 1 deploy 0 lib/testing-dof-oal-7.0.2.jar.dep 067599829562CC1F0A2D061118CF607DF51E51AEB564F732F5368316696C7C51 19:17:06.180 DEBUG [RunEntryTask] c.p.d.runner.template.DeployHandler - DeployHandler computeDeployRequests() finds deploy in stepSet 1: 1 deploy 0 lib/testing-dof-oal-7.0.2.jar.dep 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.BindHandler - BindHandl$ 19:17:06.179 DEBUG [RunEntryTask] c.p.dtf.runner.template.BindHandler - BindHandler bind [0] completes, with attributes of {pslcl.dtf.aws.ec2instance.vpc.id=vpc-e7802f82, pslcl.dtf.aws.ec2instance.iam-arn=arn:aws:iam::406708278891:instance-profile/dtf-cst-iot-ser$ /usr/local/staf/startSTAFProc.sh}

jwclark commented 7 years ago

1 deploy 1 cst-iot-test-testing-iot-service-1.0/bin/iot-service-1.0.txt 5B058762082906FB688385C39FC28AB8850F668F15BA4000264ABF999E7AF7EF 1 deploy 0 cst-iot-test-testing-iot-service-1.0/bin/iot-service-1.0.txt 5B058762082906FB688385C39FC28AB8850F668F15BA4000264ABF999E7AF7EF

Same file but different machines.

Chad-Adams commented 7 years ago

In looking at the first exception shown in the original bug post above, the failure is on a linux 'sudo mv' command which follows a sudo wget command and is followed by a sudo chmod 777 command.

There is current debug level logging that might help indicate if a duplicate DeployFuture.call is being made while already processing a previous identical command. Search the logs for:

StafSupport.issueProcessShellRequest: (.issueProcessPowershellRequest: on windows) and then a rich structured dump of the 'ProcessCommandData' being issued.

Look for cmdData.setCommand("sudo mkdir -p " + cmdData.getBasePath()); cmdData.setCommand("sudo wget -t " + retries + " " + sourceUrl); cmdData.setCommand("sudo mv " + urlFile + " " + cmdData.getFileName()); cmdData.setCommand("sudo chmod 777 " + cmdData.getFileName());

dumps in quick (hopefully ping milliseconds apart) unbroken succession with no intermingling of cmdData being sent to the same host.