galasa-dev / projectmanagement

Project Management repo for Issues and ZenHub
7 stars 4 forks source link

local dss.properties file (theoretical) race condition #1371

Open techcobweb opened 1 year ago

techcobweb commented 1 year ago

Background

Run several galasa tests in separate JVMs in parallel.

One test looks at dss values and sees

dss.framework.metrics.runs.local=14
dss.framework.request.prefix.U.lastused=14

So it allocates a runid of U11, and updates the dss properties so lastused figure to 11 so that the next test to allocate a runId doesn't clash with it's number.

During the test run, several other properties get stored in the dss. eg:

dss.framework.run.U15.requestor=unknown
dss.framework.run.U15.group=none
dss.framework.run.U15.test=dev.galasa.example.banking.payee/dev.galasa.example.banking.payee.TestPayeeExtended
dss.framework.run.U15.heartbeat=2023-02-17T10\:09\:08.808622Z
dss.framework.run.U15.rasrunid=local-L1VzZXJzL21jb2JiZXR0Ly5nYWxhc2EvcmFzL1UxNQ\=\=
dss.framework.run.U15.request.type=LOCAL
dss.framework.run.U15.local=true
dss.framework.run.U15.testclass=dev.galasa.example.banking.payee.TestPayeeExtended
dss.framework.run.U15.testbundle=dev.galasa.example.banking.payee
dss.framework.run.U15.status=finished
dss.framework.run.U15.finished=2023-02-17T10\:09\:08.922344Z
dss.framework.run.U15.started=2023-02-17T10\:09\:08.810398Z
dss.framework.run.U15.queued=2023-02-17T10\:09\:08.657749Z
dss.framework.run.U15.result=Passed

in addition to the standard values:

dss.framework.request.prefix.U.lastused=15
dss.framework.metrics.runs.local=15

At the end of the test run, the test attempts to revert the dss.properties to what it should be, removing all those values ralating to the run just passed.

So the entire file is over-written with

dss.framework.request.prefix.U.lastused=15
dss.framework.metrics.runs.local=15

(which is obviously missing all the details about run 15)

The trouble is when there are two tests running in parallel, the other test will try to do the same. The allocation of the test runid seems to be fairly atomic, but one test tries to restore the file to what it thinks the properties should now be, and the other test is in a race condition to restore the properties file to what it thinks the contents should be.

When one test runs early, takes longer than the 2nd test, then it gets to write the file last, clobbering the values that the faster second-to-launch test takes.

Sounds like some cache of the dss.properties is being held until the end of the test and blindly written-out, rather than assuming the file has changed, reading the contents back in, modifying what needs modifying and writing it out, while locking the local file in the process.

I guess that the eclipse tests must be hitting this problem, but it's just never been noticed.

Tasks

techcobweb commented 1 year ago

The code as-is works, though there is a theoretical timing window where it may not work, and we'd see a failure, which would be timing-dependent. On the basis that it's loads better now than it was, I think we keep this item open and move it to the backlog again for now.