PrairieLearn / PrairieLearn

Online problem-driven learning system
http://prairielearn.readthedocs.io/

Invalid workspace launch state machine state #8637

Open mwest1066 opened 9 months ago

mwest1066 commented 9 months ago

While debugging for #8630, we noticed that workspaces could try to start a container even though the workspace directory was never created. See https://us.prairielearn.com/pl/workspace/1234025/logs/version/1

The very first log is “Initialization complete”, which occurs after we’ve moved the directory into place, but this directory was never made.

nwalters512 commented 8 months ago

I stared at the relevant code for quite some time and couldn't for the life of me figure out why we didn't end up with a directory on disk. Here's what we know.

We got a workspace log with the message “Initialization complete” and the state 'stopped', which can only have come from here:

https://github.com/PrairieLearn/PrairieLearn/blob/b961982f78fd324b0159096007d4ccb5a5f192b0/apps/prairielearn/src/lib/workspace.js#L230-L234

Because that code ran, this conditional must have been true:

https://github.com/PrairieLearn/PrairieLearn/blob/b961982f78fd324b0159096007d4ccb5a5f192b0/apps/prairielearn/src/lib/workspace.js#L197

Because that conditional was true, state === 'uninitialized' must have been true:

https://github.com/PrairieLearn/PrairieLearn/blob/b961982f78fd324b0159096007d4ccb5a5f192b0/apps/prairielearn/src/lib/workspace.js#L195-L196

Because state === 'uninitialized', we must have run this block of code:

https://github.com/PrairieLearn/PrairieLearn/blob/b961982f78fd324b0159096007d4ccb5a5f192b0/apps/prairielearn/src/lib/workspace.js#L170-L173

module.exports.initialize will unconditionally return a non-null value, which means that this conditional would have evaluated to true:

https://github.com/PrairieLearn/PrairieLearn/blob/b961982f78fd324b0159096007d4ccb5a5f192b0/apps/prairielearn/src/lib/workspace.js#L198

Given that, I can't see any possible way the following code (which moves the directory into place) wouldn't have executed:

https://github.com/PrairieLearn/PrairieLearn/blob/b961982f78fd324b0159096007d4ccb5a5f192b0/apps/prairielearn/src/lib/workspace.js#L223-L228

The only way the conditional on line 198 could have evaluated to true without the move(...) running would be if this threw an error:

https://github.com/PrairieLearn/PrairieLearn/blob/b961982f78fd324b0159096007d4ccb5a5f192b0/apps/prairielearn/src/lib/workspace.js#L219

But if that threw an error, we wouldn't have the log with the state transition to stopped!
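
To make the argument above easier to follow, here is a minimal sketch of the control flow as I understand it. This is not the actual code in workspace.js: `launchSketch`, `failBeforeMove`, `directoryOnDisk`, and the paths are all hypothetical stand-ins, and the real function does more work between these steps.

```javascript
// Hypothetical reconstruction of the reasoning above, NOT the real workspace.js.
// Every name here (launchSketch, failBeforeMove, directoryOnDisk) is illustrative.
function launchSketch(workspace, logs, { failBeforeMove = false } = {}) {
  let initializeResult = null;

  if (workspace.state === 'uninitialized') {
    // Stand-in for module.exports.initialize, which is described above as
    // unconditionally returning a non-null value.
    initializeResult = { sourcePath: 'tmp/source', destinationPath: 'final/dest' };
  }

  if (workspace.state === 'uninitialized') {
    if (initializeResult) {
      if (failBeforeMove) {
        // Stand-in for the step just before the move that could throw.
        throw new Error('simulated failure before the move');
      }
      // Stand-in for move(...): the directory is now in place.
      workspace.directoryOnDisk = true;
    }

    // Only reached if nothing above threw.
    workspace.state = 'stopped';
    logs.push({ message: 'Initialization complete', state: 'stopped' });
  }
}

// Normal path: the directory is moved and the log is written.
const okLogs = [];
const okWorkspace = { state: 'uninitialized', directoryOnDisk: false };
launchSketch(okWorkspace, okLogs);

// Failure path: the error skips both the move and the log.
const failLogs = [];
const failedWorkspace = { state: 'uninitialized', directoryOnDisk: false };
try {
  launchSketch(failedWorkspace, failLogs, { failBeforeMove: true });
} catch (err) {
  // Expected in this sketch; the log entry is never written.
}
```

Under these assumptions, there is no path that writes the “Initialization complete” log without also having run the move, which is exactly why the observed state is so puzzling.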

I'm pretty stumped. I hate this answer, but maybe we should chalk this up to a transient failure in EFS?