Stability-AI / stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Potential spinlock when model decode executed in parallel #245

Open sepdotgg opened 1 year ago

sepdotgg commented 1 year ago

There's an open issue in a downstream repo that runs into what appears to be a spinlock when autoencoder.py:88:decode() (or something further down the call stack) is executed in parallel.

I have a limited understanding of SD/torch, so unfortunately I'm not in a position to dig much deeper. However, testing by the vladmandic/automatic repo/community narrowed it down to a timing issue: the decoder ends up being called in parallel, which triggers the hang.

The issue can be worked around by placing a threading.Lock() around the decode block, but this is certainly not a good solution, hence opening this issue to gain further insight from engineers more familiar with the library.
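For reference, a minimal sketch of the workaround described above. The names `decode_lock`, `locked_decode`, `first_stage_model`, and `latents` are placeholders for illustration, not code from either repo; it simply serialises every call into the decoder:

```python
import threading

# Single lock shared by every caller of the first-stage decode,
# so the decoder is never entered by two threads at once.
decode_lock = threading.Lock()

def locked_decode(first_stage_model, latents):
    # Serialise decode calls; only one thread may run the decoder at a time.
    with decode_lock:
        return first_stage_model.decode(latents)
```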


Using the SD webui project linked above as a specific example:

  1. Ensure Full Live Previews are enabled.
  2. Ensure that live previews n-steps is set sufficiently low (e.g. 1).
  3. Ensure that the progress bar update period is set sufficiently low (e.g. 250ms or lower).

In this scenario, image generation will frequently get stuck in a livelock/spinlock (GPU compute and VRAM stay pegged) if the live preview's call to the model's decode begins before the normal sampling path's call has returned. See this comment for an example. The hang persists until the process is killed.
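A rough repro sketch of that race, outside the webui: two threads call decode() on the same model at once, one standing in for the sampling path and one for the live preview. It assumes `model` is the loaded first-stage autoencoder and `latents` is a latent batch already on the GPU; since the hang is timing-dependent, this may not trigger it reliably:

```python
import threading
import torch

def parallel_decode_repro(model, latents):
    # One thread plays the normal sampling path, the other the live-preview
    # path; both call decode() on the same model concurrently.
    def worker(name):
        with torch.no_grad():
            out = model.decode(latents)
            print(f"{name} decode finished, shape={tuple(out.shape)}")

    threads = [threading.Thread(target=worker, args=(n,))
               for n in ("sampler", "preview")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # when the hang occurs, join() never returns and the GPU stays pegged
```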