nuwang closed this 9 years ago
The speed improvement only appears when multiple transient file systems are used. For a single FS, the extraction time seems to be the same. Yet the monitor thread is blocked during the extraction, and I'm not sure that's a great side effect. How about making this optional and setting it as part of the instance user data instead? Or staggering the threads so they're not hitting the disk at the same time? Conceptually, I just feel that the monitor thread should not be blocked for minutes at a time. Having said that, it's not like the user can do anything else with the system until this completes. I guess that puts me at 0 on this PR.
I do agree with your concerns about the monitor thread being blocked. I guess the thing here is that we actually need a more comprehensive way to dispatch and serialise background tasks, maybe using Celery? So my thinking is that it isn't worth spending a lot of effort on this at the moment, since whatever we do will be temporary anyway and the blocking doesn't really adversely affect the system right now. Adding a switch would mean that some people experience one behaviour while others experience another, which would arguably be a worse situation to be in (better for everyone to fail uniformly so we can fix it if we have to). What do you think?
Also made archive extraction synchronous (two threads hitting a spinning disk was lowering overall throughput; this cuts extraction time for indices + data from 15-20 minutes down to around 7 minutes).
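For reference, here is a rough sketch of one way to get both properties discussed above: extractions run strictly one at a time (so only one job hits the disk), but the caller is not blocked while they run. This is not what the PR does. It assumes a single-worker `ThreadPoolExecutor` from the standard library as a lightweight stand-in for a real task queue such as Celery, and the names `extract_archive`, `submit_extraction` and `_extract_pool` are hypothetical.

```python
import os
import tarfile
from concurrent.futures import ThreadPoolExecutor

# A single worker means queued extractions run one after another,
# so two archives never compete for the same spinning disk.
_extract_pool = ThreadPoolExecutor(max_workers=1)


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest_dir)
    return dest_dir


def submit_extraction(archive_path, dest_dir):
    """Queue an extraction and return a Future immediately.

    The calling thread (e.g. a monitor loop) is not blocked; it can
    check future.done() on a later cycle.
    """
    return _extract_pool.submit(extract_archive, archive_path, dest_dir)
```

A monitor loop could call `submit_extraction(...)` once per file system and then check `future.done()` on each pass instead of waiting for the extraction to finish. Whether that is worth doing before a proper task dispatcher exists is the open question in this thread.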