airbreather / StepperUpper

Some tools for making STEP happen.
MIT License

Start Tasks As Soon As Input Files Are Validated #8

Open airbreather opened 7 years ago

airbreather commented 7 years ago

Use Case

While we're checking to make sure that the user has all the files required to perform the actual setup, there's no fundamental reason why we can't just start extracting an archive file to the proper location as soon as we detect that it's available and that its hash matches the expectation. The CPU is probably mostly idle at that point anyway, and especially toward the end of the file checking, even more resources are sitting idle.
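For concreteness, the check itself is cheap to sketch with standard BCL crypto APIs; the `FileVerifier` name and raw `byte[]` checksum here are placeholders rather than the tool's actual types:

```csharp
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class FileVerifier
{
    // True only when the file exists and its MD5 digest matches what the
    // pack definition expects. (Placeholder type; not the real Md5Checksum.)
    public static bool MatchesExpectedMd5(FileInfo file, byte[] expectedMd5)
    {
        if (!file.Exists)
        {
            return false;
        }

        using (var md5 = MD5.Create())
        using (Stream stream = file.OpenRead())
        {
            return md5.ComputeHash(stream).SequenceEqual(expectedMd5);
        }
    }
}
```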

If this happened at all, it would probably have to be opt-in, though: the whole point of the checking is that it might fail, in which case the setup process wouldn't be able to complete successfully.

I suppose part of this could be improved by adding cancellation support to the tasks, so that once we detect that a required file is missing, we can cancel the remaining setup tasks and just let the rest of the file checks run to completion.
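That could look something like the following sketch, where the task bodies and file list are hypothetical stand-ins: one shared CancellationTokenSource gates all the setup tasks, and the first missing file cancels them while the checks keep going:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class SetupRunner
{
    // Hypothetical wiring, not StepperUpper's real API.
    public static async Task RunAsync(IReadOnlyList<string> requiredFiles)
    {
        using (var cts = new CancellationTokenSource())
        {
            // Kick off the expensive setup work immediately.
            Task setupWork = ExtractArchivesAsync(cts.Token);

            foreach (string path in requiredFiles)
            {
                if (!File.Exists(path))
                {
                    Console.WriteLine($"Missing required file: {path}");

                    // Setup can no longer succeed; stop the expensive work,
                    // but keep checking so the user sees everything missing.
                    cts.Cancel();
                }
            }

            try
            {
                await setupWork;
            }
            catch (OperationCanceledException)
            {
                // Expected when a missing file canceled the setup tasks.
            }
        }
    }

    private static async Task ExtractArchivesAsync(CancellationToken token)
    {
        // Placeholder for real extraction work that observes the token.
        await Task.Delay(TimeSpan.FromSeconds(5), token);
    }
}
```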

Proposed Solution

This would be really complicated to do the way things stand right now. Maybe associate each expected Md5Checksum with a TaskCompletionSource<FileInfo> "box" and then kick off the main tasks, letting each one wait on the box(es) it relies on. We could also set any leftover boxes to an "errored" or "null" state once all files are done being verified, so that the tasks don't have to wait forever.
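Roughly, the "box" idea might look like this sketch, where a string key stands in for Md5Checksum and all the member names are invented:

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

static class VerificationBoxes
{
    // One "box" per expected checksum. RunContinuationsAsynchronously keeps
    // SetResult from running dependent-task continuations inline on the
    // verification thread.
    private static readonly ConcurrentDictionary<string, TaskCompletionSource<FileInfo>> Boxes =
        new ConcurrentDictionary<string, TaskCompletionSource<FileInfo>>();

    private static TaskCompletionSource<FileInfo> GetBox(string checksum) =>
        Boxes.GetOrAdd(checksum, _ => new TaskCompletionSource<FileInfo>(
            TaskCreationOptions.RunContinuationsAsynchronously));

    // A dependent task awaits this before touching the file it needs.
    public static Task<FileInfo> WaitForFileAsync(string checksum) =>
        GetBox(checksum).Task;

    // The verification loop calls this when a file's hash matches.
    public static void OnFileVerified(string checksum, FileInfo file) =>
        GetBox(checksum).TrySetResult(file);

    // Called once all files are done being verified: fault any leftover
    // boxes so dependent tasks don't wait forever.
    public static void OnVerificationFinished()
    {
        foreach (TaskCompletionSource<FileInfo> box in Boxes.Values)
        {
            box.TrySetException(new FileNotFoundException(
                "Required file never showed up or failed verification."));
        }
    }
}
```

A dependent task would then await WaitForFileAsync(...) for each of its inputs before starting its extraction work.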

This solution would also benefit somewhat from running tasks from multiple "packs" at once, with cross-XML-file "WaitFor" attributes.
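For illustration only, a cross-pack dependency might read something like this; the Tasks/Task element names are guesses, and the file-qualified WaitFor value is invented syntax:

```xml
<!-- PackA.xml -->
<Tasks>
  <Task Id="ExtractSMIM" />
</Tasks>

<!-- PackB.xml: the file-qualified WaitFor reference is the invented part. -->
<Tasks>
  <Task Id="PatchMeshes" WaitFor="PackA.xml#ExtractSMIM" />
</Tasks>
```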

airbreather commented 7 years ago

On its own, this isn't the most useful thing in the world, as noted above.

With #25, though, this can significantly cut down on the elapsed time a new setup takes: if you have a lot of files to download, the flow can go like this:

  1. Run StepperUpper; it reports the 160 or however many files that need downloading. Independent tasks like plugin cleaning begin right away and likely complete in a few seconds.
  2. While those are running, the user begins downloading the files one by one.
  3. Per #25, as each download completes, dependent tasks unlock. Very likely, the kernel still has the downloaded file cached in memory, so extracting archives will probably be quicker than usual.
    1. This even helps to diminish the impact of long-running dependent tasks like SMIM or the STEP Compilation: if we order the links from largest file to smallest, the user can start downloading the large files earlier, which means they finish earlier, which means we can start processing them earlier (see the ordering sketch after this list).
    2. Starting the large files earlier can be huge, because the user can download (and we can then process) the large files at the same time that they're manually hunting down the progressively smaller files that take less time to process.
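The ordering in 3.1 is just a descending sort by file size. A minimal sketch, with a hypothetical DownloadLink type (StepperUpper's real link/size representation isn't shown here):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical DownloadLink type; only the size matters for the ordering.
record DownloadLink(string Url, long SizeInBytes);

static class LinkOrdering
{
    // Present the biggest downloads first so the user starts them earliest,
    // which lets the longest extraction/processing work begin soonest.
    public static IEnumerable<DownloadLink> LargestFirst(IEnumerable<DownloadLink> links) =>
        links.OrderByDescending(link => link.SizeInBytes);
}
```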