datadeps: Optimize using task results via deferrals

This PR teaches Datadeps to properly "defer" a task for later scheduling when that task depends on the result of another task spawned within the same Datadeps region. For example, it should speed up code in the following style:

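using Dagger
using Random  # provides rand!

fetch(Dagger.spawn_datadeps() do
    # A is produced by a task inside this region, so A is a DTask
    A = Dagger.@spawn zeros(4096, 2)
    # Fill each column in place; both tasks consume the result of A
    Dagger.@spawn rand!(Out(@view A[:, 1]))
    Dagger.@spawn rand!(Out(@view A[:, 2]))

    # B does not depend on A
    B = Dagger.@spawn rand(128)

    s = Dagger.@spawn sum(A)
    t = Dagger.@spawn sum(B)
    Dagger.@spawn s + t
end)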

Previously, the above code would stall the Datadeps scheduler at the rand! calls until A finished executing, and then stall again when computing s and t, significantly reducing the effective parallelism. With this PR, the scheduler no longer stalls: any task that takes a DTask as input is either ready to run (all such inputs are ready) or is deferred for later handling.
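The ready-or-deferred rule can be pictured with a small toy model in plain Julia. This is only a sketch of the idea, not the code in this PR: `ToyTask` and `schedule_in_waves` are hypothetical names, the scheduler is modeled as running in discrete "waves", and the in-place `rand!` writes are simplified to ordinary inputs.

```julia
# Toy model of deferral; NOT Dagger's implementation.
# ToyTask and schedule_in_waves are hypothetical names used for illustration.
struct ToyTask
    name::Symbol
    inputs::Vector{Symbol}   # names of the tasks whose results this task uses
end

# Group tasks into waves. A task joins a wave only when all of its inputs
# have finished; otherwise it is deferred to a later wave instead of
# blocking the tasks submitted after it.
function schedule_in_waves(tasks::Vector{ToyTask})
    finished = Set{Symbol}()
    pending = copy(tasks)
    waves = Vector{Vector{Symbol}}()
    while !isempty(pending)
        is_ready(t) = all(in(finished), t.inputs)
        ready = filter(is_ready, pending)
        isempty(ready) && error("dependency cycle or missing input")
        deferred = filter(!is_ready, pending)
        push!(waves, [t.name for t in ready])
        # Results of this wave become available to the next one.
        union!(finished, waves[end])
        pending = deferred
    end
    return waves
end

# Same shape as the example above, in submission order.
tasks = [
    ToyTask(:A,  Symbol[]),
    ToyTask(:r1, [:A]),           # rand! on column 1 of A
    ToyTask(:r2, [:A]),           # rand! on column 2 of A
    ToyTask(:B,  Symbol[]),
    ToyTask(:s,  [:A, :r1, :r2]), # sum(A) must see both writes
    ToyTask(:t,  [:B]),
    ToyTask(:st, [:s, :t]),
]

waves = schedule_in_waves(tasks)
println(waves)  # [[:A, :B], [:r1, :r2, :t], [:s], [:st]]
```

In this model B is launched alongside A rather than waiting behind the deferred `rand!` tasks, and t runs in the same wave as them, which mirrors the parallelism that deferral is meant to recover.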

JuliaParallel/Dagger.jl#567