elastacloud / mbrace-on-brisk-starter

Contains a set of scripts and demos to get you up and running with MBrace on Brisk.

Add CloudRef/CloudArray and CloudFile samples #15

Closed dsyme closed 9 years ago

dsyme commented 9 years ago

Added a sample to upload data as blobs using CloudRef and CloudArray

dsyme commented 9 years ago

I added a CloudFile sample as well. @palladin and @eiriktsarpalis please take a look - I think some of the MBrace API is changing in this regard.

BTW please advise about how to teach people about partitioning cloud arrays.

palladin commented 9 years ago

The best way to read data from a cloud file is shown here: https://github.com/elastacloud/mbrace-on-brisk-starter/blob/master/src/Demos/4-parallel-web-download.fsx#L48

isaacabraham commented 9 years ago

Is there any reason to use .NET Task rather than the MBrace Process? Process gives you all the information about the process, including running time etc., whereas pushing it into a Task loses a lot of the details.

eiriktsarpalis commented 9 years ago

Looks good. The Brisk version seems to be lacking the client API for creating cloud refs without sending work to the cluster. This will be fixed as soon as we push the MBrace.Azure package.

Another sample I can think of is one that uses ICloudDisposable. For instance:

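```fsharp
cloud {
    // use! binds the cloud ref and disposes it from the store when the workflow completes
    use! data = CloudRef.New [| 1 .. 10000000 |]
    // doWork is a placeholder for the per-item computation; it is not defined in this snippet
    let! results = Array.init 10 (fun i -> doWork i data) |> Cloud.Parallel
    return (data, results) // return the data cloud ref to verify it has been disposed from store
}
```

dsyme commented 9 years ago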

I'm just using Task because that's what I've got used to. I'll try Process and then adjust systematically

eiriktsarpalis commented 9 years ago

@isaacabraham @dsyme The latest release of MBrace.Core comes with a cloud task abstraction. It's essentially like the process type but can be created and consumed by cloud workflows. It should replace any occurrence of System.Threading.Task in the client API.

isaacabraham commented 9 years ago

@eiriktsarpalis: When would you use CloudTask over Cloud? @dsyme: cool - I think moving to Process will be better insofar as it provides a simpler experience for reasoning about what's happening to a job that is submitted.

dsyme commented 9 years ago

@eiriktsarpalis - OK, great. I'll switch to CreateProcess for now.

@isaacabraham - any ETA when we can upgrade the Brisk cluster creation to use a newer MBrace.Core? It looks like there are lots of good improvements to the programming model in the works.

dsyme commented 9 years ago

@eiriktsarpalis - what is the "Run" method? cluster.RunAsTask returning an ICloudTask?

dsyme commented 9 years ago

@isaacabraham - actually I'll leave it as RunAsTask for now, until @eiriktsarpalis advises which is the most stable, preferred option in the light of the upcoming API improvements.

isaacabraham commented 9 years ago

sounds like a plan.

@eiriktsarpalis: is there anywhere (aside from your summary on the google group) where there's a slightly lower-level view of upcoming features / changes?

isaacabraham commented 9 years ago

@eiriktsarpalis: we can do it whenever we want. I'm going to look to expand the roles offered anyway to more than just medium workers so we can do it at the same time. If you can label the appropriate repo (or let me know which nuget packages to get) then I can build it no prob. Also - are we working off the lib folder approach or moving to nuget packages?

eiriktsarpalis commented 9 years ago

@isaacabraham A CloudTask<_> denotes a computation that is already executing in the cluster. Cloud<_> is just a deferred workflow that can be executed an arbitrary number of times. This is much like the difference between Task<_> and Async<_>. Unfortunately there is currently no outline of the programming model features, implemented or planned. This must be addressed soon.
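
A minimal local sketch of the Task<_> versus Async<_> distinction mentioned above, using only FSharp.Core and System.Threading.Tasks. It does not touch MBrace at all; `runs` and `workflow` are illustrative names, and the assumption is only that Cloud<_> and CloudTask<_> relate to each other in roughly the same way.

```fsharp
open System.Threading.Tasks

// Counts how many times the workflow body actually executes.
let runs = ref 0

// An Async<int> is a deferred description of work; defining it runs nothing.
let workflow : Async<int> =
    async {
        runs.Value <- runs.Value + 1
        return 42
    }

// Each RunSynchronously call executes the body again.
let first = Async.RunSynchronously workflow
let second = Async.RunSynchronously workflow

// StartAsTask starts the body once; the Task<int> is a handle to that single run.
let started : Task<int> = Async.StartAsTask workflow

// Reading Result repeatedly observes the same completed run, it does not re-execute.
let third = started.Result
let fourth = started.Result

printfn "results: %d %d %d %d, body executed %d times" first second third fourth runs.Value
```

The workflow is executed three times in total: twice through `Async.RunSynchronously` and once through `Async.StartAsTask`, regardless of how often the task's result is read.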

@dsyme We're still working on incorporating the latest core version into MBrace.Azure. Most of the work is done; we're a few failing unit tests away from completion. A CloudTask can be started either from the cluster client or, within a cloud workflow, using the Cloud.StartAsCloudTask : Cloud<'T> -> Cloud<ICloudTask<'T>> primitive. In the Brisk bits, this can be approximated using the Cloud.StartChild : Cloud<'T> -> Cloud<Cloud<'T>> primitive, but this is not quite as flexible.
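
For readers unfamiliar with the `StartChild` shape, here is a rough local analogue using `Async.StartChild` from FSharp.Core, whose result type is likewise a workflow wrapped in a workflow (Async<Async<'T>>). This is only a sketch of the pattern: `slowSquare` and `parent` are hypothetical names, and it is an assumption that Cloud.StartChild behaves comparably on a cluster.

```fsharp
// A stand-in for some slow unit of work.
let slowSquare (x: int) : Async<int> =
    async {
        do! Async.Sleep 100
        return x * x
    }

let parent : Async<int> =
    async {
        // The outer let! starts each child and hands back an inner workflow
        // that waits for that child's result.
        let! pendingA = Async.StartChild (slowSquare 3)
        let! pendingB = Async.StartChild (slowSquare 4)
        // Both children are already running; awaiting them collects the results.
        let! a = pendingA
        let! b = pendingB
        return a + b
    }

let total = Async.RunSynchronously parent
printfn "total = %d" total
```

The inner workflow can only be awaited from inside the parent workflow, which is one way to read the "not quite as flexible" remark: a task handle, by contrast, can be passed around and inspected independently of the workflow that started it.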

dsyme commented 9 years ago

@eiriktsarpalis OK, cool, thanks for the update, that sounds great. (I do wonder if ICloudTask should be CloudTask, since it's strange to have the "I" prefix appear in the programming model like that for the first time.)

In this context we were asking about the best "Run" method to standardize on when scripting on the local machine - CreateProcess, RunAsTask, RunAsync. It looks like we're using CreateProcess for now :)

isaacabraham commented 9 years ago

@dsyme unless you just want to get the result back immediately - then just call Run :-)

eiriktsarpalis commented 9 years ago

@dsyme I would expect that Process<_> is going to implement the ICloudTask<_> interface, so CreateProcess and RunAsTask should merge. Run is probably the best choice for a quick hello world-like demo, but in practice CreateProcess is the best choice when expecting to do some sort of debugging.

isaacabraham commented 9 years ago

That's how I've been using Run() - for demos in .fsx files. For a production system you would undoubtedly want to have some job monitoring involved.