dask / dask-ml

Scalable Machine Learning with Dask
http://ml.dask.org
BSD 3-Clause "New" or "Revised" License
899 stars 256 forks

Integration of CmdStanPy api with dask #837

Open bparbhu opened 3 years ago

bparbhu commented 3 years ago

Hi all,

In the Stan weekly community meeting we were discussing a Dask-like solution for CmdStan, and it occurred to me that it might be worth integrating the CmdStanPy API into Dask. I was wondering what it would take to do this, what resources it would need, and how multi-threaded or GPU computation would work with Dask and Stan, both locally and distributed across a cluster or cloud-based service. I'm more than happy to work on this and to help others with it.

Thanks again,

-Brian

stsievert commented 3 years ago

> I was wondering what it would take to do this and also what would be needed in terms of resources and also figuring out how multi-threading or GPU computations would work in a local way or a distributed fashion on a cluster or cloud-based service with regards to Dask and Stan.

I've glanced at the CmdStanPy API and see that a lot of it revolves around files. I suspect that will require some work (helped by the fact that the files are specific to each model?).

This might be an appropriate package for the new dask-contrib organization, the docs of which are being worked on in https://github.com/dask/dask/pull/7354.

cc @jrbourbeau

mitzimorris commented 3 years ago

Right, files are a big problem. Happy to discuss possible Dask integrations.
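To make the file concern concrete, here is a minimal sketch of one way it might be handled: each Dask task runs a single chain in its own scratch directory and returns the draws in memory, so no CSV paths need to be shared between workers. This is not an existing dask-ml or CmdStanPy feature. The helper names (`chain_seeds`, `run_chain`, `sample_with_dask`) are hypothetical, and the sketch assumes the Stan file is readable from every worker (for example on a shared filesystem) and that CmdStan is installed there.

```python
from __future__ import annotations

import tempfile


def chain_seeds(base_seed: int, n_chains: int) -> list[int]:
    """Return one distinct, reproducible seed per chain (hypothetical helper)."""
    return [base_seed + i for i in range(n_chains)]


def run_chain(stan_file: str, data: dict, chain_id: int, seed: int):
    """Run a single chain and return its draws as a pandas DataFrame."""
    # Imported inside the task so that only the Dask workers need CmdStanPy.
    from cmdstanpy import CmdStanModel

    # Assumption: stan_file is visible from the worker; compiling the same
    # file from several workers at once may race, so pre-compiling is safer.
    model = CmdStanModel(stan_file=stan_file)

    # A private scratch directory keeps concurrent chains from overwriting
    # each other's CSV output.
    with tempfile.TemporaryDirectory(prefix=f"stan-chain-{chain_id}-") as scratch:
        fit = model.sample(
            data=data,
            chains=1,
            chain_ids=[chain_id],
            seed=seed,
            output_dir=scratch,
        )
        # Load the draws before the scratch directory is deleted.
        return fit.draws_pd()


def sample_with_dask(client, stan_file: str, data: dict,
                     n_chains: int = 4, base_seed: int = 1234):
    """Submit one task per chain to a dask.distributed Client and gather draws."""
    futures = [
        # pure=False because sampling is stochastic and must not be cached.
        client.submit(run_chain, stan_file, data, chain_id, seed, pure=False)
        for chain_id, seed in enumerate(chain_seeds(base_seed, n_chains), start=1)
    ]
    return client.gather(futures)
```

Returning a DataFrame per chain sidesteps the file handoff at the cost of moving all draws through the scheduler; for large models, writing to shared storage and returning paths may be the better trade-off.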

bparbhu commented 3 years ago

I've moved this here for now: https://github.com/bparbhu/hpc-stan

I think this is a good first step toward what needs to be done.

https://github.com/bparbhu?tab=projects

bparbhu commented 1 year ago

Update: I just started working on a prototype of this here: https://github.com/bparbhu/hpc-stan. I've noticed that other Stan users are using SLURM to run Stan programs on a cluster. So far I'm using dask-jobqueue to make sure a Stan program can run on a variety of HPC cluster types using Dask.
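As a rough illustration of the dask-jobqueue approach described above (not code taken from hpc-stan), the sketch below picks a dask-jobqueue cluster class by scheduler name and connects a Dask client to it. The `CLUSTER_CLASSES` table and the `make_cluster` and `connect` helpers are hypothetical. The class names are the ones dask-jobqueue exports, and the resource values are placeholders that would need to match the actual site.

```python
import importlib

# Scheduler name -> dask-jobqueue class name (a subset of what the package offers).
CLUSTER_CLASSES = {
    "slurm": "SLURMCluster",
    "pbs": "PBSCluster",
    "sge": "SGECluster",
    "lsf": "LSFCluster",
}


def make_cluster(kind: str, **kwargs):
    """Create a dask-jobqueue cluster for the given scheduler type."""
    try:
        class_name = CLUSTER_CLASSES[kind.lower()]
    except KeyError:
        raise ValueError(f"unsupported scheduler: {kind!r}") from None
    # Imported lazily so the lookup above works without dask-jobqueue installed.
    module = importlib.import_module("dask_jobqueue")
    return getattr(module, class_name)(**kwargs)


def connect(kind: str = "slurm", n_jobs: int = 4):
    """Start n_jobs scheduler jobs and return a connected Dask client."""
    from dask.distributed import Client

    # Placeholder resources: one worker process per job, 4 cores, 8 GB.
    cluster = make_cluster(
        kind, cores=4, processes=1, memory="8GB", walltime="01:00:00"
    )
    cluster.scale(jobs=n_jobs)
    return Client(cluster)
```

With a client from `connect("slurm")`, each Stan chain could then be submitted as an ordinary Dask task, which keeps the Stan-specific code independent of the scheduler in use.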

bparbhu commented 11 months ago

Also, work is underway in cloud-stan to address this: https://github.com/bparbhu/cloud-stan/tree/main